PostgreSQL — alert when a table is bloating
A table occupies 200 GB, of which 150 GB is bloat (dead tuples). VACUUM FULL would reclaim the space but needs an exclusive lock, and autovacuum cannot keep up. You usually notice only when an index scan turns into a seq scan.
Recipe
#!/usr/bin/env bash
# /etc/cron.d/pg-bloat
# 0 3 * * * postgres /opt/pg-bloat.sh
set -eu
DB=${PGDATABASE:-postgres}
THRESH_PCT=${THRESH_PCT:-30} # alert above 30% bloat
: "${HEARTBEAT_URL:?HEARTBEAT_URL must be set}"
# n_dead_tup is a stats-collector estimate, not measured on-disk bloat,
# but it is cheap to query and good enough for an alert
BLOATED=$(psql -X -At -d "$DB" -c "
SELECT relname || ':' || ROUND(100.0 * n_dead_tup / GREATEST(n_live_tup, 1)) AS bloat_pct
FROM pg_stat_user_tables
WHERE n_live_tup > 100000
AND 100.0 * n_dead_tup / GREATEST(n_live_tup, 1) > $THRESH_PCT
ORDER BY n_dead_tup DESC
LIMIT 5;
")
if [ -n "$BLOATED" ]; then
COUNT=$(echo "$BLOATED" | wc -l)
EXAMPLES=$(echo "$BLOATED" | paste -sd, -)
curl -fsS "$HEARTBEAT_URL" \
  --data-urlencode "tables_bloated=$COUNT" \
  --data-urlencode "examples=$EXAMPLES"
exit 2
fi
echo "OK (no tables > ${THRESH_PCT}% bloat)"
Same thing in Enterno.io
Wrap the cron job in an Enterno heartbeat: watch the trend ("bloat climbs 1%/day") and schedule pg_repack a week before the disk fills.
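pg_repack rewrites the bloated table online, taking only brief locks at the start and end, unlike VACUUM FULL. A minimal cron sketch; the database mydb, the table orders, and the Sunday 4 AM schedule are placeholders, and the pg_repack extension must already be installed in the target database:

```shell
# /etc/cron.d/pg-repack -- illustrative, not part of the recipe above
# Rebuilds the bloated table online; requires the pg_repack extension.
0 4 * * 0 postgres pg_repack -d mydb -t orders
```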
Related recipes
All autovacuum_max_workers slots are busy (a long-running query holds a lock, or vacuum_cost_limit is too low): tables bloat and disk usage climbs linearly. Postgres itself does not alert.
Detect the moment a replica falls behind the primary by more than 10 seconds.
Logs or backup files eat /var; in 24 hours the server falls over. A basic df check every 10 minutes saves a 2 AM incident.
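The df check mentioned in the last recipe fits in a few lines of shell. A minimal sketch: the 90% threshold and the /var mount are illustrative choices; the function reads `df -P` output on stdin so the threshold logic can be tested in isolation.

```shell
#!/usr/bin/env bash
# Minimal sketch of a disk-usage check, not part of the recipe above.
# Prints one line per filesystem whose Capacity column exceeds the threshold.
check_fs() {
    local thresh=${1:-90}
    awk -v t="$thresh" 'NR > 1 {
        gsub("%", "", $5)                 # strip "%" from the Capacity column
        if ($5 + 0 > t) print $6 ": " $5 "% used"
    }'
}

# Real use, e.g. from a 10-minute cron job:
#   df -P /var | check_fs 90
# Prints nothing while usage stays below the threshold.
```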