MySQL replica — alert when SQL thread is stuck
A MySQL replica stops applying binlog (Slave_SQL_Running=No after a bad query) — your heartbeat-only monitor doesn't catch it because lag isn't growing linearly, it's just frozen.
Recipe
#!/usr/bin/env bash
# /etc/cron.d/mysql-repl
# */1 * * * * root /opt/mysql-repl-check.sh
USER=${MYSQL_USER:-monitor}
PWD=${MYSQL_PWD:-changeme}
HOST=${MYSQL_HOST:-127.0.0.1}
# 8.0.22+ uses REPLICA, older uses SLAVE — try both
OUT=$(mysql -u"$USER" -p"$PWD" -h"$HOST" -e 'SHOW REPLICA STATUS\G' 2>&1)
[ -z "$OUT" ] && OUT=$(mysql -u"$USER" -p"$PWD" -h"$HOST" -e 'SHOW SLAVE STATUS\G' 2>&1)
IO_OK=$(echo "$OUT" | awk -F: '/Slave_IO_Running:|Replica_IO_Running:/ {gsub(/ /,"",$2); print $2}')
SQL_OK=$(echo "$OUT" | awk -F: '/Slave_SQL_Running:|Replica_SQL_Running:/{gsub(/ /,"",$2); print $2}')
LAG=$(echo "$OUT" | awk -F: '/Seconds_Behind_Master:|Seconds_Behind_Source:/{gsub(/ /,"",$2); print $2}')
ERR=$(echo "$OUT" | awk -F: '/Last_SQL_Error:|Last_Error:/{sub(/^[^:]*:/, ""); print; exit}')
if [ "$IO_OK" != "Yes" ] || [ "$SQL_OK" != "Yes" ]; then
curl -fsS "$HEARTBEAT_URL" --data-urlencode "err=IO=$IO_OK,SQL=$SQL_OK,lag=$LAG,reason=$ERR"
exit 2
fi
echo "OK (lag=${LAG}s)"
Same thing in Enterno.io
Replace the cron+curl pair with an Enterno heartbeat — replication-failure retention (when did it last hang) plus alerts to Telegram+email+Slack without bringing up Alertmanager.
Related recipes
long_query_time = 1, slow_query_log enabled. You need to know when the slow-query rate per minute suddenly jumps (a deploy broke an index, ORM went N+1).
A replica-set secondary falls behind the primary; the app will read stale data within a minute. Want an HTTP endpoint that says "ok" or "lag".
Detect the moment a replica falls behind the primary by more than 10 seconds.