Skip to content

MySQL replica — alert when SQL thread is stuck

A MySQL replica stops applying binlog (Slave_SQL_Running=No after a bad query) — your heartbeat-only monitor doesn't catch it because lag isn't growing linearly, it's just frozen.

Recipe

bash
#!/usr/bin/env bash
# /etc/cron.d/mysql-repl
# */1 * * * * root /opt/mysql-repl-check.sh

USER=${MYSQL_USER:-monitor}
PWD=${MYSQL_PWD:-changeme}
HOST=${MYSQL_HOST:-127.0.0.1}

# 8.0.22+ uses REPLICA, older uses SLAVE — try both
OUT=$(mysql -u"$USER" -p"$PWD" -h"$HOST" -e 'SHOW REPLICA STATUS\G' 2>&1)
[ -z "$OUT" ] && OUT=$(mysql -u"$USER" -p"$PWD" -h"$HOST" -e 'SHOW SLAVE STATUS\G' 2>&1)

IO_OK=$(echo "$OUT"  | awk -F: '/Slave_IO_Running:|Replica_IO_Running:/  {gsub(/ /,"",$2); print $2}')
SQL_OK=$(echo "$OUT" | awk -F: '/Slave_SQL_Running:|Replica_SQL_Running:/{gsub(/ /,"",$2); print $2}')
LAG=$(echo "$OUT"    | awk -F: '/Seconds_Behind_Master:|Seconds_Behind_Source:/{gsub(/ /,"",$2); print $2}')
ERR=$(echo "$OUT"    | awk -F: '/Last_SQL_Error:|Last_Error:/{sub(/^[^:]*:/, ""); print; exit}')

if [ "$IO_OK" != "Yes" ] || [ "$SQL_OK" != "Yes" ]; then
  curl -fsS "$HEARTBEAT_URL" --data-urlencode "err=IO=$IO_OK,SQL=$SQL_OK,lag=$LAG,reason=$ERR"
  exit 2
fi
echo "OK (lag=${LAG}s)"

Same thing in Enterno.io

Replace the cron+curl pair with an Enterno heartbeat — replication-failure retention (when did it last hang) plus alerts to Telegram+email+Slack without bringing up Alertmanager.

Set up HTTP monitor → ← All recipes

Related recipes

long_query_time = 1, slow_query_log enabled. You need to know when the slow-query rate per minute suddenly jumps (a deploy broke an index, ORM went N+1).