MongoDB replica set — alert on replication lag
A replica-set secondary falls behind the primary; the app will read stale data within a minute. Want an HTTP endpoint that says "ok" or "lag".
Recipe
#!/usr/bin/env bash
# Wrap as HTTP via fastcgi or python -m http.server.
THRESHOLD_S="${LAG_THRESHOLD_S:-10}"
LAG=$(mongo --quiet --eval '
var s = rs.printSecondaryReplicationInfo();
var st = rs.status();
var lag = 0;
st.members.forEach(function (m) {
if (m.stateStr === "SECONDARY" && m.optimeDate) {
var d = (st.date - m.optimeDate) / 1000;
if (d > lag) lag = d;
}
});
print(Math.round(lag));
' 2>/dev/null)
[ -z "$LAG" ] && { echo "no-data"; exit 1; }
[ "$LAG" -ge "$THRESHOLD_S" ] && echo "lag $LAG" || echo "ok $LAG"
Same thing in Enterno.io
Endpoint + Enterno HTTP monitor with keyword-rule "does not contain ok" alerts the moment a secondary falls behind. Pioneer+ persists response-time which correlates with heavy writes.
Related recipes
A MySQL replica stops applying binlog (Slave_SQL_Running=No after a bad query) — your heartbeat-only monitor doesn't catch it because lag isn't growing linearly, it's just frozen.
Writes on primary grow faster than oplog retention. If a secondary falls behind by more than the oplog window, you need an initial sync (hours of downtime). Usually noticed too late.
Detect the moment a replica falls behind the primary by more than 10 seconds.