MongoDB — alert when oplog window shrinks
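The oplog has a fixed size, so when the write rate on the primary rises, the same oplog covers a shorter span of time. If a secondary falls behind by more than that window, it can no longer catch up from the oplog and needs a full initial sync (hours of downtime). The shrinking window is usually noticed too late.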
Recipe
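#!/usr/bin/env bash
# Save as /opt/mongo-oplog.sh and schedule it from /etc/cron.d/mongo-oplog:
# */15 * * * * mongo /opt/mongo-oplog.sh
URI=${MONGO_URI:-mongodb://localhost:27017}
WARN_HOURS=${WARN_HOURS:-12} # alert when window < 12 h
: "${HEARTBEAT_URL:?HEARTBEAT_URL must be set}" # alert endpoint
# Projected window = oplog capacity / write rate, in whole hours.
WINDOW_H=$(mongosh --quiet "$URI" --eval '
const oplog = db.getSiblingDB("local").oplog.rs;
const r = oplog.stats();
const cap = r.maxSize; // configured oplog size, bytes
const used = r.size; // bytes currently stored
// $natural order reads the first/last entry without sorting the unindexed oplog
const newest = oplog.find().sort({$natural:-1}).limit(1).next().ts.t;
const oldest = oplog.find().sort({$natural: 1}).limit(1).next().ts.t;
const rate = used / (newest - oldest); // bytes per second
print(Math.floor(cap / rate / 3600));
')
# A failed query or non-numeric output must alert, not fall through to "OK".
case "$WINDOW_H" in ''|*[!0-9]*) WINDOW_H=0 ;; esac
if [ "$WINDOW_H" -lt "$WARN_HOURS" ]; then
curl -fsS "$HEARTBEAT_URL" --data "oplog_window_h=$WINDOW_H,threshold=$WARN_HOURS"
exit 2
fi
echo "OK (window=${WINDOW_H}h)"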
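To sanity-check the arithmetic without a running replica set, the estimate used in the script (cap / rate / 3600, where rate is stored bytes divided by the seconds between the oldest and newest entry) can be reproduced in plain bash. This is only a sketch: the function name oplog_window_hours and the sample numbers are made up for illustration and are not part of the recipe.

```shell
#!/usr/bin/env bash
# Hypothetical helper: projected oplog window in whole hours.
#   $1 = oplog capacity in bytes (maxSize)
#   $2 = bytes currently stored (size)
#   $3 = seconds between the oldest and newest oplog entry
oplog_window_hours() {
  local cap=$1 used=$2 span=$3
  # No data or no time span yet: report 0 so a caller would alert.
  if [ "$used" -le 0 ] || [ "$span" -le 0 ]; then
    echo 0
    return
  fi
  # cap / (used / span) / 3600, reordered so integer division loses less.
  echo $(( cap * span / used / 3600 ))
}

# Made-up numbers: 50 GiB capacity, 25 GiB written over 10 hours.
oplog_window_hours $((50 * 1024 * 1024 * 1024)) $((25 * 1024 * 1024 * 1024)) 36000
# prints 20
```

With the oplog half full after 10 hours, the full oplog is projected to cover 20 hours, which would pass the default 12-hour threshold. Once the oplog is full, the estimate is roughly the time span between its oldest and newest entries.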
Same thing in Enterno.io
Wrap in an Enterno heartbeat — track the window-shrink trend over a week and grow the oplog before an initial-sync incident.
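If you want a local view of the same trend, one possible approach is to append each measurement to a small log and compare the oldest and newest sample from the last seven days. The sketch below assumes a hypothetical log path and the helper names record_window and window_trend; none of these come from Enterno or MongoDB.

```shell
#!/usr/bin/env bash
# Hypothetical trend log: one "epoch_seconds window_hours" pair per line.
LOG=${OPLOG_TREND_LOG:-/tmp/oplog-window.log}

# Append one sample. $1 = window in hours, $2 = optional epoch timestamp.
record_window() {
  echo "${2:-$(date +%s)} $1" >> "$LOG"
}

# Print the change in window hours between the oldest and newest sample
# taken within the last 7 days (a negative value means the window shrank).
# $1 = optional "now" as epoch seconds, mainly useful for testing.
window_trend() {
  local now=${1:-$(date +%s)}
  awk -v cutoff=$(( now - 7 * 86400 )) '
    $1 >= cutoff { if (!seen) { first = $2; seen = 1 } last = $2 }
    END { if (seen) print last - first; else print 0 }
  ' "$LOG"
}
```

Calling record_window "$WINDOW_H" at the end of the check script would feed the log. When window_trend keeps returning negative values, the oplog can be enlarged online with the replSetResizeOplog admin command, for example db.adminCommand({replSetResizeOplog: 1, size: 32768}); the size is given in megabytes, the value here is only an example, and the command has to be run on each replica-set member separately.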
Related recipes
A replica-set secondary falls behind the primary; the app will read stale data within a minute. Want an HTTP endpoint that says "ok" or "lag".
Detect the moment a replica falls behind the primary by more than 10 seconds.
Redis slave is behind master — read-after-write returns stale data. No native alert, you need an external one.