MongoDB — alert when oplog window shrinks · monitoring cookbook · Enterno.io

Anatoly Oshmanovsky

MongoDB — alert when oplog window shrinks

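The oplog is a fixed-size capped collection, so the time span it covers (the oplog window) shrinks as write volume on the primary grows. If a secondary falls behind by more than that window, it can no longer catch up from the oplog and needs a full initial sync (hours of downtime for that member). The shrinking window is usually noticed too late, after a secondary has already fallen off.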
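The projection the recipe makes can be checked by hand: divide the oplog capacity by the average write rate (bytes used over the seconds they span). The sketch below does that arithmetic in plain shell; `projected_window_hours` is a hypothetical helper name and the byte counts are made-up illustration values, not measurements.

```bash
#!/usr/bin/env bash
# Hypothetical helper: projected oplog window in whole hours.
# Args: capacity in bytes, bytes currently used, seconds spanned by the current entries.
projected_window_hours() {
  local cap=$1 used=$2 span=$3
  # No data or no time span yet: no meaningful estimate, report 0.
  if [ "$used" -le 0 ] || [ "$span" -le 0 ]; then
    echo 0
    return
  fi
  # cap / (used / span) / 3600, reordered so integer division does not truncate early.
  echo $(( cap * span / used / 3600 ))
}

# A full 50 GiB oplog whose entries span 48 hours.
projected_window_hours 53687091200 53687091200 172800   # prints 48
# Same oplog after a six-fold rise in write rate: entries now span only 8 hours.
projected_window_hours 53687091200 53687091200 28800    # prints 8
```

With the default threshold of 12 hours, the second case is the one that should raise an alert.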

Stack: mongo-shell · cron
Tags: mongodb, oplog, replication

Recipe

bash

#!/usr/bin/env bash
# Install as /etc/cron.d/mongo-oplog; in that format the sixth field is the user the job runs as:
# */15 * * * * mongo /opt/mongo-oplog.sh

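URI=${MONGO_URI:-mongodb://localhost:27017}
WARN_HOURS=${WARN_HOURS:-12}          # alert when window < 12 h
# cron starts jobs with a minimal environment, so fail loudly if the alert endpoint is missing
: "${HEARTBEAT_URL:?set HEARTBEAT_URL to the alert endpoint}"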

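WINDOW_H=$(mongosh --quiet "$URI" --eval '
  const oplog = db.getSiblingDB("local").oplog.rs;
  const r = oplog.stats();
  const cap = r.maxSize;                  // configured oplog size, bytes
  const used = r.size;                    // bytes currently held
  // $natural order reads the newest/oldest entry directly; ts has no index to sort on
  const newest = oplog.find().sort({$natural: -1}).limit(1).next().ts.t;
  const oldest = oplog.find().sort({$natural: 1}).limit(1).next().ts.t;
  const rate = used / Math.max(newest - oldest, 1);   // average bytes per second
  print(Math.floor(cap / rate / 3600));   // projected window in hours at that rate
')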

if [ "${WINDOW_H:-0}" -lt "$WARN_HOURS" ]; then
  curl -fsS "$HEARTBEAT_URL" --data "oplog_window_h=$WINDOW_H&threshold=$WARN_HOURS"
  exit 2
fi
echo "OK (window=${WINDOW_H}h)"
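One gap worth considering: if `mongosh` cannot connect, `WINDOW_H` is empty or holds an error message, and the numeric comparison above either treats it as 0 or fails outright. A minimal sketch of one way to separate "window too small" from "could not measure" follows; `classify_window` and the exit code 3 are hypothetical choices for illustration, not part of the recipe.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a measured window against a threshold.
# Prints OK, WARN or UNKNOWN; returns 0, 2 or 3 respectively.
classify_window() {
  local value=$1 warn=$2
  case "$value" in
    # Empty, or containing anything other than digits: the measurement failed.
    ''|*[!0-9]*) echo UNKNOWN; return 3 ;;
  esac
  if [ "$value" -lt "$warn" ]; then
    echo WARN
    return 2
  fi
  echo OK
}

classify_window 48 12                            # prints OK
classify_window 8 12 || true                     # prints WARN
classify_window "MongoNetworkError" 12 || true   # prints UNKNOWN
```

Treating UNKNOWN as its own state lets the alert say that monitoring itself is broken, rather than reporting a window of zero hours.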