Cassandra — alert when repair has not run within gc_grace
Cassandra needs a full repair within `gc_grace_seconds` (default 10 days) — otherwise deletes resurrect as zombies on failover. Easy to miss without a scheduler.
Recipe
bash
#!/usr/bin/env bash
# /etc/cron.d/cassandra-repair
# 0 6 * * * cassandra /opt/cassandra-repair-check.sh
KEYSPACE=${KEYSPACE}
GC_DAYS=${GC_DAYS:-10} # alert if last repair older than 10d
# Path differs per distribution; adapt as needed
LAST=$(grep -h 'Repair completed' /var/log/cassandra/system.log* 2>/dev/null \
| tail -1 | awk '{print $1, $2}')
[ -z "$LAST" ] && {
curl -fsS "$HEARTBEAT_URL" --data "no_repair_log=true,keyspace=$KEYSPACE"
exit 2
}
EPOCH=$(date -d "$LAST" +%s 2>/dev/null || echo 0)
NOW=$(date +%s)
DAYS=$(( (NOW - EPOCH) / 86400 ))
if [ "$DAYS" -gt "$GC_DAYS" ]; then
curl -fsS "$HEARTBEAT_URL" --data "last_repair_days=$DAYS,gc_days=$GC_DAYS"
exit 2
fi
echo "OK (last repair $DAYS d ago)"
Same thing in Enterno.io
Wire to an Enterno heartbeat with 30-day retention — you see the repair cron stopped days before any zombie-row incident.
Related recipes
Ensure your site returns 2xx every minute, alert to Slack/Telegram on failure.
Minimal script that checks an SSL certificate and alerts 14 days before expiry.
Detect the moment a replica falls behind the primary by more than 10 seconds.