Kafka consumer lag — alert when behind by more than N
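When a consumer group falls behind the producers, unprocessed messages pile up. This recipe reports the group's worst per-partition lag and flags it once it reaches a threshold (10000 messages by default), so a monitor can alert on it.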
Recipe
bash
#!/usr/bin/env bash
# /opt/kafka-lag-watch.sh — exposes the worst lag of a group as plain text
# Expose it over HTTP via nginx + fcgiwrap, or run it from cron, write the
# output to a file, and serve that file (python -m http.server is enough).
GROUP="${1:-payments-consumer}"
BROKER="${2:-localhost:9092}"
THRESHOLD="${LAG_THRESHOLD:-10000}"
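LAG=$(/opt/kafka/bin/kafka-consumer-groups.sh \
  --bootstrap-server "$BROKER" --describe --group "$GROUP" 2>/dev/null \
  | awk '
      # Locate the LAG column from the header row: it is column 5 in older
      # Kafka releases and column 6 once the GROUP column is printed first.
      !col { for (i = 1; i <= NF; i++) if ($i == "LAG") col = i; next }
      # Keep numeric lag values only; partitions without a lag show "-".
      $col ~ /^[0-9]+$/ { print $col }' \
  | sort -n | tail -1)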
[ -z "$LAG" ] && { echo "no-data"; exit 1; }
[ "$LAG" -ge "$THRESHOLD" ] && echo "lag $LAG" || echo "ok $LAG"
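One way to put the script's output behind a URL without CGI is to run it from cron and publish the result as a static file. The sketch below is only an illustration: the helper name publish_lag and the /var/www/status path are assumptions, not part of the recipe.

```bash
#!/usr/bin/env bash
# publish_lag: hypothetical helper that runs a command and publishes its
# output to a file that any static web server can serve.
publish_lag() {
  local out_file="$1"; shift               # destination file (assumed path)
  local tmp
  tmp=$(mktemp "${out_file}.XXXXXX") || return 1
  # Run the command given as the remaining arguments. Keep its output even
  # when it exits non-zero, which is what happens in the "no-data" case.
  "$@" > "$tmp" || true
  chmod 644 "$tmp"                         # mktemp creates the file as 0600
  mv -f "$tmp" "$out_file"                 # rename is atomic on one filesystem
}

# Example call, e.g. from a script that cron runs every minute:
# publish_lag /var/www/status/kafka-lag /opt/kafka-lag-watch.sh payments-consumer localhost:9092
```

Writing to a temporary file and renaming it means the web server never serves a half-written response. If the script produces no output at all, the published file is empty, which still fails a "contains ok" check.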
Same thing in Enterno.io
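Expose the script at /kafka-lag so that it returns "ok N" or "lag N", then point an Enterno HTTP monitor at that URL with the keyword rule "does not contain ok". The rule also matches "no-data", so a broken check raises an alert too. No Alertmanager setup is needed.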
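Before wiring up the monitor, it can help to check locally which responses would trip such a rule. The function below is a hypothetical stand-in that assumes plain substring matching on the response body; the monitor's actual matching rules may differ.

```bash
#!/usr/bin/env bash
# would_alert: hypothetical local stand-in for a "does not contain ok" rule.
# Returns 0 (alert) when the body lacks the substring "ok", 1 otherwise.
would_alert() {
  case "$1" in
    *ok*) return 1 ;;   # body contains "ok": healthy, no alert
    *)    return 0 ;;   # "lag N", "no-data", or an empty body: alert
  esac
}

# Try the three responses the script can produce, plus an empty body.
for body in "ok 120" "lag 15000" "no-data" ""; do
  if would_alert "$body"; then
    echo "alert: '$body'"
  else
    echo "quiet: '$body'"
  fi
done

# Against a live endpoint (the URL is an assumption):
# would_alert "$(curl -fsS http://localhost:8080/kafka-lag)" && echo "would alert"
```

Because the match is a substring test, any error page that happens to contain "ok" (for example the word "token") would read as healthy, so keep the endpoint's output limited to the three strings the script emits.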