Kafka — alert when a consumer offset stops advancing
The consumer is up but offset isn't growing (consumer-thread deadlock, or it's stuck without heartbeat). A lag-only check shows 0 lag because the producer is also idle — but the bug is in production.
Recipe
#!/usr/bin/env bash
# /etc/cron.d/kafka-offset
# */2 * * * * root /opt/kafka-offset.sh
GROUP=${CONSUMER_GROUP}
TOPIC=${TOPIC}
BROKER=${BROKER:-kafka:9092}
STATE=/var/lib/kafka-offset.state.$GROUP-$TOPIC
STALL_CYCLES=${STALL_CYCLES:-5} # alert after N cycles unchanged
# Get current end offset across all partitions
NOW=$(kafka-consumer-groups.sh --bootstrap-server "$BROKER" --group "$GROUP" --describe \
| awk -v t="$TOPIC" '$2==t {sum += $4} END {print sum+0}')
PREV=$(cat "$STATE" 2>/dev/null || echo "$NOW:0")
PREV_OFFSET=${PREV%:*}
PREV_STALL=${PREV#*:}
if [ "$NOW" = "$PREV_OFFSET" ]; then
STALL=$((PREV_STALL + 1))
else
STALL=0
fi
echo "$NOW:$STALL" > "$STATE"
if [ "$STALL" -ge "$STALL_CYCLES" ]; then
curl -fsS "$HEARTBEAT_URL" --data "stall=$STALL,offset=$NOW,group=$GROUP"
exit 2
fi
echo "OK (offset=$NOW stall=$STALL)"
Same thing in Enterno.io
Wrap in an Enterno heartbeat — catches "consumer is frozen" specifically (a different signal from lag-rate), and history shows whether it stalled for 30 s or for the second time this week.
Related recipes
Consumer group lags behind the producer and messages pile up. Need a lag threshold that triggers an alert.
A Redis Streams consumer lags — it reads messages but never XACKs them (the worker hangs between read and ack). XLEN does not grow, XPENDING does.
Ensure your site returns 2xx every minute, alert to Slack/Telegram on failure.