Skip to content

Kafka — alert when a consumer offset stops advancing

The consumer is up but offset isn't growing (consumer-thread deadlock, or it's stuck without heartbeat). A lag-only check shows 0 lag because the producer is also idle — but the bug is in production.

Recipe

bash
#!/usr/bin/env bash
# /etc/cron.d/kafka-offset
# */2 * * * * root /opt/kafka-offset.sh

GROUP=${CONSUMER_GROUP}
TOPIC=${TOPIC}
BROKER=${BROKER:-kafka:9092}
STATE=/var/lib/kafka-offset.state.$GROUP-$TOPIC
STALL_CYCLES=${STALL_CYCLES:-5}       # alert after N cycles unchanged

# Get current end offset across all partitions
NOW=$(kafka-consumer-groups.sh --bootstrap-server "$BROKER" --group "$GROUP" --describe \
  | awk -v t="$TOPIC" '$2==t {sum += $4} END {print sum+0}')

PREV=$(cat "$STATE" 2>/dev/null || echo "$NOW:0")
PREV_OFFSET=${PREV%:*}
PREV_STALL=${PREV#*:}

if [ "$NOW" = "$PREV_OFFSET" ]; then
  STALL=$((PREV_STALL + 1))
else
  STALL=0
fi
echo "$NOW:$STALL" > "$STATE"

if [ "$STALL" -ge "$STALL_CYCLES" ]; then
  curl -fsS "$HEARTBEAT_URL" --data "stall=$STALL,offset=$NOW,group=$GROUP"
  exit 2
fi
echo "OK (offset=$NOW stall=$STALL)"

Same thing in Enterno.io

Wrap in an Enterno heartbeat — catches "consumer is frozen" specifically (a different signal from lag-rate), and history shows whether it stalled for 30 s or for the second time this week.

Set up HTTP monitor → ← All recipes

Related recipes