Alertmanager — alert when an alert is stuck in unprocessed
An alert fires in Prometheus and reaches Alertmanager, but sits in status.state=unprocessed long past group_wait — received, yet never dispatched to any receiver (group_wait too big? notifier broken? misconfigured route?). Nobody gets paged. Note that Alertmanager's v2 API never reports "pending"; that state exists only on the Prometheus side, during the for-window, and such alerts never reach Alertmanager at all.
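Before automating anything, a one-off triage is worth doing. A minimal sketch, assuming amtool is installed next to your Alertmanager; the config path and label values below are placeholders for yours:

# does this label set actually match the receiver you expect?
amtool config routes test --config.file=/etc/alertmanager/alertmanager.yml \
  alertname=HighLatency severity=critical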
Recipe
#!/usr/bin/env bash
# /opt/am-stuck.sh — schedule via /etc/cron.d/am-stuck:
# */5 * * * * root /opt/am-stuck.sh
set -euo pipefail  # fail loudly if the Alertmanager API is unreachable

AM=${AM_URL:-http://localhost:9093}
THRESH_MIN=${THRESH_MIN:-15}
: "${HEARTBEAT_URL:?set HEARTBEAT_URL to your Enterno check URL}"
NOW=$(date -u +%s)

# v2 API states are unprocessed/active/suppressed; "unprocessed" means
# received but not yet dispatched to any receiver — our "stuck" signal.
STUCK=$(curl -fsS "$AM/api/v2/alerts" \
  | jq --argjson now "$NOW" --argjson max "$THRESH_MIN" '
      [.[]
       | select(.status.state == "unprocessed")
       | {name: .labels.alertname,
          # strip fractional seconds so fromdateiso8601 can parse startsAt
          age_min: (($now - (.startsAt | sub("\\.[0-9]+Z$"; "Z") | fromdateiso8601)) / 60)}
       | select(.age_min > $max)]')

COUNT=$(echo "$STUCK" | jq 'length')
if [ "${COUNT:-0}" -gt 0 ]; then
  # first three offenders, joined in jq (head would trip pipefail via SIGPIPE)
  EXAMPLES=$(echo "$STUCK" | jq -r '[.[:3][] | "\(.name)=\(.age_min|floor)m"] | join(",")')
  curl -fsS "$HEARTBEAT_URL" --data "unprocessed=$COUNT,examples=$EXAMPLES"
  exit 2
fi
echo "OK (no stuck unprocessed alerts)"
Same thing in Enterno.io
Wrap the script in an Enterno heartbeat — a meta-monitor for Alertmanager that catches "alerts received but never delivered" before on-call notices the silence.
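The wiring is one cron line. A sketch under the assumption that Enterno issues a per-check ping URL — the URL below is a placeholder, not a real endpoint:

# /etc/cron.d/am-stuck — the ping target comes in via the environment
*/5 * * * * root HEARTBEAT_URL=https://<your-enterno-ping-url> /opt/am-stuck.sh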
Related recipes
Prometheus + Alertmanager only ship alerts to email or PagerDuty. The team lives in Telegram and you need a bridge without spinning up another service.
Prometheus itself is alive, but one of its targets reports up==0 — data stops flowing, graphs go blank, and the alert rules built on that target's metrics never fire (no data = no alert). A hedged check in the same style is sketched after this list.
Filebeat / Logstash silently died on one edge node. Elasticsearch ingest dropped 40%, but nobody watches the dashboards. Sentry without logs leaves you blind.
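For the up==0 teaser above, a minimal sketch in the same curl+jq register — PROM_URL and the exit codes are assumptions, not part of any recipe here:

# list scrape targets that Prometheus currently sees as down
PROM=${PROM_URL:-http://localhost:9090}
DOWN=$(curl -fsS "$PROM/api/v1/query" --data-urlencode 'query=up == 0' \
  | jq -r '.data.result[].metric | "\(.job)/\(.instance)"')
if [ -n "$DOWN" ]; then
  echo "targets down: $DOWN"
  exit 2
fi
echo "OK (all targets up)"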