Apache — alert on 5xx ratio over the last minute
Apache starts returning 502/503 from one backend but not all. Want an endpoint with the 5xx ratio over the last 60 s.
Recipe
bash
#!/usr/bin/env bash
# Computes 5xx ratio of last 60s of access log. Wrap as HTTP endpoint.
LOG="${LOG:-/var/log/apache2/access.log}"
THRESHOLD_PCT="${RATIO_THRESHOLD:-2}"
# Last 60 seconds of log entries
LINES=$(awk -v cutoff="$(date -d '60 seconds ago' '+%d/%b/%Y:%H:%M:%S')" '
match($4, /\[([^]]+)\]/, m) { if (m[1] >= cutoff) print $9 }
' "$LOG")
TOTAL=$(echo "$LINES" | wc -l)
[ "$TOTAL" -lt 5 ] && { echo "ok 0"; exit 0; } # too few requests
ERR=$(echo "$LINES" | awk '/^5/ {n++} END{print n+0}')
PCT=$(( ERR * 100 / TOTAL ))
[ "$PCT" -ge "$THRESHOLD_PCT" ] && echo "high $PCT% ($ERR/$TOTAL)" || echo "ok $PCT%"
Same thing in Enterno.io
Endpoint + Enterno HTTP monitor with "starts with ok" rule catches 5xx surges. Pair with Security Scanner to see whether CSP/HSTS-enabled endpoints are the affected ones.
Related recipes
Ensure your site returns 2xx every minute, alert to Slack/Telegram on failure.
Minimal script that checks an SSL certificate and alerts 14 days before expiry.
Detect the moment a replica falls behind the primary by more than 10 seconds.