CloudFront — alert on rising 5xx error rate
A CloudFront distribution started returning 5xx on 4% of requests, and clients in far regions see broken pages. The CloudWatch graph exists, but nobody watches the dashboard.
Recipe
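#!/usr/bin/env bash
# Report the CloudFront 5xx error rate over the last 5 minutes.
# Requires the AWS CLI and GNU date (for -d '5 min ago').
DIST="${CF_DIST:?must be set}"
THRESHOLD_PCT="${THRESHOLD_PCT:-2}"
START=$(date -u -d '5 min ago' +%FT%TZ)
END=$(date -u +%FT%TZ)
# CloudFront metrics are available in CloudWatch only in us-east-1.
# Datapoints are returned unordered, so sort by Timestamp before taking the latest.
PCT=$(aws cloudwatch get-metric-statistics \
  --region us-east-1 \
  --namespace AWS/CloudFront --metric-name 5xxErrorRate \
  --dimensions Name=DistributionId,Value="$DIST" Name=Region,Value=Global \
  --start-time "$START" --end-time "$END" \
  --period 60 --statistics Average \
  --query 'sort_by(Datapoints, &Timestamp)[-1].Average' --output text 2>/dev/null)
# No datapoints (or a failed call) leaves an empty string or "None".
if [ -z "$PCT" ] || [ "$PCT" = "None" ]; then
  echo "no-data"
  exit 0
fi
# Round to a whole percent for the integer comparison.
PCT_INT=$(printf "%.0f" "$PCT")
if [ "$PCT_INT" -ge "$THRESHOLD_PCT" ]; then
  echo "high ${PCT_INT}%"
else
  echo "ok ${PCT_INT}%"
fi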
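The recipe rounds the rate to a whole percent before comparing, so a reading of 1.6% is reported as 2% and trips a threshold of 2. If that matters, the comparison could be done on the fractional value instead. The following is a minimal sketch, assuming awk is available; the function name classify_pct is hypothetical and not part of the original recipe.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a fractional percentage against a threshold
# without rounding first. Prints "high X%" or "ok X%" like the recipe does.
classify_pct() {
  # $1 = measured percentage (may be fractional), $2 = threshold percentage
  LC_ALL=C awk -v pct="$1" -v thr="$2" 'BEGIN {
    # Adding 0 forces a numeric comparison rather than a string comparison.
    if (pct + 0 >= thr + 0) printf "high %.2f%%\n", pct
    else                    printf "ok %.2f%%\n", pct
  }'
}

# Example calls with made-up readings (no AWS access needed):
classify_pct 1.6 2
classify_pct 4.25 2
```

In the script, the final comparison would then become classify_pct "$PCT" "$THRESHOLD_PCT". The "ok" keyword still appears only in the healthy case, so a keyword-based monitor keeps working.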
Same thing in Enterno.io
Expose the script's output at an HTTP endpoint and point an Enterno HTTP monitor at it, checking for the "ok" keyword every minute. Pair it with a health monitor on a public URL behind the CDN: correlating edge errors with origin-stack health pinpoints the root cause in minutes.
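How the endpoint is provided is left open here. One possible approach, sketched under the assumption that a web server already serves static files from some directory, is to have cron run the check every minute and write its one-line result to a file there. The function name publish_status and all paths below are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a check command and publish its output to a file
# that an existing web server exposes as the monitored endpoint.
publish_status() {
  # $1 = destination file; remaining arguments = the check command to run
  out_file="$1"; shift
  tmp=$(mktemp "${out_file}.XXXXXX") || return 1
  # If the check itself fails, publish "error" so the "ok" keyword is absent.
  if ! "$@" > "$tmp" 2>/dev/null; then
    echo "error" > "$tmp"
  fi
  chmod 644 "$tmp"
  # Renaming within the same directory avoids serving a half-written file.
  mv -f "$tmp" "$out_file"
}

# Example cron entry (script name, distribution ID, and web root are assumptions):
# * * * * * CF_DIST=E123EXAMPLE /usr/local/bin/publish-cf-status.sh
# where publish-cf-status.sh calls:
#   publish_status /var/www/status/cf-5xx.txt /usr/local/bin/cf-5xx-check.sh
```

With this layout the monitor alerts both when the rate is high and when the check stops producing output it can recognize, since neither "high", "no-data", nor "error" contains the keyword.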
Related recipes
The CDN cache_status header (cf-cache-status or x-cache) suddenly returns MISS on more than 30% of requests — origin load + bandwidth bills both spike.
A release bumped the bundle size and p99 cold-start went from 800ms to 3s. The metric is in CloudWatch, but nobody’s watching. Want a heartbeat-style alert.
Ensure your site returns 2xx every minute, alert to Slack/Telegram on failure.