nginx — alert on a proxy_cache hit-ratio drop

When the nginx proxy_cache hit ratio drops, requests fall through to the backend and it starts burning CPU. The usual causes are a new location that is missing proxy_cache_valid, a cache that was wiped, or a TTL that is too short.
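
These causes tend to leave different traces in $upstream_cache_status, so a per-status count can hint at which one applies. The sketch below assumes the status is the last field of every access log line; the function name cache_breakdown is illustrative only. As a rough guide, mostly MISS suggests a wiped cache or responses that are never stored (for example a location without proxy_cache_valid), mostly EXPIRED suggests a TTL that is too short, and "-" means the request never consulted the cache.

```bash
# Sketch: count requests per cache status.
# Assumption: $upstream_cache_status is the LAST field of each log line.
# Reads the files given as arguments, or stdin when none are given.
cache_breakdown() {
  awk '
    { count[$NF]++ }
    END { for (s in count) printf "%s %d\n", s, count[s] }
  ' "$@" | LC_ALL=C sort
}
```

For example, `tail -n 5000 /var/log/nginx/access.log | cache_breakdown` prints one "STATUS count" line per status seen in the most recent requests.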

Recipe

bash
#!/usr/bin/env bash
# /etc/cron.d/nginx-cache
# */1 * * * * root /opt/nginx-cache.sh

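LOG=${LOG:-/var/log/nginx/access.log}
WINDOW=${WINDOW:-60}                  # look-back window in seconds
MIN_HITS_PCT=${MIN_HITS_PCT:-50}      # alert when HIT < 50 % of total
: "${HEARTBEAT_URL:?HEARTBEAT_URL must be set}"
# Sortable cutoff (YYYYmmddHHMMSS). Comparing raw dd/Mon/yyyy strings
# breaks across midnight and month boundaries.
SINCE=$(date -d "-${WINDOW} seconds" '+%Y%m%d%H%M%S')

# Assumes $upstream_cache_status is the LAST field of the access log_format
# and $time_local is field 4, as in the default "combined" layout.
read -r TOTAL HIT < <(awk -v since="$SINCE" '
  BEGIN {
    n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
    for (i = 1; i <= n; i++) mon[m[i]] = sprintf("%02d", i)
  }
  {
    # $4 looks like "[10/Oct/2024:13:55:36": turn it into a sortable key
    split(substr($4, 2), p, "[/:]")
    key = p[3] mon[p[2]] p[1] p[4] p[5] p[6]
    if (key >= since) {
      t++
      if ($NF == "HIT") h++
    }
  }
  END { print t+0, h+0 }
' "$LOG")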

[ "$TOTAL" -lt 50 ] && exit 0          # too few requests to judge

PCT=$((HIT * 100 / TOTAL))
if [ "$PCT" -lt "$MIN_HITS_PCT" ]; then
  curl -fsS "$HEARTBEAT_URL" --data "hit_pct=$PCT,window=${WINDOW}s,total=$TOTAL"
  exit 2
fi
echo "OK (${PCT}% hit / ${WINDOW}s)"
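
The script trusts that the last field of each log line is the cache status. If the log_format does not actually end with $upstream_cache_status, HIT stays at 0 and the alert fires on every run. A minimal guard, sketched under that same assumption (the name cache_field_ok is hypothetical), checks a sample of lines before the ratio is computed:

```bash
# Sketch: succeed only if every line on stdin ends in a value that
# $upstream_cache_status can produce ("-" means the cache was not consulted).
# Empty input counts as a failure, since nothing could be verified.
cache_field_ok() {
  awk '
    $NF !~ /^(HIT|MISS|EXPIRED|STALE|UPDATING|REVALIDATED|BYPASS|-)$/ { bad++ }
    END { exit ((bad > 0 || NR == 0) ? 1 : 0) }
  '
}
```

One way to use it is to run `tail -n 200 "$LOG" | cache_field_ok` near the top of the script and exit with a distinct code when it fails, so a misconfigured log format is not mistaken for a cache problem.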

Same thing in Enterno.io

Wrap the script in an Enterno heartbeat to catch a hit-ratio drop within 60 s of a new nginx config deploy, before the backend is overloaded.

