SMTP: alert on a Postfix bounce-rate climb
Postfix starts bouncing a large share of outbound mail (for example, because the sending domain just lost reputation), but you only find out from a support ticket: 'no email arrived'.
Recipe
#!/usr/bin/env bash
# /etc/cron.d/smtp-bounce-rate
# */5 * * * * root /opt/smtp-bounce.sh
LOG=${LOG:-/var/log/mail.log}
WINDOW=300 # last 5 min
BOUNCE_PCT=${BOUNCE_PCT:-3} # alert above 3 % bounces
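# Build the syslog minute prefixes ("Jan  5 12:34") that cover the window.
# A plain string comparison on the timestamp breaks across month and midnight
# boundaries ("Feb" sorts before "Jan"), so match the exact minutes instead.
# Assumes traditional syslog timestamps, not ISO 8601.
MINUTES=""
for i in $(seq 0 $((WINDOW / 60))); do
  MINUTES+="$(LC_ALL=C date -d "-${i} minutes" '+%b %_d %H:%M');"
done
read -r SENT BOUNCED < <(awk -v minutes="$MINUTES" '
  BEGIN {
    n = split(minutes, m, ";")
    for (i = 1; i <= n; i++) if (m[i] != "") want[m[i]] = 1
  }
  (substr($0, 1, 12) in want) {
    if (/status=sent/) s++
    # deferred mail is counted together with hard bounces
    if (/status=bounced|status=deferred/) b++
  }
  END { print s+0, b+0 }
' "$LOG")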
TOTAL=$((SENT + BOUNCED))
[ "$TOTAL" -lt 50 ] && exit 0 # not enough volume to judge
PCT=$((BOUNCED * 100 / TOTAL))
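if [ "$PCT" -gt "$BOUNCE_PCT" ]; then
  echo "smtp-bounce: $BOUNCED / $TOTAL = ${PCT}% (threshold ${BOUNCE_PCT}%)"
  # Ping the webhook only when HEARTBEAT_URL is set; curl fails on an empty URL.
  # --data-urlencode keeps the literal "%" from being read as an escape.
  [ -n "${HEARTBEAT_URL:-}" ] && curl -fsS "$HEARTBEAT_URL" --data-urlencode "bounce=${PCT}%"
  exit 2
fi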
echo "OK (${PCT}% bounces)"
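To sanity-check the threshold arithmetic without waiting for real bounces, the counting step can be run against a synthetic log. The sketch below is a test harness under stated assumptions, not part of the recipe: `count_statuses` and `bounce_pct` are hypothetical helper names, the log lines are simplified made-up samples, and timestamps are ignored so every line in the file counts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count sent vs. bounced/deferred lines in a log file.
# Unlike the recipe, it does not filter by time window.
count_statuses() {
  awk '
    /status=sent/ { s++ }
    /status=bounced|status=deferred/ { b++ }
    END { print s+0, b+0 }
  ' "$1"
}

# Integer bounce percentage, same formula as the recipe: bounced * 100 / total.
bounce_pct() {
  local sent=$1 bounced=$2
  local total=$((sent + bounced))
  [ "$total" -eq 0 ] && { echo 0; return; }
  echo $((bounced * 100 / total))
}

# Synthetic log: 95 delivered and 5 bounced (made-up queue IDs and hosts).
sample=$(mktemp)
for i in $(seq 1 95); do
  echo "Jan  5 12:00:00 mx postfix/smtp[100]: Q$i: to=<u$i@example.com>, status=sent (250 ok)"
done > "$sample"
for i in $(seq 1 5); do
  echo "Jan  5 12:00:01 mx postfix/smtp[100]: B$i: to=<b$i@example.com>, status=bounced (550 rejected)"
done >> "$sample"

counts=$(count_statuses "$sample")
sent=${counts% *}
bounced=${counts#* }
echo "sent=$sent bounced=$bounced pct=$(bounce_pct "$sent" "$bounced")%"
rm -f "$sample"
```

With the default BOUNCE_PCT of 3, this sample (5 bounces out of 100) would sit above the threshold, and the total of 100 clears the 50-message minimum. Note that the division is integer division, so 3.9% is reported as 3% and does not trigger the alert.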
Same thing in Enterno.io
Plug DMARC RUA reports into Enterno DMARC to see not just local bounces but also provider reports (Gmail, Yahoo) with the exact failure reason (DKIM, SPF, or policy).
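For a rough idea of what an RUA aggregate report carries, the sketch below tallies message counts by DKIM and SPF result from a local report file. It is only an illustration and says nothing about how Enterno processes reports: `rua_tally` is a hypothetical helper, the sample report is made up and trimmed to the relevant fields, and the awk parsing assumes a pretty-printed file with one XML element per line. A real XML parser is the safer choice for actual reports.

```shell
#!/usr/bin/env bash
# Tally message counts by policy_evaluated DKIM/SPF result.
# Assumption: one XML element per line, as in the sample below.
rua_tally() {
  awk '
    /<row>/                     { count = 0; dkim = ""; spf = "" }
    /<count>/                   { gsub(/[^0-9]/, ""); count = $0 }
    /<dkim>(pass|fail)<\/dkim>/ { dkim = ($0 ~ /fail/) ? "fail" : "pass" }
    /<spf>(pass|fail)<\/spf>/   { spf  = ($0 ~ /fail/) ? "fail" : "pass" }
    /<\/row>/                   { total[dkim "/" spf] += count }
    END { for (k in total) print "dkim/spf=" k, total[k] }
  ' "$1" | sort
}

# Made-up sample report using documentation IP ranges.
report=$(mktemp)
cat > "$report" <<'EOF'
<feedback>
  <record>
    <row>
      <source_ip>192.0.2.10</source_ip>
      <count>40</count>
      <policy_evaluated>
        <disposition>none</disposition>
        <dkim>pass</dkim>
        <spf>pass</spf>
      </policy_evaluated>
    </row>
  </record>
  <record>
    <row>
      <source_ip>198.51.100.7</source_ip>
      <count>3</count>
      <policy_evaluated>
        <disposition>quarantine</disposition>
        <dkim>fail</dkim>
        <spf>fail</spf>
      </policy_evaluated>
    </row>
  </record>
</feedback>
EOF
rua_tally "$report"
rm -f "$report"
```

Each output line pairs a DKIM/SPF result combination with the number of messages the provider saw for it, which is the detail a local Postfix log cannot provide.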
Related recipes
A junior marketer flips DMARC from p=quarantine to p=none "to fix bounces", and Gmail starts marking everything as spam an hour later.
Apache starts returning 502/503 from one backend but not all. Want an endpoint with the 5xx ratio over the last 60 s.
Sudden ban spike in fail2ban: a credential stuffing or enumeration campaign. I want to know within the first 5 minutes, not the morning after.