S3 — alert on a 5xx error-rate climb for a bucket
The S3 endpoint starts 5xx-ing and your app sees random failures on upload. AWS Health still shows "healthy", and the CloudWatch alarm sits on a 5-minute aggregate, so the reaction comes late.
Recipe
#!/usr/bin/env bash
# /etc/cron.d/s3-5xx
# * * * * * root /opt/s3-5xx.sh
#
# Requires awscli, GNU date, and S3 request metrics enabled on the bucket
# (otherwise the 5xxErrors metric has no data and the check always passes):
#   aws s3api put-bucket-metrics-configuration --bucket "$S3_BUCKET" \
#     --id EntireBucket --metrics-configuration '{"Id":"EntireBucket"}'
set -euo pipefail

BUCKET=${S3_BUCKET:?S3_BUCKET not set}
REGION=${AWS_REGION:-us-east-1}
THRESH=${THRESH:-1}   # alert when more than this many 5xx events in 60 s

COUNT=$(aws cloudwatch get-metric-statistics \
  --region "$REGION" \
  --namespace AWS/S3 \
  --metric-name 5xxErrors \
  --dimensions Name=BucketName,Value="$BUCKET" Name=FilterId,Value=EntireBucket \
  --statistics Sum \
  --start-time "$(date -u -d '60 seconds ago' --iso-8601=seconds)" \
  --end-time "$(date -u --iso-8601=seconds)" \
  --period 60 \
  --output text --query 'Datapoints[0].Sum')

[ "$COUNT" = "None" ] && COUNT=0   # --output text prints "None" when there is no datapoint
COUNT=${COUNT:-0}                  # empty output from the CLI
COUNT=${COUNT%.*}                  # CloudWatch sums come back as floats

if [ "$COUNT" -gt "$THRESH" ]; then
  curl -fsS "${HEARTBEAT_URL:?HEARTBEAT_URL not set}" --data "s3_5xx=$COUNT,bucket=$BUCKET"
  exit 2
fi
echo "OK ($COUNT 5xx / 60s)"
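One quirk worth knowing: with `--output text`, `get-metric-statistics` prints the literal string `None` when the query matches no datapoint in the window, and can print nothing at all. The script normalizes both cases before the integer comparison. A standalone sketch of that normalization (the `normalize` helper is illustrative, not part of the recipe):

```shell
#!/usr/bin/env bash
# Simulate the three shapes get-metric-statistics can return with --output text:
# a float sum, the literal "None" (no datapoint), or an empty string.
normalize() {
  local count="$1"
  [ "$count" = "None" ] && count=0   # no datapoint in the window
  count=${count:-0}                  # empty output from the CLI
  count=${count%.*}                  # CloudWatch sums come back as floats
  echo "$count"
}
normalize "3.0"   # -> 3
normalize "None"  # -> 0
normalize ""      # -> 0
```

Without the `None` guard, `[ "$COUNT" -gt "$THRESH" ]` fails with "integer expression expected" on a quiet bucket, and the cron job exits non-zero every minute.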
Same thing in Enterno.io
Replace the CloudWatch alarm → SNS chain with an Enterno heartbeat on a 1-minute schedule: a 60-second reaction time and one unified dashboard alongside the rest of your services.
Related recipes
Backup cron silently failed; nobody noticed; the gap surfaces only at the next incident. Need an alert when the newest backup file is older than 30 hours.
A release bumped the bundle size and p99 cold-start went from 800ms to 3s. The metric is in CloudWatch, but nobody’s watching. Want a heartbeat-style alert.
A CloudFront distribution started serving 5xx 4% of the time — far-region clients see broken pages. CloudWatch graph exists; dashboard goes unwatched.