S3 — alert on a 5xx error-rate climb for a bucket

When an S3 endpoint starts returning 5xx responses, your app sees intermittent upload failures. AWS Health may still show 'healthy', and a CloudWatch alarm built on a 5-minute aggregate reacts late.

Recipe

bash
#!/usr/bin/env bash
# /etc/cron.d/s3-5xx
# */1 * * * * root /opt/s3-5xx.sh

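BUCKET=${S3_BUCKET:?S3_BUCKET must be set}
: "${HEARTBEAT_URL:?HEARTBEAT_URL must be set}"
REGION=${AWS_REGION:-us-east-1}
THRESH=${THRESH:-1}                   # alert when the 5xx count over 60 s exceeds this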

COUNT=$(aws cloudwatch get-metric-statistics \
  --region "$REGION" \
  --namespace AWS/S3 \
  --metric-name 5xxErrors \
  --dimensions Name=BucketName,Value="$BUCKET" Name=FilterId,Value=EntireBucket \
  --statistics Sum \
  --start-time "$(date -u -d '60 seconds ago' --iso-8601=seconds)" \
  --end-time   "$(date -u --iso-8601=seconds)" \
  --period 60 \
  --output text --query 'Datapoints[0].Sum')

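# With no datapoint in the window the CLI prints "None" (or nothing); treat both as zero.
case "$COUNT" in ''|None) COUNT=0 ;; esac
COUNT=${COUNT%.*}                     # strip the decimal part, e.g. 3.0 -> 3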

if [ "$COUNT" -gt "$THRESH" ]; then
  curl -fsS "$HEARTBEAT_URL" --data "s3_5xx=$COUNT,bucket=$BUCKET"
  exit 2
fi
echo "OK ($COUNT 5xx / 60s)"
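
The script above reads `5xxErrors`, which is an S3 request metric. Request metrics are opt-in per bucket, and the `FilterId` dimension must match the ID of a metrics configuration on that bucket. The following is a minimal sketch of enabling one with the AWS CLI, assuming the ID `EntireBucket` used in the script. The function name `enable_request_metrics` and the `AWS_CLI` override are hypothetical, added only so the call can be dry-run.

```bash
#!/usr/bin/env bash
# Sketch: enable S3 request metrics for a whole bucket so that CloudWatch
# publishes 5xxErrors under the FilterId the monitoring script queries.
# AWS_CLI is a hypothetical override (not an AWS feature): set AWS_CLI=echo
# to print the arguments instead of calling AWS.
enable_request_metrics() {
  local bucket=$1
  local filter_id=${2:-EntireBucket}   # must match Value= in --dimensions
  "${AWS_CLI:-aws}" s3api put-bucket-metrics-configuration \
    --bucket "$bucket" \
    --id "$filter_id" \
    --metrics-configuration "Id=$filter_id"
}

# Usage: enable_request_metrics "$S3_BUCKET"
```

A configuration with no filter covers every object in the bucket. After enabling it, the first datapoints may take some time to appear, so an empty result right after setup does not necessarily mean zero errors.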

Same thing in Enterno.io

Replace the CloudWatch alarm → SNS chain with an Enterno heartbeat on a 1-minute schedule: you get a 60-second reaction time and one dashboard shared with the rest of your services.
