SQS — alert when the dead-letter queue grows
The main SQS queue processes fine, but the DLQ grows silently: messages that fail three receive attempts land there, and nobody looks at the DLQ until it's a thousand messages deep.
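Those three attempts come from the queue's redrive policy: once a message has been received `maxReceiveCount` times without being deleted, SQS moves it to the DLQ. A minimal sketch of wiring a main queue to a DLQ (the queue URL and ARN below are placeholders for your own queues):

```shell
# Attach a DLQ to the main queue: after 3 failed receives, SQS moves
# the message to the dead-letter queue. The URL and ARN are placeholders.
aws sqs set-queue-attributes \
  --queue-url "https://sqs.us-east-1.amazonaws.com/123456789012/main-queue" \
  --attributes '{
    "RedrivePolicy": "{\"deadLetterTargetArn\":\"arn:aws:sqs:us-east-1:123456789012:main-queue-dlq\",\"maxReceiveCount\":\"3\"}"
  }'
```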
Recipe
#!/usr/bin/env bash
# /etc/cron.d/sqs-dlq
# */5 * * * * root /opt/sqs-dlq.sh
set -euo pipefail

QUEUE=${DLQ_URL:?set DLQ_URL to the dead-letter queue URL}
REGION=${AWS_REGION:-us-east-1}
THRESH=${THRESH:-10}   # alert above N messages

DEPTH=$(aws sqs get-queue-attributes \
  --region "$REGION" \
  --queue-url "$QUEUE" \
  --attribute-names ApproximateNumberOfMessages \
  --output text --query 'Attributes.ApproximateNumberOfMessages')

if [ "${DEPTH:-0}" -gt "$THRESH" ]; then
  # Skip the ping: the missed heartbeat is what raises the alert.
  echo "DLQ depth $DEPTH exceeds threshold $THRESH" >&2
  exit 2
fi

# Ping the heartbeat on every healthy run, attaching the depth
# so the daily drain pattern stays visible.
curl -fsS "$HEARTBEAT_URL" --data "dlq=$DEPTH,threshold=$THRESH"
echo "OK (DLQ=$DEPTH)"
Same thing in Enterno.io
Wire to an Enterno heartbeat — see the daily pattern (DLQ drains at night when someone manually replays) or a sudden jump at release time.
Related recipes
A release bumped the bundle size and p99 cold-start went from 800ms to 3s. The metric is in CloudWatch, but nobody’s watching. Want a heartbeat-style alert.
A CloudFront distribution started serving 5xx responses 4% of the time, so far-region clients see broken pages. The CloudWatch graph exists; the dashboard goes unwatched.
DynamoDB starts throttling (hot key or low write capacity) — your app gets ProvisionedThroughputExceededException, but AWS alarms only fire on a 5-minute aggregate.