Amazon ECR — alert on a pull-failure climb
ECR image pulls start failing consistently (expired IRSA credentials, a changed network ACL, a repository policy mismatch), so Kubernetes pods cannot start and sit in ImagePullBackOff. The kubelet records the event, but an event alone pages nobody.
Recipe
#!/usr/bin/env bash
# /etc/cron.d/ecr-pull
# */5 * * * * root /opt/ecr-pull.sh
set -euo pipefail
CONTEXT=${KUBE_CONTEXT:-prod}
HEARTBEAT_URL=${HEARTBEAT_URL:?set HEARTBEAT_URL to your heartbeat check URL}
# Count pods cluster-wide with a container stuck in ImagePullBackOff/ErrImagePull
COUNT=$(kubectl --context "$CONTEXT" get pods -A -o json \
  | jq '[.items[] |
      select(.status.containerStatuses // []
        | map(.state.waiting.reason // "")
        | any(. == "ImagePullBackOff" or . == "ErrImagePull"))]
    | length')
if [ "${COUNT:-0}" -gt 0 ]; then
  # List a few namespace/pod examples. No phase filter: a multi-container
  # pod can be Running overall while one container is stuck pulling.
  EXAMPLES=$(kubectl --context "$CONTEXT" get pods -A --no-headers \
    | awk '/ImagePullBackOff|ErrImagePull/ {printf "%s/%s,", $1, $2}' \
    | sed 's/,$//' | head -c 250)
  # One --data-urlencode per parameter, otherwise both land in one value
  curl -fsS "$HEARTBEAT_URL" \
    --data-urlencode "imgpull_fail=$COUNT" \
    --data-urlencode "examples=$EXAMPLES"
  exit 2
fi
echo "OK (no ImagePullBackOff)"
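To sanity-check the jq filter without touching a live cluster, you can feed it a canned pod list. The pod statuses below are fabricated for illustration: one pod waiting on ImagePullBackOff, one running normally, one with no container statuses yet.

```shell
#!/usr/bin/env bash
# Fabricated pod list standing in for `kubectl get pods -A -o json` output.
SAMPLE='{"items":[
  {"status":{"containerStatuses":[{"state":{"waiting":{"reason":"ImagePullBackOff"}}}]}},
  {"status":{"containerStatuses":[{"state":{"running":{}}}]}},
  {"status":{}}
]}'
# Same filter the cron script runs: select pods where any container is
# waiting on ImagePullBackOff or ErrImagePull, then count them.
COUNT=$(echo "$SAMPLE" | jq '[.items[] |
  select(.status.containerStatuses // []
    | map(.state.waiting.reason // "")
    | any(. == "ImagePullBackOff" or . == "ErrImagePull"))]
  | length')
echo "$COUNT"   # 1: only the first pod matches
```

The `// []` and `// ""` fallbacks matter: a freshly scheduled pod may have no `containerStatuses` at all, and a running container has no `waiting` state, so without the fallbacks jq would try to index into null.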
Same thing in Enterno.io
Wrap the script in an Enterno heartbeat: a persistent pull failure for a given image usually means a protected repository policy or an expired credential. Catch and fix it before 30% of your deploys break.
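A sketch of the heartbeat call itself: building the curl argument list in a function makes it easy to inspect before anything goes over the wire. The URL and parameter names below are illustrative assumptions, not a documented Enterno API.

```shell
#!/usr/bin/env bash
# Assemble curl arguments for the heartbeat ping. URL and parameter
# names are illustrative assumptions, not a documented Enterno API.
build_heartbeat_args() {
  local url=$1 count=$2 examples=$3
  # --max-time and --retry keep a flaky network from hanging the cron job
  printf '%s\n' -fsS --max-time 10 --retry 3 "$url" \
    --data-urlencode "imgpull_fail=$count" \
    --data-urlencode "examples=$examples"
}
ARGS=$(build_heartbeat_args "https://example.invalid/hb" 3 "prod/api-7d9,prod/worker-x2")
echo "$ARGS"
# To actually send:  curl $ARGS
# (word-splitting is safe here because no argument contains spaces)
```

Keeping each value in its own `--data-urlencode` flag ensures commas and slashes in the pod list survive URL encoding instead of being mistaken for parameter separators.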
Related recipes
A release bumped the bundle size and p99 cold-start went from 800 ms to 3 s. The metric is in CloudWatch, but nobody's watching; you want a heartbeat-style alert.
A CloudFront distribution started serving 5xx responses 4% of the time, so far-region clients see broken pages. The CloudWatch graph exists, but the dashboard goes unwatched.
DynamoDB starts throttling (a hot key or low write capacity): your app gets ProvisionedThroughputExceededException, but AWS alarms only fire on a 5-minute aggregate.