Airflow — alert when a DAG misses its SLA
An Airflow DAG finished past its SLA (but did not fail — late success). By default an SLA miss only triggers an email callback that is rarely configured. The pipeline shows a "red flag" an hour after the fact.
Recipe
#!/usr/bin/env bash
# /etc/cron.d/af-sla
# */15 * * * * root /opt/af-sla.sh
AF_URL=${AIRFLOW_URL:-http://airflow:8080/api/v1}
USER=${AF_USER}
PWD=${AF_PWD}
DAG=${DAG_ID}
SLA_MIN=${SLA_MIN:-90} # expected runtime in minutes
# Latest dag run
RUN=$(curl -fsS -u "$USER:$PWD" "$AF_URL/dags/$DAG/dagRuns?limit=1&order_by=-execution_date" \
| jq -r '.dag_runs[0]')
START=$(echo "$RUN" | jq -r .start_date)
END=$(echo "$RUN" | jq -r .end_date)
STATE=$(echo "$RUN" | jq -r .state)
[ "$END" = "null" ] && exit 0 # still running, ok
DUR_MIN=$(( ($(date -ud "$END" +%s) - $(date -ud "$START" +%s)) / 60 ))
if [ "$DUR_MIN" -gt "$SLA_MIN" ]; then
curl -fsS "$HEARTBEAT_URL" --data "dag=$DAG,duration_min=$DUR_MIN,sla_min=$SLA_MIN,state=$STATE"
exit 2
fi
echo "OK (duration=${DUR_MIN}m, state=$STATE)"
Same thing in Enterno.io
Wire to an Enterno heartbeat — watch the trend "DAG is degrading" (70 min yesterday, 95 min today) before the SLA breaks.
Related recipes
cron is alive, but the job (timer disabled, MAILTO=root spam, sh-syntax-error in crontab) did not run last night. The classic "we forgot last night gave empty reports".
Someone set `spec.suspend: true` on a CronJob (debug or rushed release) and forgot to revert. The daily task does not run, reports are not generated — you only learn when finance asks.
Ensure your site returns 2xx every minute, alert to Slack/Telegram on failure.