Skip to content

Airflow — alert when a DAG misses its SLA

An Airflow DAG finished past its SLA (but did not fail — late success). By default an SLA miss only triggers an email callback that is rarely configured. The pipeline shows a "red flag" an hour after the fact.

Recipe

bash
#!/usr/bin/env bash
# /etc/cron.d/af-sla
# */15 * * * * root /opt/af-sla.sh

AF_URL=${AIRFLOW_URL:-http://airflow:8080/api/v1}
USER=${AF_USER}
PWD=${AF_PWD}
DAG=${DAG_ID}
SLA_MIN=${SLA_MIN:-90}                # expected runtime in minutes

# Latest dag run
RUN=$(curl -fsS -u "$USER:$PWD" "$AF_URL/dags/$DAG/dagRuns?limit=1&order_by=-execution_date" \
  | jq -r '.dag_runs[0]')

START=$(echo "$RUN" | jq -r .start_date)
END=$(echo "$RUN"   | jq -r .end_date)
STATE=$(echo "$RUN" | jq -r .state)

[ "$END" = "null" ] && exit 0          # still running, ok

DUR_MIN=$(( ($(date -ud "$END" +%s) - $(date -ud "$START" +%s)) / 60 ))

if [ "$DUR_MIN" -gt "$SLA_MIN" ]; then
  curl -fsS "$HEARTBEAT_URL" --data "dag=$DAG,duration_min=$DUR_MIN,sla_min=$SLA_MIN,state=$STATE"
  exit 2
fi
echo "OK (duration=${DUR_MIN}m, state=$STATE)"

Same thing in Enterno.io

Wire to an Enterno heartbeat — watch the trend "DAG is degrading" (70 min yesterday, 95 min today) before the SLA breaks.

Set up API monitor → ← All recipes

Related recipes