Skip to content

Datadog — alert when a host stops reporting

The Datadog agent dies (OOM, mismatched apt update, cert expiry on dd-staging.com) — host disappears from the dashboard after 10 min (default mute window), but nobody alerts that monitoring went blind.

Recipe

bash
#!/usr/bin/env bash
# /etc/cron.d/dd-agent
# */10 * * * * root /opt/dd-agent.sh

DD_API_KEY=${DD_API_KEY}
DD_APP_KEY=${DD_APP_KEY}
EXPECTED=${EXPECTED:-host1,host2,host3}    # comma-separated hostname list

LIVE=$(curl -fsS -H "DD-API-KEY: $DD_API_KEY" -H "DD-APPLICATION-KEY: $DD_APP_KEY" \
  "https://api.datadoghq.com/api/v1/hosts?from=$(date -d '5 minutes ago' +%s)" \
  | jq -r '.host_list[].name' | sort -u)

MISSING=""
IFS=',' read -ra EXP <<< "$EXPECTED"
for H in "${EXP[@]}"; do
  if ! echo "$LIVE" | grep -qx "$H"; then
    MISSING="$MISSING$H,"
  fi
done

if [ -n "$MISSING" ]; then
  curl -fsS "$HEARTBEAT_URL" --data-urlencode "missing=$MISSING"
  exit 2
fi
echo "OK (all hosts reporting)"

Same thing in Enterno.io

Wrap in an Enterno heartbeat — an independent channel that catches "monitoring went blind" without relying on the same DD stack.

Set up API monitor → ← All recipes

Related recipes