kubelet — alert when a node goes NotReady
A node goes NotReady (the kubelet stops reporting status to the apiserver, or the container runtime is sick), and the pods on it linger like zombies until the node.kubernetes.io/not-ready taint and its default 5-minute toleration evict them. Kubernetes records the event, but events do not go to Slack by default.
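To see what this looks like before wiring up an alert, you can inspect the Ready condition and the taints by hand; <node-name> below is a placeholder.
# List every node with the status of its Ready condition (True, False or Unknown)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
# A NotReady node picks up the node.kubernetes.io/not-ready (or /unreachable) taint,
# which is what eventually evicts the pods
kubectl describe node <node-name> | grep -i taints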
Recipe
#!/usr/bin/env bash
# /opt/kubelet-notready.sh
# Run every two minutes from /etc/cron.d/kubelet-notready (example below):
# */2 * * * * root /opt/kubelet-notready.sh
# HEARTBEAT_URL (and optionally KUBE_CONTEXT) come from the cron file's environment.
CONTEXT=${KUBE_CONTEXT:-prod}

# Count nodes whose Ready condition isn't True (covers both NotReady and Unknown)
NOTREADY=$(kubectl --context "$CONTEXT" get nodes -o json \
  | jq '[.items[] | select(.status.conditions[] | select(.type=="Ready") | .status != "True")] | length')

if [ "${NOTREADY:-0}" -gt 0 ]; then
  # Comma-separated names of the offending nodes; the regex keeps cordoned-but-Ready
  # nodes ("Ready,SchedulingDisabled") out of the list
  NAMES=$(kubectl --context "$CONTEXT" get nodes --no-headers \
    | awk '$2 !~ /^Ready/ {printf "%s,", $1}' | sed 's/,$//')
  curl -fsS "$HEARTBEAT_URL" --data "notready=$NOTREADY,nodes=$NAMES"
  exit 2
fi
echo "OK (all nodes Ready)"
Same thing in Enterno.io
Wrap the cron job in an Enterno heartbeat: the alert goes straight to Telegram, and the history ("node1 went NotReady 3 times this week") shows when you are looking at a hardware problem rather than a transient blip.
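A hedged sketch of the wiring, assuming the Enterno check works like a typical dead-man's-switch heartbeat that alerts when the expected ping stops arriving; reuse the cron file above, swap its schedule line for the one below, and add ENTERNO_PING_URL, a hypothetical placeholder for the check's ping URL.
# ENTERNO_PING_URL is a hypothetical placeholder, not a documented Enterno variable
ENTERNO_PING_URL=https://example.invalid/ping/REPLACE-ME
# Ping only on a clean run: a NotReady node makes the script exit 2, the ping stops,
# and the missed heartbeat raises the Telegram alert and feeds the check's history
*/2 * * * * root /opt/kubelet-notready.sh && curl -fsS "$ENTERNO_PING_URL" >/dev/null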
Related recipes
Readiness probes pass inside the pod, but no one sees that the LB refused to route traffic to the new deploy.
A CrashLoopBackOff in one namespace: kubectl shows a restart count of 47, but nobody sees it. You want an endpoint that goes red when the counter jumps.
Inside a K8s cluster, etcd re-elects its leader every 30 s: kube-apiserver lags and controller-manager can't keep up with reconciling. It is only visible in etcd metrics.