systemd — alert on any failed unit on the host
One of several systemd units (cron, redis, postfix) crashed at 3 AM and never came back. Learned about it from complaints. Want an endpoint showing high whenever any unit is failed.
Recipe
#!/usr/bin/env bash
# Wraps `systemctl --failed` for an HTTP-monitor probe.
FAILED=$(systemctl --failed --no-legend --plain --quiet 2>/dev/null | wc -l)
[ "$FAILED" -gt 0 ] && {
NAMES=$(systemctl --failed --no-legend --plain | awk '{print $1}' | head -5 | tr '\n' ' ')
echo "high $FAILED failed: $NAMES"
exit 1
}
echo "ok"
Same thing in Enterno.io
Expose the endpoint via nginx or a mini-server, point an Enterno HTTP monitor at it with "ok" keyword — pager-grade monitoring for every systemd unit in one line. Cheaper than any APM.
Related recipes
Ensure your site returns 2xx every minute, alert to Slack/Telegram on failure.
Minimal script that checks an SSL certificate and alerts 14 days before expiry.
Detect the moment a replica falls behind the primary by more than 10 seconds.