Cron Job Monitoring: Dead Man's Switch

Anatoly Oshmanovsky

Published: 15.06.2026 · ~6 min read

Short answer. A cron job can die silently (the server rebooted, the script crashed, the disk filled up), and you only find out once a business process is already broken. A dead man's switch (heartbeat monitoring) flips the logic: the job itself must send a ping after it finishes successfully. If no ping arrives within the expected window, the system raises an alert. You stop monitoring whether a site is reachable and start monitoring whether the background work actually ran.

Why ordinary monitoring can't see cron jobs

Classic uptime monitoring probes an HTTP endpoint from the outside: the request goes out, a response comes back, status 200 — all good. But cron jobs don't listen on a port. A database backup, a queue cleanup, report delivery, log rotation — none of them have a URL you can poll. They either ran or they didn't, and the only one who knows is the server itself.

The trap is that the absence of an error is not proof of success. If the crontab never fired at all — the cron daemon stopped, the file lost its execute bit, an environment variable vanished after a migration — there's no error email either. Silence looks exactly like normal.

The most dangerous failure isn't the one screaming in your logs. It's the one that stays quiet. A backup that hasn't run in three weeks gets discovered the moment the database dies and there's nothing left to restore from.

How a dead man's switch works

The idea is inversion. Instead of "poll me and check I'm alive," the job says "I'll check in myself once I'm done." The mechanics are simple:

1. You create a heartbeat monitor and get a unique URL with a token.
2. At the end of the script, and only after a successful run, an HTTP ping is sent to that URL.
3. The monitor knows the expected period (say, every 5 minutes) and a grace period (slack for late arrivals).
4. If no ping lands within the "period + grace" window, an incident opens and an alert fires.
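
On the monitor side, the last step boils down to one comparison. A minimal sketch, assuming all values are in seconds and timestamps are Unix epoch times; the function name heartbeat_status is hypothetical and this is not the actual enterno.io implementation:

```shell
# Hypothetical monitor-side check (an illustration, not the service's code).
# Arguments, all in seconds: last ping time, period, grace, current time.
heartbeat_status() {
  hb_last_ping=$1
  hb_period=$2
  hb_grace=$3
  hb_now=$4
  hb_deadline=$(( hb_last_ping + hb_period + hb_grace ))
  if [ "$hb_now" -gt "$hb_deadline" ]; then
    echo "down"   # the "period + grace" window elapsed with no ping
  else
    echo "up"     # still inside the window
  fi
}

# Example: 5-minute period, 2-minute grace, last ping at t=1000,
# so the deadline is t=1420.
heartbeat_status 1000 300 120 1400   # prints "up"
heartbeat_status 1000 300 120 1421   # prints "down"
```

The deadline is measured from the last ping that actually arrived, so one late run does not shift the schedule for the runs after it.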

The key nuance: the ping goes last and is tied to the exit code. If the job fails halfway, the ping never goes out — and that's correct, the monitor should trip.

In practice: adding a heartbeat to crontab

The most reliable pattern is to run the work, gate on success with a logical &&, and only then ping. That way the ping physically cannot be sent if the previous command returned a non-zero exit code:

*/5 * * * * /path/job.sh && curl -fsS "https://enterno.io/api/heartbeat?token=XXX"

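The curl flags matter: -f makes curl exit with a non-zero status on an HTTP error (4xx or 5xx) instead of printing the error page, -s hides the progress meter, and -S still prints the error message, so a failed ping shows up in the email cron sends you.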
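
To see the && gate in action without touching the network, here is a small sketch. The names run_then_ping and send_ping are hypothetical, and send_ping only echoes; in a real job it would be the curl call shown above.

```shell
# send_ping is a stand-in for the real ping (the curl call against your
# heartbeat URL). It only echoes so that the sketch runs offline.
send_ping() {
  echo "ping sent"
}

# Run the job given as arguments; send the ping only if it exits with 0.
run_then_ping() {
  "$@" && send_ping
}

run_then_ping true                                  # prints "ping sent"
run_then_ping false || echo "job failed, no ping"   # prints "job failed, no ping"
```

Because && short-circuits, a failing job also makes run_then_ping return a non-zero status, which cron can report by email.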

If you need to distinguish start from finish (for long-running jobs) and report failures explicitly, wrap it in a shell script:

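#!/usr/bin/env bash
set -euo pipefail
# Assumption: /start and /fail are path suffixes that go before the query
# string; confirm the exact URL shape in your monitor's settings.
BASE="https://enterno.io/api/heartbeat"
TOKEN="XXX"

# "started" signal (optional); a failed ping must not block the job
curl -fsS -m 10 "${BASE}/start?token=${TOKEN}" || true

if /path/to/backup.sh; then
  curl -fsS -m 10 "${BASE}?token=${TOKEN}"               # success
else
  curl -fsS -m 10 "${BASE}/fail?token=${TOKEN}" || true  # explicit failure
  exit 1
fi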

Choosing the period and grace

The period is how often the job is required to check in. The grace is how late it's allowed to be before it's declared dead. Too tight a grace produces false alarms (the job occasionally runs 10 seconds longer); too loose, and real failures take ages to surface.

Job type	Period	Suggested grace
Queue cleanup, real-time	5 min	2-3 min
Data sync	1 hour	15 min
Nightly DB backup	1 day	2-4 hours
Weekly report	1 week	1 day

Rule of thumb: grace ≈ the job's normal runtime plus headroom for peak load. If a backup usually takes 40 minutes but balloons to 90 at month-end, set the grace to at least 2 hours.
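
The arithmetic behind that rule can be sketched as a tiny helper. The function name grace_minutes and the headroom percentage are assumptions for illustration; pick a percentage that matches how much your own runtimes vary.

```shell
# Hypothetical helper: worst-case runtime in minutes plus a headroom
# percentage, rounded up to a whole minute.
grace_minutes() {
  gm_worst_case=$1
  gm_headroom_pct=$2
  echo $(( (gm_worst_case * (100 + gm_headroom_pct) + 99) / 100 ))
}

# Month-end backup: 90 minutes worst case, about a third extra.
grace_minutes 90 33   # prints 120
```

Feeding in the worst observed runtime rather than the typical one is what keeps the month-end run from triggering a false alarm.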

Alerts: where the silence lands

A heartbeat is useless if the missed-ping notification goes nowhere. In enterno.io, alerts fan out across several channels, so one broken channel doesn't bury the alarm:

Telegram — instant, with a button straight to the dashboard.
Slack — into the on-call team channel.
Webhook — to wire into your own incident management.
Email — as a fallback.

For how to build sane escalation without drowning in noise, see alerting best practices.

Common mistakes

Pinging before the work runs. If curl sits at the front of the line, the monitor stays green even when the job itself crashes. Always chain with && and put the ping at the end of the script.
Token in a public repo. The URL with the token is a secret. Keep it in an environment variable or vault, never commit it to git.
One monitor for a pile of different jobs. Every cron line deserves its own heartbeat; otherwise you can't tell which one died.
Grace cut to the bone. Network jitter and peak load are normal. Leave headroom.
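
For the token mistake in particular, one way to keep the URL out of the repository is to read it from the environment and refuse to run without it. A minimal sketch; the variable name HEARTBEAT_URL and the function name heartbeat_url_or_fail are hypothetical:

```shell
# HEARTBEAT_URL is a hypothetical variable name. Set it outside the
# repository, for example in an env file that only the job's user can read.
heartbeat_url_or_fail() {
  if [ -z "${HEARTBEAT_URL:-}" ]; then
    echo "HEARTBEAT_URL is not set" >&2
    return 1
  fi
  printf '%s\n' "$HEARTBEAT_URL"
}

# Usage inside a job script (left commented out so the sketch stays offline):
# url=$(heartbeat_url_or_fail) || exit 1
# /path/job.sh && curl -fsS "$url"
```

Failing loudly when the variable is missing matters here: a silently skipped ping would still trip the monitor, but the error message tells you why.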

FAQ

How is a dead man's switch different from regular uptime monitoring?

Uptime monitoring polls your server from the outside. A dead man's switch waits for the job to check in from the inside. The first fits websites and APIs; the second fits background jobs that have no public endpoint. See the full comparison in the cron job monitoring guide.

How many heartbeat monitors does the free plan include?

The enterno.io free plan gives you 10 monitors, and any of them can be configured as a heartbeat. That's enough for the typical set of background jobs in a small project.

What if the job runs on an irregular schedule?

For jobs with a floating schedule, set the period to the longest expected interval and add a generous grace. If a job might legitimately skip a given day by design, a heartbeat isn't the right tool; log the result instead.

Can I ping from any language, not just curl?

Yes. The heartbeat endpoint is a plain HTTP request. requests in Python, fetch in Node, HttpClient in .NET — anything that can fire a GET with your token works.

How do I avoid false alarms during deploys?

Pause the monitor during planned maintenance. That's more honest than widening the grace "just in case" and then sleeping through a real failure.

Start monitoring cron right now

Spin up your first dead man's switch in a couple of minutes: open the Heartbeat / Cron monitor, grab a token, and drop one curl line into your crontab. Then set the period, grace, and alert channels — and silence in your background jobs stops being a threat.

If you're building monitoring from scratch, start with the website monitoring guide and the uptime monitoring guide — heartbeats complement external availability checks nicely.
