Cron Job Monitoring with Dead Man's Switch

Anatoly Oshmanovsky

Published: 14.03.2026 · 4 min read · 104 views

The Problem of Silent Failures

Cron jobs run in the background: processing payments, sending emails, generating reports, cleaning up data, running backups. When a cron job silently crashes or hangs, you find out hours, days, or even weeks later — when customers start complaining.

Cron job monitoring does not ask "is the server running?" but "did the task execute at the expected time?"

Dead Man's Switch — How It Works

A Dead Man's Switch (DMS) is a monitoring pattern that inverts the usual logic. Instead of checking "is the service available?" it checks "has the service stopped reporting?"

How It Works

You create a monitor with an expected interval (e.g., "cron should report every hour")
After successful execution, the cron job sends an HTTP request (a ping, or heartbeat) to the monitoring URL
If no heartbeat is received within the expected time + grace period, an alert fires
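
The alert rule in the last step can be sketched from the monitor's side. This is only an illustration of the arithmetic, not how any particular monitoring service implements it; the function name is_overdue and its arguments are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: decide whether a heartbeat is overdue.
# Arguments (all in seconds): time of the last ping (Unix epoch),
# expected interval, grace period, and optionally the current time
# (defaults to now).
is_overdue() {
    local last_ping=$1 interval=$2 grace=$3
    local now=${4:-$(date +%s)}
    local deadline=$((last_ping + interval + grace))
    # Exit status 0 (true) means the deadline has passed and an alert should fire.
    [ "$now" -gt "$deadline" ]
}
```

For an hourly job with a 15-minute grace period, is_overdue "$LAST_PING" 3600 900 becomes true 75 minutes after the last ping.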

Implementation

# Crontab: hourly job with heartbeat
0 * * * * /path/to/job.sh && curl -fsS https://monitor.example.com/ping/abc123 > /dev/null

Key point: && means the heartbeat is sent only on successful completion (exit code 0). If the job fails, no heartbeat is sent, and the alert fires.

What to Monitor in Cron Jobs

Scheduled Execution

The task should run at the expected time. A DMS detects missed runs, for example when the cron daemon has stopped, the server has rebooted, or the crontab has been corrupted.

Successful Completion

A task may start but fail with an error. Check the exit code and send the heartbeat only on success.
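
When the job is more than a one-liner, the same "ping only on success" rule can live in a small wrapper. The sketch below is one possible shape, assuming bash and curl; the function names run_with_heartbeat and send_heartbeat are hypothetical, and the URL is the same placeholder used elsewhere in this article.

```shell
#!/bin/bash
# Placeholder URL in the article's style, not a real endpoint.
PING_URL="${PING_URL:-https://monitor.example.com/ping/abc123}"

# Send the heartbeat. -f fails on HTTP errors, -sS stays quiet but still
# prints errors, -m caps the total time, --retry retries transient failures.
send_heartbeat() {
    curl -fsS -m 10 --retry 3 "$PING_URL" > /dev/null
}

# Run the given command and ping only if it exits with status 0.
# The job's own exit status is returned either way, so the caller still sees it.
run_with_heartbeat() {
    "$@"
    local status=$?
    if [ "$status" -eq 0 ]; then
        send_heartbeat || echo "heartbeat could not be sent" >&2
    fi
    return "$status"
}
```

A script that sources these functions would call run_with_heartbeat /path/to/job.sh. Capping and retrying the request keeps a slow monitoring endpoint from stalling the job and a brief network error from causing a false alert.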

Execution Time

If a task normally takes 5 minutes but suddenly took 2 hours, that's a sign of trouble (growing database, deadlock, memory leak).

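START=$(date +%s)
/path/to/job.sh
STATUS=$?    # capture the job's exit code right away, before another command overwrites $?
END=$(date +%s)
DURATION=$((END - START))

# Send the heartbeat, with the measured duration, only if the job succeeded
if [ "$STATUS" -eq 0 ]; then
    curl -fsS "https://monitor.example.com/ping/abc123?duration=$DURATION"
fi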

Hanging

A task may hang (deadlock, infinite loop). Use timeout:

timeout 3600 /path/to/job.sh

If the task doesn't complete within an hour (3600 seconds), timeout terminates the process and exits with a non-zero code (124 with GNU coreutils), so a heartbeat chained with && is not sent.
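
To tell a timeout apart from an ordinary failure in the logs, the exit code can be inspected. The wrapper below is a minimal sketch assuming GNU coreutils timeout, which exits with 124 when the limit is reached; the function name run_with_limit is hypothetical, and the limit is given in seconds.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command under a time limit (in seconds)
# and report whether it finished, failed, or was stopped by timeout.
run_with_limit() {
    local limit=$1
    shift
    timeout "$limit" "$@"
    local status=$?
    case "$status" in
        0)   echo "ok" ;;
        124) echo "timed out after ${limit}s" ;;   # GNU timeout's exit code
        *)   echo "failed with exit code $status" ;;
    esac
    # Pass the original exit status on to the caller.
    return "$status"
}
```

For example, run_with_limit 3600 /path/to/job.sh prints one status line and returns the job's exit code, so it can still be chained with && and a heartbeat.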

Overlapping Runs

If the previous run hasn't finished before the next starts, overlap occurs. Use lock files:

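#!/bin/bash
LOCKFILE="/tmp/job.lock"

# Create the lock atomically: with noclobber set, the redirect fails if the
# file already exists, so two runs cannot both pass the check.
if ! ( set -o noclobber; echo "$$" > "$LOCKFILE" ) 2>/dev/null; then
    echo "Job already running, exiting"
    exit 1
fi
# Remove the lock on exit. The trap is set only after this run owns the lock.
trap 'rm -f "$LOCKFILE"' EXIT

# Main job logic
/path/to/actual-job.sh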
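
A lock file can outlive its job if the process is killed before it cleans up, and every later run then exits as "already running". One common mitigation is to store the PID in the lock file and check whether that process still exists. The sketch below assumes bash; the function name acquire_lock is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: take a PID-based lock, reclaiming it if the
# process that created it is no longer running.
acquire_lock() {
    local lockfile=$1 holder
    # noclobber makes the redirect fail if the file already exists,
    # so checking for the lock and creating it happen in one step.
    if ( set -o noclobber; echo "$$" > "$lockfile" ) 2>/dev/null; then
        return 0
    fi
    holder=$(cat "$lockfile" 2>/dev/null)
    # kill -0 sends no signal; it only tests whether the PID exists.
    if [ -n "$holder" ] && kill -0 "$holder" 2>/dev/null; then
        return 1    # the lock is held by a live process
    fi
    # Stale lock: the holder is gone. Remove it and try once more.
    rm -f "$lockfile"
    ( set -o noclobber; echo "$$" > "$lockfile" ) 2>/dev/null
}
```

This is not fully race-free: two runs that both find a stale lock can interfere during the reclaim step, PIDs can be reused, and an empty lock file is treated as stale. Where it is available, flock(1) from util-linux avoids these cases because the kernel releases the lock when the process exits.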

Best Practices

Logging

Every cron job should write logs: start time, end time, records processed, errors. Without logs, diagnosis is impossible.

# Crontab with logging
0 * * * * /path/to/job.sh >> /var/log/jobs/hourly-job.log 2>&1
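
Redirecting output only helps if the job prints something useful. A small helper inside the job script can give every line a timestamp and a level; the sketch below assumes bash, and the function name log and the message texts are illustrative.

```shell
#!/bin/bash
# Hypothetical logging helper: prefix each message with a UTC timestamp
# and a level so entries can be ordered and searched later.
log() {
    local level=$1
    shift
    printf '%s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$*"
}

# Typical calls inside a job (shown as comments so nothing runs here):
# log INFO "job started"
# log INFO "records processed: $count"
# log ERROR "job failed with exit code $status"
# log INFO "job finished"
```

Because the helper writes to standard output, the crontab redirection above collects these lines in the same log file.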

Grace Period

Don't alert immediately on a missed heartbeat. Add a grace period — extra time accounting for normal execution duration variance. For an hourly task, use a 10-15 minute grace period.

Separating Alerts by Severity

Critical: payment processing, backups, order fulfillment
Warning: report generation, old data cleanup, cache updates
Info: statistics, analytics, non-essential notifications
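
If alerts are routed by a script rather than in the monitoring tool, the tiers above can be expressed as a simple lookup. The job names below are hypothetical stand-ins for the categories listed, and severity_for is not part of any tool.

```shell
#!/bin/bash
# Hypothetical mapping from a job name to an alert severity.
# Unknown jobs fall through to the lowest tier.
severity_for() {
    case "$1" in
        payments|backup|orders)        echo "critical" ;;
        reports|cleanup|cache-refresh) echo "warning" ;;
        *)                             echo "info" ;;
    esac
}
```

A notification script could call severity_for "$JOB_NAME" and page someone only for critical results, leaving the rest for a daily digest.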

Documentation

For each cron job, document:

What the task does
Schedule and expected execution time
What to do on failure (runbook)
Dependencies (database, external APIs, files)

Monitoring with Enterno.io

Set up the Heartbeat Monitor on Enterno.io for cron job monitoring. Create a monitor for each critical task and specify its expected interval and grace period. After the task completes successfully, send an HTTP request to the monitoring URL.

Use the monitors dashboard to view the status of all cron jobs in one place.

Summary

Cron jobs are an invisible but critically important part of infrastructure. A Dead Man's Switch is well suited to monitoring them: the task reports its own health, and the absence of a report signals a problem. Monitor not just execution but also duration, use timeouts and lock files, and log every run.
