
Cron Job Monitoring with Dead Man's Switch

The Problem of Silent Failures

Cron jobs run in the background: processing payments, sending emails, generating reports, cleaning up data, running backups. When a cron job silently crashes or hangs, you find out hours, days, or even weeks later — when customers start complaining.

Monitoring a cron job does not mean asking "is the server running?" It means asking "did the task execute at the expected time?"

Dead Man's Switch — How It Works

A Dead Man's Switch (DMS) is a monitoring pattern that inverts the usual logic. Instead of checking "is the service available?" it checks "has the service stopped reporting?"

How It Works

  1. You create a monitor with an expected interval (e.g., "cron should report every hour")
  2. After successful execution, the cron job sends an HTTP request (a ping, or heartbeat) to the monitoring URL
  3. If no heartbeat is received within the expected time + grace period, an alert fires
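On the monitoring side, the three steps above reduce to a single comparison. The following is a minimal sketch, assuming the monitor stores the time of the last heartbeat as a Unix timestamp. The function name `is_overdue` and its arguments are illustrative and do not describe any particular service's API.

```shell
#!/bin/bash
# is_overdue LAST_PING INTERVAL GRACE NOW
# INTERVAL and GRACE are in seconds; LAST_PING and NOW are Unix timestamps.
# Returns 0 (true) when the alert should fire, 1 otherwise.
is_overdue() {
    local last_ping="$1" interval="$2" grace="$3" now="$4"
    # The heartbeat is late only once interval + grace has fully elapsed
    local deadline=$((last_ping + interval + grace))
    [ "$now" -gt "$deadline" ]
}
```

With an hourly interval (3600 seconds) and a 15-minute grace period (900 seconds), a heartbeat received at second 0 keeps this check quiet through second 4500, and it reports overdue from second 4501.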

Implementation

# Crontab: hourly job with heartbeat
0 * * * * /path/to/job.sh && curl -fsS https://monitor.example.com/ping/abc123 > /dev/null

Key point: && means the heartbeat is sent only on successful completion (exit code 0). If the job fails, no heartbeat is sent, and the alert fires.
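The heartbeat request can itself stall or fail on a network blip, which would raise a false alert for a job that actually succeeded. One way to harden it is sketched below, using curl's standard `-m` (`--max-time`) and `--retry` options. The function name `send_heartbeat` is an illustrative choice, and the timeout and retry values are assumptions to tune.

```shell
#!/bin/bash
# send_heartbeat URL
# -f        treat HTTP error responses as failures
# -sS       stay silent, but still print errors
# -m 10     give up on an attempt after 10 seconds
# --retry 3 retry transient failures up to 3 times
# -o /dev/null discards the response body
send_heartbeat() {
    curl -fsS -m 10 --retry 3 -o /dev/null "$1"
}
```

The same flags work directly in the crontab line, for example `/path/to/job.sh && curl -fsS -m 10 --retry 3 -o /dev/null https://monitor.example.com/ping/abc123`.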

What to Monitor in Cron Jobs

Scheduled Execution

The task should run at the expected time. DMS detects missed runs, for example when the cron daemon has stopped, the server has rebooted, or the crontab has been corrupted.

Successful Completion

A task may start but fail with an error. Check the exit code and send the heartbeat only on success.
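When the job is started from a script rather than a one-line crontab entry, the exit-code check can be wrapped in a small helper. This is a sketch only: `run_and_report` and `send_ping` are hypothetical names, and the URL is whatever your monitor assigns.

```shell
#!/bin/bash
# send_ping URL: keeps the HTTP call in one place.
send_ping() {
    curl -fsS -m 10 -o /dev/null "$1"
}

# run_and_report URL COMMAND [ARGS...]
# Runs COMMAND, pings URL only when it exits 0, and returns COMMAND's
# own exit code so cron and any caller still see the failure.
run_and_report() {
    local url="$1"
    shift
    "$@"
    local status=$?
    if [ "$status" -eq 0 ]; then
        send_ping "$url"
    else
        echo "job exited with code $status, heartbeat withheld" >&2
    fi
    return "$status"
}

# Example (not executed here):
# run_and_report "https://monitor.example.com/ping/abc123" /path/to/job.sh
```

Returning the job's exit code rather than curl's keeps a failed job visible to anything else that inspects the script's status.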

Execution Time

If a task normally takes 5 minutes but suddenly took 2 hours, that's a sign of trouble (growing database, deadlock, memory leak).

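START=$(date +%s)
/path/to/job.sh
STATUS=$?   # capture the job's exit code before any other command overwrites $?
END=$(date +%s)
DURATION=$((END - START))

# Send the heartbeat, with the measured duration, only if the job succeeded
if [ "$STATUS" -eq 0 ]; then
    curl -fsS "https://monitor.example.com/ping/abc123?duration=$DURATION"
fi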
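A duration figure only helps if something compares it with what is normal. If the monitoring service does not evaluate the value itself, a local check is one option. The sketch below assumes a fixed upper bound per job; `duration_status` and the threshold are illustrative.

```shell
#!/bin/bash
# duration_status DURATION MAX_EXPECTED
# Both values are in seconds.
# Prints "ok" when the run stayed within the bound, "slow" otherwise.
duration_status() {
    local duration="$1" max_expected="$2"
    if [ "$duration" -le "$max_expected" ]; then
        echo "ok"
    else
        echo "slow"
    fi
}
```

For a job that normally takes about 5 minutes, a bound of 900 seconds would label a 2-hour run (`duration_status 7200 900`) as slow while tolerating ordinary variation.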

Hanging

A task may hang (deadlock, infinite loop). Use timeout:

timeout 3600 /path/to/job.sh

If the task doesn't complete within an hour, timeout sends it SIGTERM and returns a non-zero exit code (124 in GNU coreutils), so a heartbeat chained with && is not sent.
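It can be useful to tell a timeout apart from an ordinary failure in the logs. A minimal sketch, assuming GNU coreutils `timeout`, which exits with status 124 when the limit is reached; `run_with_limit` is an illustrative name.

```shell
#!/bin/bash
# run_with_limit LIMIT COMMAND [ARGS...]
# Runs COMMAND under timeout and reports a timeout separately from
# any other non-zero exit. Returns the exit status from timeout.
run_with_limit() {
    local limit="$1"
    shift
    timeout "$limit" "$@"
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s" >&2
    elif [ "$status" -ne 0 ]; then
        echo "failed with exit code $status" >&2
    fi
    return "$status"
}
```

If a job may ignore SIGTERM, GNU timeout also accepts `-k DURATION` (for example `timeout -k 30 3600 /path/to/job.sh`), which follows up with SIGKILL after the given delay.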

Overlapping Runs

If the previous run hasn't finished before the next starts, overlap occurs. Use lock files:

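#!/bin/bash
LOCKFILE="/tmp/job.lock"
# Create the lock atomically: with noclobber set, the redirect fails
# if the file already exists, so two runs cannot both pass the check
if ! ( set -o noclobber; echo "$$" > "$LOCKFILE" ) 2>/dev/null; then
    echo "Job already running, exiting"
    exit 1
fi
# Remove the lock on exit; single quotes defer expansion until the trap runs
trap 'rm -f "$LOCKFILE"' EXIT

# Main job logic
/path/to/actual-job.sh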
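A plain lock file stays behind if the script is killed before its cleanup runs, and every later run then refuses to start. Where the `flock` utility from util-linux is available, it offers an alternative worth considering, because the kernel releases the lock when the process exits. The sketch below uses that different technique rather than the lock-file check above; `acquire_lock` is an illustrative name.

```shell
#!/bin/bash
# acquire_lock LOCKFILE
# Opens LOCKFILE on file descriptor 9 and tries to take an exclusive,
# non-blocking lock on it. Returns non-zero if another process holds it.
# The lock is tied to the open descriptor, so it disappears with the process.
acquire_lock() {
    exec 9>"$1"
    flock -n 9
}

# Example (not executed here):
# acquire_lock /tmp/job.lock || { echo "Job already running, exiting"; exit 1; }
# /path/to/actual-job.sh
```

For simple cases the same effect is available directly in the crontab line: `flock -n /tmp/job.lock /path/to/actual-job.sh`.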

Best Practices

Logging

Every cron job should write logs: start time, end time, records processed, errors. Without logs, diagnosing a failure after the fact is largely guesswork.

# Crontab with logging
0 * * * * /path/to/job.sh >> /var/log/jobs/hourly-job.log 2>&1
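Redirecting output only captures what the job prints, so the job itself needs to emit the start, end, and count lines. A small helper along these lines is one way to do that; the `log` function and the field names in the example are assumptions, not a required format.

```shell
#!/bin/bash
# log MESSAGE...
# Prints the message prefixed with a UTC timestamp in ISO 8601 form,
# so entries from different runs can be ordered and compared.
log() {
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*"
}

# Example (not executed here; COUNT and STATUS are hypothetical variables):
# log "start"
# ... job logic ...
# log "end processed=$COUNT status=$STATUS"
```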

Grace Period

Don't alert immediately on a missed heartbeat. Add a grace period: extra time that absorbs normal variation in how long the job takes. For an hourly task, use a 10-15 minute grace period.

Separating Alerts by Severity

Documentation

For each cron job, document:

Monitoring with Enterno.io

Set up the Heartbeat Monitor on Enterno.io for cron job monitoring. Create a monitor for each critical task and specify the expected interval and grace period. After each successful run, have the task send an HTTP request to the monitoring URL.

Use the monitors dashboard to view the status of all cron jobs in one place.

Summary

Cron jobs are an invisible but critically important part of infrastructure. Dead Man's Switch is well suited to monitoring them: the task reports its own health, and the absence of a report signals a problem. Monitor not just execution but also duration, use timeouts and lock files, and log everything.
