
Heartbeat monitor

Key idea:

A heartbeat monitor is the inverse of uptime monitoring: instead of probing your service from outside, your service pings the monitor every N minutes. If the ping does not arrive in time → alert. Use for cron jobs, background workers, ETL pipelines, backup scripts. Sometimes called a dead-man's switch.



Details

  • Scenario: daily backup at 03:00. Script POSTs a heartbeat at the end. Nothing by 03:15 → alert
  • Advantage over uptime: catches "silent" failures (cron did not run, worker hung — no HTTP error to probe)
  • Schedule expression: cron syntax ("0 3 * * *") + grace period (15 min tolerance)
  • Endpoint: POST /api/heartbeat?token=X — empty body, instant 200 OK
  • Alert channels: email, Telegram, Slack, webhook — on the monitor service side
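The schedule-plus-grace rule can be sketched in a few lines of Python. This is only an illustration of the 03:00 / 15-minute scenario above, not how any particular monitoring service implements it; the function name `is_overdue` and its parameters are hypothetical, and only a once-a-day schedule is handled.

```python
from datetime import datetime, time, timedelta

def is_overdue(last_ping, now, scheduled=time(3, 0), grace=timedelta(minutes=15)):
    """Return True if the latest daily heartbeat is missing past its deadline.

    Assumes a once-a-day schedule (the "0 3 * * *" case). A real monitor
    would have to parse arbitrary cron expressions.
    """
    # Most recent scheduled run at or before `now`
    run_at = datetime.combine(now.date(), scheduled)
    if run_at > now:
        run_at -= timedelta(days=1)
    deadline = run_at + grace
    # A ping counts only if it arrived at or after the scheduled start
    pinged = last_ping is not None and last_ping >= run_at
    return now > deadline and not pinged
```

With the defaults, a job that last pinged yesterday is reported overdue at 03:16 today but not at 03:10, because the grace window is still open.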

Example

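#!/bin/bash
# Bash script with a final heartbeat
set -euo pipefail

# Run the backup
mysqldump db > "/backup/db-$(date +%F).sql"
# -a recurses into the directory; "s3" is assumed to be a configured rsync/SSH host
rsync -a /backup s3:bucket/

# Got here → all good, send heartbeat
# -f makes curl exit non-zero on HTTP errors; --max-time stops a hung request from blocking the job
curl -fsS --max-time 10 "https://enterno.io/api/heartbeat?token=$HEARTBEAT_TOKEN&status=ok"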

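# Python (with error handling)
import os
import sys

import requests

HEARTBEAT_URL = 'https://enterno.io/api/heartbeat'
TOKEN = os.environ['HEARTBEAT_TOKEN']  # same variable as in the Bash example

def do_backup():
    pass  # placeholder: replace with your backup logic

try:
    do_backup()
except Exception as e:
    # Report the failure right away instead of waiting for the grace period
    requests.get(HEARTBEAT_URL, timeout=10,
                 params={'token': TOKEN, 'status': 'critical',
                         'msg': str(e)[:200]})
    sys.exit(1)

# Backup finished without errors: confirm success
requests.get(HEARTBEAT_URL, timeout=10, params={'token': TOKEN, 'status': 'ok'})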


TL;DR

A heartbeat monitor, often called a dead-man's switch, watches for the regular signals that a system or process sends to show it is still working. If a signal fails to arrive within the expected window, the monitor triggers a predefined response, such as alerting administrators or starting a failover. The mechanism matters most where uptime and reliability are critical, such as server management and industrial automation.

Understanding Heartbeat Monitors

A heartbeat monitor relies on the monitored system sending periodic signals at defined intervals to confirm that it is active. The technique is widely used in IT infrastructure, safety systems, and industrial processes. The underlying principle is straightforward: if no heartbeat arrives within the expected timeframe, the system is assumed to have failed or stalled.

For example, a server might be configured to send a heartbeat every 30 seconds. If no signal arrives within that interval (plus any grace period), the monitor alerts system administrators and may start automatic recovery, such as restarting the server or switching to a backup system.

Heartbeat monitors can be implemented using various methods, including:

  • Network Pings: Sending ICMP echo requests to verify connectivity and responsiveness of a system.
  • Application-Level Health Checks: Utilizing application-specific endpoints to confirm operational status.
  • Custom Scripts: Running scripts that check system metrics, logs, or conditions.

In terms of configuration, a crontab entry that runs a heartbeat check on a Linux host might look like this:

*/30 * * * * /usr/local/bin/heartbeat-check.sh

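This crontab entry runs the heartbeat-check.sh script every 30 minutes. Standard cron cannot schedule anything more often than once a minute, so a 30-second interval would need a different mechanism. The script could contain logic to verify system health and send alerts if a heartbeat is not detected.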
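The contents of heartbeat-check.sh are not given here. As one possible shape for such a check, the Python sketch below assumes the monitored job touches a file after every successful run and treats a missing or old file as a missed heartbeat. The function name `heartbeat_is_stale` and the file-based convention are assumptions made for illustration.

```python
import os
import time

def heartbeat_is_stale(path, max_age_seconds, now=None):
    """Return True if the heartbeat file is missing or older than max_age_seconds."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        # No file at all means the job has never reported in
        return True
    return age > max_age_seconds
```

A check scheduled every 30 minutes could call this with a `max_age_seconds` slightly above the job's own interval and send an alert when it returns True.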

Practical Applications and Use Cases

Heartbeat monitors are used mainly in IT infrastructure, safety-critical systems, and industrial automation, where detecting a failure promptly reduces downtime and risk. Some notable use cases follow:

1. IT Infrastructure Monitoring

In cloud environments, this kind of liveness monitoring is crucial for maintaining service availability. For instance, AWS runs status checks on EC2 instances, and a failed check can raise an alarm or trigger automatic recovery of the instance.

2. Safety-Critical Systems

In industrial settings, such as manufacturing plants, heartbeat monitors are implemented to ensure the safety of machinery and personnel. For example, a manufacturing robot may use a heartbeat signal to confirm its operational status. If the heartbeat is not detected, the system can enter a safe state to prevent accidents.

3. Web Application Health Checks

Web applications often implement heartbeat monitors to ensure that essential services are running. For example, a web application could use a simple HTTP GET request to a health check endpoint. If the response time exceeds a certain threshold or if the service is down, automated systems can trigger alerts or restart the service.

In a practical scenario, consider a web application deployed in a Kubernetes cluster. A liveness probe in the Deployment YAML plays a similar role, although here the kubelet polls the container rather than the container pushing a signal:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:            # required in apps/v1; must match the template labels
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app-image
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 15

This configuration probes the application every 15 seconds after an initial delay of 30 seconds. If the probe keeps failing (three consecutive failures by default), Kubernetes restarts the container, keeping disruption to a minimum.
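On the application side, the probe only needs an HTTP endpoint that returns a success status on /health. The sketch below uses Python's standard library to show the idea. The names `HealthHandler` and `make_health_server` are illustrative, and a real application would normally expose the route through its own web framework.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = b"ok"
            self.send_response(200)  # any 2xx or 3xx status counts as healthy
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the logs

def make_health_server(port=8080):
    """Build (but do not start) an HTTP server that answers GET /health."""
    return HTTPServer(("", port), HealthHandler)
```

Calling `make_health_server(8080).serve_forever()` would start it on the port the probe targets.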

In conclusion, heartbeat monitors serve as an essential component in various systems, enhancing reliability and safety. By implementing heartbeat monitoring strategies, organizations can proactively manage potential failures, ensuring continuous operation and safeguarding critical processes.

  • Dead man's switch: Alert when a job goes silent
  • Flexible Grace Period: Allowed ping latency window
  • REST API Ping: Single GET confirms liveness
  • Cron + CI + Scripts: For any periodic task

Why teams trust us

  • 1 min: minimum interval
  • Alerts: Telegram + Email
  • HTTP: ping endpoint
  • Scout: 10 monitors free

How it works

1. Create heartbeat
2. Ping URL from cron
3. Get alert on miss

What is Heartbeat Monitoring?

A heartbeat monitor is a "reverse monitor": instead of us polling the service, the service signals us that it's alive. If no signal arrives within the set interval — we send an alert.

Simple Integration

One GET request to a unique URL — and the monitor knows the job completed.

Grace Period

Set an acceptable ping delay to avoid false alerts.

Smart Notifications

Email and Telegram on missed ping. Repeated alert if silence continues.
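The "alert, then repeat while silence continues" behaviour can be modelled with a small state machine. The sketch below is purely illustrative and is not the service's actual implementation; the class `HeartbeatMonitor` and the `repeat_every` parameter are assumptions.

```python
from datetime import datetime, timedelta

class HeartbeatMonitor:
    """Tracks one job's pings and decides when to alert (illustrative only)."""

    def __init__(self, interval, grace, repeat_every):
        self.interval = interval          # expected time between pings
        self.grace = grace                # extra tolerance before alerting
        self.repeat_every = repeat_every  # gap between repeated alerts
        self.last_ping = None
        self.last_alert = None

    def ping(self, now):
        self.last_ping = now
        self.last_alert = None  # job is alive again, reset alert state

    def should_alert(self, now):
        """Return True if an alert should be sent at `now`, and record it."""
        if self.last_ping is None:
            return False  # nothing to compare against until the first ping
        if now - self.last_ping <= self.interval + self.grace:
            return False
        if self.last_alert is not None and now - self.last_alert < self.repeat_every:
            return False  # already alerted recently
        self.last_alert = now
        return True
```

A scheduler would call `should_alert` periodically and hand any True result to the email or Telegram sender.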

Execution History

Full ping log with timestamps — see every job execution.

Who uses this

  • DevOps: cron job monitoring
  • Developers: background worker check
  • Sysadmins: dead man's switch
  • Business: payment queue monitoring

Common Mistakes

  • No grace period: Without a grace period, any minor delay triggers a false alert.
  • Pinging before the task starts: Ping at the end of the task, so the ping confirms successful completion and not just the start.
  • One URL for different tasks: Create a separate monitor for each cron job, otherwise you won't know which one failed.
  • Pinging on error: If the task fails, don't send the success ping. A missing ping is the failure signal.
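Two of these rules (ping last, and only on success) can be enforced by a small wrapper around the job. The Python sketch below is one way to do it under stated assumptions: `run_then_ping` is a hypothetical helper, and `send_ping` stands for whatever function issues the HTTP request to that job's own heartbeat URL.

```python
import subprocess

def run_then_ping(command, send_ping):
    """Run `command`; call `send_ping()` only if it exits with status 0.

    `send_ping` is any zero-argument callable, for example a function that
    requests this job's own heartbeat URL.
    """
    result = subprocess.run(command)
    if result.returncode == 0:
        send_ping()  # last step: confirms completion, not just start
    return result.returncode
```

Giving each cron job its own `send_ping` (and therefore its own URL) keeps failures attributable to a single task.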

Best Practices

  • Ping at the very end: Make the heartbeat URL call the last command in the script.
  • Use curl in cron: curl -s https://enterno.io/api/heartbeat/TOKEN is simple and reliable.
  • Set grace = 20–30%: If the job takes 5 min, add a grace period of 1–2 min on top.
  • Cover all critical jobs: Backups, report generation, data sync should all have a heartbeat monitor.
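The 20–30% rule is simple arithmetic, shown here as a tiny Python helper. The function name `grace_window` is illustrative, and the percentages are taken from the guideline above.

```python
def grace_window(run_seconds, low_pct=20, high_pct=30):
    """Return the (minimum, maximum) grace period in seconds for a normal run."""
    return run_seconds * low_pct / 100, run_seconds * high_pct / 100
```

For a 5-minute job, `grace_window(300)` gives a window of 60 to 90 seconds, which rounds to the 1–2 minutes suggested above.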

Start monitoring cron for free

Heartbeat monitor: 5 tasks free, Telegram and email alerts on missed runs.


Frequently Asked Questions

How is this different from uptime monitoring?

Uptime — external probe of a service (HTTP GET). Heartbeat — internal ping from YOUR code. Catches bugs uptime misses (cron did not start — there is no HTTP server to probe).

What if the grace period is too tight?

Alerts fire on slow runs. Give 20-30% buffer over a normal run's duration. Better to miss a 1-second glitch than chase 10 false alarms.

How does status=critical work?

In enterno.io: the monitor flips to DOWN immediately + triggers alert channels. Useful for "script crashed mid-run" — no need to wait for a timeout.

Try the live tool that powered this guide

Free plan — 20 monitors, 5-minute checks, no card required. Upgrade for 1-minute interval and multi-region monitoring.