Designing Effective Health Check Endpoints for Web Services
A health check endpoint is a dedicated URL that reports whether your application is functioning correctly. It's the foundation of automated monitoring, load balancer routing, and container orchestration. A well-designed health check prevents false positives, catches real issues, and provides actionable information.
Types of Health Checks
Liveness Check
Answers: "Is the application process running?" A minimal check — if it fails, the application needs to be restarted.
GET /health/live → 200 OK
{"status": "alive"}
Only check the application process itself. Don't check dependencies — a database outage shouldn't trigger an application restart (which won't fix the database).
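A liveness endpoint can be very small. The following is a minimal sketch, assuming a plain PHP script served at /health/live; the file name and the livenessResponse() helper are hypothetical names chosen for illustration. It touches no database, cache, or external service.

```php
<?php
// Hypothetical file: /health/live.php

/**
 * Build the liveness payload. Deliberately touches no dependencies:
 * if PHP can execute this function, the process is alive.
 */
function livenessResponse(): array
{
    return ['code' => 200, 'body' => ['status' => 'alive']];
}

// Only emit output when served by a web SAPI, not when run from the CLI.
if (PHP_SAPI !== 'cli') {
    $response = livenessResponse();
    http_response_code($response['code']);
    header('Content-Type: application/json');
    echo json_encode($response['body']);
}
```

Keeping the payload builder separate from the output code makes the endpoint easy to unit test.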
Readiness Check
Answers: "Can this instance handle traffic?" Checks dependencies and readiness. If it fails, remove this instance from the load balancer but don't restart it.
GET /health/ready → 200 OK
{
  "status": "ready",
  "checks": {
    "database": {"status": "up", "latency_ms": 3},
    "redis": {"status": "up", "latency_ms": 1},
    "disk_space": {"status": "ok", "free_gb": 42}
  }
}
GET /health/ready → 503 Service Unavailable
{
  "status": "not_ready",
  "checks": {
    "database": {"status": "down", "error": "Connection refused"},
    "redis": {"status": "up", "latency_ms": 1}
  }
}
Startup Check
Answers: "Has the application finished starting up?" Prevents traffic before initialization is complete (database migrations, cache warming, config loading).
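One simple way to implement a startup check is a marker file. The sketch below assumes your bootstrap or deploy script creates the marker once migrations, cache warming, and config loading have finished; the startupResponse() name, the marker path, and the "started"/"starting" status strings are illustrative, not a standard.

```php
<?php
/**
 * Report startup state based on a marker file.
 * Assumption: the bootstrap script creates $markerPath only after
 * initialization (migrations, cache warming, config loading) is done.
 */
function startupResponse(string $markerPath): array
{
    if (is_file($markerPath)) {
        return ['code' => 200, 'body' => ['status' => 'started']];
    }

    // Not initialized yet: tell the orchestrator to keep waiting.
    return ['code' => 503, 'body' => ['status' => 'starting']];
}
```

The endpoint script would call startupResponse() with the agreed path, then pass the code to http_response_code() and the body to json_encode().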
What to Check
- Database connectivity: Execute a lightweight query like SELECT 1
- Cache connectivity: Redis/Memcached PING
- Disk space: Ensure sufficient space for logs, uploads, temp files
- Memory usage: Alert before OOM
- External API reachability: Check critical third-party dependencies, but be careful: their flakiness shouldn't take you down
- Queue depth: Is the job queue backing up?
- Certificate expiry: Alert days before SSL cert expires
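As an illustration of one item from the list above, here is a rough sketch of a disk space check using PHP's built-in disk_free_space(). The checkDiskSpace() name and the threshold parameter are assumptions for this example; pick a threshold that fits your logs and uploads.

```php
<?php
/**
 * Check free disk space on the volume containing $path.
 * $minFreeGb is an assumed threshold, not a recommended value.
 */
function checkDiskSpace(string $path, float $minFreeGb): array
{
    // disk_free_space() returns false (and emits a warning) on failure.
    $freeBytes = @disk_free_space($path);
    if ($freeBytes === false) {
        return ['status' => 'down', 'error' => 'Unable to read disk usage'];
    }

    $freeGb = round($freeBytes / (1024 ** 3), 1);

    return [
        'status' => $freeGb >= $minFreeGb ? 'ok' : 'warning',
        'free_gb' => $freeGb,
    ];
}
```

A call such as checkDiskSpace('/var/log', 10.0) would produce an entry shaped like the "disk" check in the response examples.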
Response Format
{
  "status": "healthy",
  "version": "2.4.1",
  "uptime_seconds": 86432,
  "timestamp": "2025-03-15T12:00:00Z",
  "checks": {
    "mysql": {
      "status": "up",
      "latency_ms": 2,
      "details": "MySQL 8.0, 45 active connections"
    },
    "redis": {
      "status": "up",
      "latency_ms": 0.5,
      "details": "Redis 6.0, 128MB used"
    },
    "disk": {
      "status": "warning",
      "free_gb": 5.2,
      "details": "Below 10GB threshold"
    }
  }
}
HTTP Status Codes
- 200 OK: All checks pass, service is healthy
- 503 Service Unavailable: One or more critical checks fail
- 429 Too Many Requests: Health check endpoint itself is rate limited
Don't use 500 — that implies an unexpected error. 503 specifically means "temporarily unavailable" which is the right semantics.
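The "critical checks" distinction can be made explicit in code. The sketch below is one possible approach: httpStatusForChecks() is a hypothetical helper, and which checks count as critical is an assumption you would decide per service. A non-critical check that is down, or any check in a "warning" state, still yields 200.

```php
<?php
/**
 * Map individual check results to an HTTP status code.
 * $critical lists the check names whose failure should return 503;
 * the choice of critical checks is an assumption for illustration.
 */
function httpStatusForChecks(array $checks, array $critical): int
{
    foreach ($critical as $name) {
        // A critical check that is missing from the results counts as failed.
        $status = $checks[$name]['status'] ?? 'down';
        if ($status === 'down') {
            return 503;
        }
    }

    return 200;
}
```

For example, with 'mysql' listed as critical, a low-disk warning keeps the instance in rotation while a database outage takes it out.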
Implementation Tips
Keep It Fast
Health checks are called frequently (every 5-30 seconds), so a healthy instance should respond in under 1 second. Set connection timeouts of 2-3 seconds on dependency checks so that a hung dependency fails the check promptly instead of stalling it.
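To make latency visible and keep failures contained, each dependency probe can be wrapped in a small helper. This is a sketch under stated assumptions: runProbe() and $tcpProbe are hypothetical names, and the host and port are placeholders. Note that the wrapper only measures time; it cannot interrupt a slow probe, so the timeout itself has to be set on the client, as the fsockopen() call does with its 2-second connect timeout.

```php
<?php
/**
 * Run one dependency probe, record its latency, and convert any
 * exception into a "down" result. This does not interrupt a slow probe;
 * the probe's own client must enforce its connect/read timeout.
 */
function runProbe(callable $probe): array
{
    $start = microtime(true);
    try {
        $probe();
        $status = 'up';
    } catch (\Throwable $e) {
        $status = 'down';
    }

    return [
        'status' => $status,
        'latency_ms' => round((microtime(true) - $start) * 1000, 1),
    ];
}

// Example probe: TCP reachability with a 2-second connect timeout.
// Host and port are placeholders for your own dependency.
$tcpProbe = function (): void {
    $socket = @fsockopen('127.0.0.1', 6379, $errno, $errstr, 2.0);
    if ($socket === false) {
        throw new \RuntimeException("connect failed: $errstr");
    }
    fclose($socket);
};
```

Calling runProbe($tcpProbe) returns an entry in the same shape as the "checks" values shown earlier.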
Don't Break Under Load
A health check that fails under high load will cause the load balancer to remove the instance — reducing capacity exactly when you need it most. Make health checks lightweight and independent of application load.
Cache Dependency Checks
Don't query the database on every health check call. Cache results for 5-10 seconds to reduce overhead.
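Because PHP normally starts each request with fresh state, the cache has to live outside the script's variables. The sketch below uses a small JSON file per check; cachedCheck() is a hypothetical helper, and the file-based store is an assumption (APCu or another shared store would follow the same pattern).

```php
<?php
/**
 * Return a cached check result if it is younger than $ttlSeconds,
 * otherwise run $probe and store its result.
 * Assumption: a JSON file in $cacheDir is an acceptable cache store.
 */
function cachedCheck(string $name, int $ttlSeconds, callable $probe, string $cacheDir): array
{
    $file = $cacheDir . '/health_' . md5($name) . '.json';

    if (is_file($file)) {
        $cached = json_decode((string) file_get_contents($file), true);
        if (is_array($cached)
            && isset($cached['checked_at'], $cached['result'])
            && (time() - (int) $cached['checked_at']) < $ttlSeconds
        ) {
            return $cached['result'];
        }
    }

    // Cache miss or stale entry: run the real probe and store the outcome.
    $result = $probe();
    $payload = json_encode(['checked_at' => time(), 'result' => $result]);
    file_put_contents($file, $payload, LOCK_EX);

    return $result;
}
```

With a TTL of 5-10 seconds, a load balancer polling every few seconds triggers at most one real dependency query per TTL window per check.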
Separate Public from Internal
Public health check (/health): Returns only status code (200/503). Internal health check (/health/detailed): Returns full diagnostics. Protect the detailed endpoint — it can leak infrastructure information.
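One way to protect the detailed output is a shared secret compared with hash_equals(). This is only a sketch: healthBody() is a hypothetical helper and the token scheme is an assumption; IP allow-lists or network-level restrictions are common alternatives.

```php
<?php
/**
 * Choose what to expose. Without a valid token only the overall status
 * is returned; per-dependency details stay internal.
 * The token scheme is an assumption made for this example.
 */
function healthBody(array $result, string $expectedToken, ?string $providedToken): array
{
    // hash_equals() compares in constant time to avoid timing leaks.
    $authorized = $providedToken !== null
        && $expectedToken !== ''
        && hash_equals($expectedToken, $providedToken);

    return $authorized ? $result : ['status' => $result['status']];
}
```

The caller would read the provided token from a request header and the expected token from configuration, never from the URL or source code.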
PHP Implementation Example
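<?php
// /api/health.php
// getDB() and getRedis() are assumed to be the application's own connection
// helpers. Configure short connect timeouts on both so a hung dependency
// cannot stall the health check.

$checks = [];
$healthy = true;

// MySQL check: a trivial query proves the connection works
try {
    $start = microtime(true);
    $db = getDB();
    $db->query('SELECT 1');
    $checks['mysql'] = [
        'status' => 'up',
        'latency_ms' => round((microtime(true) - $start) * 1000, 1),
    ];
} catch (Throwable $e) {
    // Error text can leak infrastructure details; expose it only on a protected endpoint
    $checks['mysql'] = ['status' => 'down', 'error' => $e->getMessage()];
    $healthy = false;
}

// Redis check: PING round trip
try {
    $start = microtime(true);
    $redis = getRedis();
    $redis->ping();
    $checks['redis'] = [
        'status' => 'up',
        'latency_ms' => round((microtime(true) - $start) * 1000, 1),
    ];
} catch (Throwable $e) {
    $checks['redis'] = ['status' => 'down'];
    $healthy = false;
}

// 503 signals load balancers to stop routing traffic to this instance
http_response_code($healthy ? 200 : 503);
header('Content-Type: application/json');
echo json_encode(['status' => $healthy ? 'healthy' : 'unhealthy', 'checks' => $checks]);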
Conclusion
Health check endpoints are simple to build but critical to get right. Separate liveness from readiness, keep checks fast, protect detailed endpoints, and ensure they work correctly under load. A well-designed health check is the first thing that tells you something is wrong — make sure it's reliable.