Monitoring cookbook
Hand-written recipes for the monitoring problems we see most often. Each recipe shows a minimal DIY script and the one-click Enterno.io monitor that covers the same concern without extra infrastructure.
The consumer is up, but its offset isn't growing (a consumer-thread deadlock, or it's stuck without a heartbeat). A lag-only check shows 0 lag because the producer is also idle, yet the bug is live in production.
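A minimal DIY sketch for this one, assuming a Kafka consumer group and the kafka-python package (the broker address, group id, and interval below are placeholders): sample the committed offsets twice and alert if nothing moved.
```python
# Sketch: alert when a consumer group's committed offsets stop advancing,
# even while reported lag stays at 0. Assumes kafka-python; broker and
# group names are placeholders.
import time
from kafka import KafkaAdminClient

BROKERS = "kafka:9092"        # assumption: your bootstrap servers
GROUP = "orders-consumer"     # assumption: your consumer group id
WINDOW = 300                  # seconds between the two samples

def committed(admin):
    # list_consumer_group_offsets -> {TopicPartition: OffsetAndMetadata}
    return {tp: om.offset for tp, om in
            admin.list_consumer_group_offsets(GROUP).items()}

admin = KafkaAdminClient(bootstrap_servers=BROKERS)
before = committed(admin)
time.sleep(WINDOW)
after = committed(admin)

if before and all(after.get(tp) == off for tp, off in before.items()):
    print(f"ALERT: {GROUP} committed offsets unchanged for {WINDOW}s")
```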
Main SQS queue is processing fine, but the DLQ silently grows — some messages fail all 3 delivery attempts and end up there. Nobody looks at the DLQ until it's a thousand deep.
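A minimal sketch with boto3, assuming AWS credentials are already in the environment (the queue URL and threshold are placeholders):
```python
# Sketch: alert when the dead-letter queue depth crosses a threshold.
import boto3

DLQ_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/orders-dlq"  # placeholder
THRESHOLD = 10

sqs = boto3.client("sqs")
attrs = sqs.get_queue_attributes(
    QueueUrl=DLQ_URL,
    AttributeNames=["ApproximateNumberOfMessages"],
)
depth = int(attrs["Attributes"]["ApproximateNumberOfMessages"])
if depth > THRESHOLD:
    print(f"ALERT: DLQ depth is {depth} (threshold {THRESHOLD})")
```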
Prometheus itself is alive, but one of its targets has up == 0 — data stops flowing, graphs go blank, and the alerting rules built on that target's metrics never fire (no data = no alert).
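A minimal sketch that asks Prometheus itself which targets are down, via its HTTP query API (the Prometheus URL is a placeholder):
```python
# Sketch: list targets currently reporting up == 0.
import requests

PROM = "http://prometheus:9090"   # placeholder

resp = requests.get(f"{PROM}/api/v1/query", params={"query": "up == 0"}, timeout=10)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    labels = series["metric"]
    print(f"ALERT: target down: job={labels.get('job')} instance={labels.get('instance')}")
```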
OTEL collector is overloaded — `otelcol_exporter_send_failed_spans` is climbing. Traces are lost, prod debugging goes blind. The tracing backend hides the gap.
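A minimal sketch, assuming the Collector exposes its own telemetry on the default port 8888 (adjust for your deployment): sample the failed-spans counter twice and alert if it grew.
```python
# Sketch: scrape the Collector's internal metrics endpoint and compare the
# failed-spans counter between two samples. Assumes the default internal
# telemetry port; the metric may carry a _total suffix in newer versions.
import time
import requests

METRICS_URL = "http://otel-collector:8888/metrics"   # assumption: default telemetry port

def failed_spans():
    total = 0.0
    for line in requests.get(METRICS_URL, timeout=10).text.splitlines():
        if line.startswith("otelcol_exporter_send_failed_spans"):
            total += float(line.split()[-1])   # last field is the sample value
    return total

first = failed_spans()
time.sleep(60)
second = failed_spans()
if second > first:
    print(f"ALERT: {second - first:.0f} spans failed to export in the last minute")
```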
docker info hangs >30 s — the daemon is wedged. Containers keep running (the kernel holds the namespaces), but you cannot deploy a new release, and `systemctl status` still shows active.
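A minimal sketch: run `docker info` under a hard timeout, so a hanging daemon counts as a failure even though systemd reports it active.
```python
# Sketch: treat a slow `docker info` as a failure, not just a non-zero exit.
# Assumes the docker CLI is on PATH.
import subprocess

try:
    subprocess.run(["docker", "info"], capture_output=True, timeout=30, check=True)
except subprocess.TimeoutExpired:
    print("ALERT: docker info did not answer within 30s (daemon wedged?)")
except subprocess.CalledProcessError as e:
    print(f"ALERT: docker info failed: {e.stderr.decode(errors='replace')[:200]}")
```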
A node goes NotReady (kubelet stopped pinging the apiserver, runtime is sick) — pods on it linger like zombies until a taint evicts them. Kubernetes events do not go to Slack by default.
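A minimal sketch with the official kubernetes Python client, flagging nodes whose Ready condition is not True:
```python
# Sketch: list nodes that are NotReady. Assumes a kubeconfig or
# in-cluster credentials.
from kubernetes import client, config

try:
    config.load_incluster_config()
except config.ConfigException:
    config.load_kube_config()

v1 = client.CoreV1Api()
for node in v1.list_node().items:
    ready = next((c for c in node.status.conditions if c.type == "Ready"), None)
    if ready is None or ready.status != "True":
        print(f"ALERT: node {node.metadata.name} is NotReady "
              f"({ready.reason if ready else 'no Ready condition'})")
```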
S3 endpoint starts 5xx-ing — your app gets random failures on upload. AWS Health shows 'healthy', and the CloudWatch alarm works on a 5-min aggregate, so the reaction is late.
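A minimal canary sketch with boto3: a tiny upload every minute surfaces 5xx bursts long before a 5-minute aggregate does (the bucket name is a placeholder):
```python
# Sketch: canary upload with tight timeouts and no retries, so failures
# show up immediately.
import time
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError, EndpointConnectionError

BUCKET = "my-canary-bucket"   # assumption: a bucket dedicated to the canary
s3 = boto3.client("s3", config=Config(retries={"max_attempts": 1},
                                      connect_timeout=5, read_timeout=5))
try:
    s3.put_object(Bucket=BUCKET, Key="canary.txt", Body=str(time.time()).encode())
except (ClientError, EndpointConnectionError) as e:
    print(f"ALERT: S3 canary upload failed: {e}")
```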
An istio-proxy sidecar in a pod is restarting — the app keeps running, but mesh policy is broken, mTLS goes unchecked, and traffic flows in violation of policy.
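A minimal sketch with the kubernetes Python client, counting restarts of the istio-proxy container across all namespaces:
```python
# Sketch: flag pods whose istio-proxy sidecar has restarted.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()
for pod in v1.list_pod_for_all_namespaces().items:
    for cs in pod.status.container_statuses or []:
        if cs.name == "istio-proxy" and cs.restart_count > 0:
            print(f"ALERT: {pod.metadata.namespace}/{pod.metadata.name} "
                  f"istio-proxy restarted {cs.restart_count} times")
```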
Envoy returns 503 (upstream timeout, no healthy hosts) — users get 5xx, but upstreams themselves are healthy. A standard 5xx-monitor shows "all OK" because it watches the app.
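A minimal sketch of one way to catch this: probe the same health endpoint through the proxy layer and directly against the upstream, and alert only when the proxied path fails (both URLs are placeholders):
```python
# Sketch: compare a request through the mesh/ingress with a direct request
# to the upstream, so proxy-layer 503s stand out even when the app is healthy.
import requests

VIA_PROXY = "https://shop.example.com/healthz"    # assumption: through the mesh
DIRECT = "http://orders.internal:8080/healthz"    # assumption: straight to the upstream

def status(url):
    try:
        return requests.get(url, timeout=5).status_code
    except requests.RequestException:
        return None

proxy_code, direct_code = status(VIA_PROXY), status(DIRECT)
if direct_code == 200 and proxy_code != 200:
    print(f"ALERT: upstream healthy ({direct_code}) but proxy path returns {proxy_code}")
```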
logrotate stopped (config syntax error on last edit, or the systemd timer was disabled) — the main log file grows. Nobody notices until the disk fills.
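A minimal sketch: the symptom is the main log file quietly growing, so alert on its size (path and threshold are placeholders):
```python
# Sketch: a dead logrotate shows up as the main log file ballooning.
import os

LOG = "/var/log/app/app.log"   # placeholder
MAX_SIZE_MB = 500

size_mb = os.stat(LOG).st_size / 1024 / 1024
if size_mb > MAX_SIZE_MB:
    print(f"ALERT: {LOG} is {size_mb:.0f} MB; has logrotate stopped?")
```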
A Borg backup fails (passphrase rotated, repo lock stuck, ssh key expired) — you only learn when you need to restore, and the last snapshot is a week old.
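A minimal sketch, assuming borg >= 1.1 for `--json` (the repo URL is a placeholder, and BORG_PASSPHRASE / SSH access must already be set up): alert when the newest archive is too old.
```python
# Sketch: check the age of the newest archive instead of trusting the
# backup job's exit code.
import json
import subprocess
from datetime import datetime, timedelta

REPO = "ssh://backup@backup-host/./repo"   # placeholder
MAX_AGE = timedelta(hours=26)              # daily backups plus some slack

out = subprocess.run(["borg", "list", "--json", "--last", "1", REPO],
                     capture_output=True, text=True, timeout=120, check=True)
archives = json.loads(out.stdout)["archives"]
if not archives:
    print("ALERT: borg repository has no archives")
else:
    newest = datetime.fromisoformat(archives[0]["time"])
    if datetime.now() - newest > MAX_AGE:
        print(f"ALERT: last borg archive is from {newest}, older than {MAX_AGE}")
```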
A Redis Streams consumer lags — it reads messages but never XACKs them (the worker hangs between read and ack). The stream length (XLEN) looks normal, but the pending-entries list (XPENDING) keeps growing.
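A minimal sketch with redis-py: alert on the size of the consumer group's pending-entries list (stream, group, and threshold are placeholders):
```python
# Sketch: XPENDING summary for the group; 'pending' is the count of
# delivered-but-unacknowledged messages.
import redis

r = redis.Redis(host="redis", port=6379)
STREAM, GROUP = "orders", "workers"   # placeholders
THRESHOLD = 1000

pending = r.xpending(STREAM, GROUP)["pending"]
if pending > THRESHOLD:
    print(f"ALERT: {pending} unacknowledged messages in {STREAM}/{GROUP}")
```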
PVC is created, but the provisioner did not allocate the volume (wrong StorageClass? capacity exhausted? a broken CSI driver? an upstream cloud quota?). The pod waits and never starts — deployment status does not show why.
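A minimal sketch with the kubernetes Python client: flag PVCs that sit in Pending past a grace period.
```python
# Sketch: list PVCs stuck in Pending longer than a grace period.
from datetime import datetime, timezone, timedelta
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()
GRACE = timedelta(minutes=10)

for pvc in v1.list_persistent_volume_claim_for_all_namespaces().items:
    if pvc.status.phase == "Pending":
        age = datetime.now(timezone.utc) - pvc.metadata.creation_timestamp
        if age > GRACE:
            print(f"ALERT: PVC {pvc.metadata.namespace}/{pvc.metadata.name} "
                  f"Pending for {age} (StorageClass: {pvc.spec.storage_class_name})")
```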
cron is alive, but the job did not run last night (the timer was disabled, MAILTO=root dumped the error into an unread mailbox, or a shell syntax error crept into the crontab). The classic case: nobody notices until the morning reports come up empty.
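A minimal dead man's switch sketch: the cron job touches a sentinel file as its last step, and a separate checker alerts when that file goes stale (paths and ages are placeholders).
```python
# Sketch: the job ends with "... && touch /var/run/nightly-report.ok";
# this checker alerts when the sentinel is missing or too old.
import os
import time

SENTINEL = "/var/run/nightly-report.ok"   # placeholder
MAX_AGE_HOURS = 26

try:
    age_h = (time.time() - os.stat(SENTINEL).st_mtime) / 3600
except FileNotFoundError:
    age_h = float("inf")
if age_h > MAX_AGE_HOURS:
    print(f"ALERT: nightly job has not checked in for {age_h:.0f}h")
```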
Autovacuum is pinned at autovacuum_max_workers (a long-running query holds a lock, or vacuum_cost_limit is too low) — tables bloat, disk usage climbs linearly. Postgres itself does not alert.
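A minimal sketch with psycopg2: surface tables with lots of dead tuples and a stale last_autovacuum (the DSN and threshold are placeholders):
```python
# Sketch: query pg_stat_user_tables for the worst offenders.
import os
import psycopg2

conn = psycopg2.connect(os.environ["PG_DSN"])   # e.g. "host=db dbname=app user=monitor"
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT relname, n_dead_tup, last_autovacuum
        FROM pg_stat_user_tables
        WHERE n_dead_tup > 100000
        ORDER BY n_dead_tup DESC
        LIMIT 10
    """)
    for relname, dead, last_av in cur.fetchall():
        print(f"ALERT: {relname} has {dead} dead tuples (last autovacuum: {last_av})")
```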
A service's VAULT_TOKEN is close to expiry (no auto-renewal, or the token was issued as non-renewable). The service keeps hitting Vault — and one day it gets 403 and loses access to its secrets.
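A minimal sketch against the plain Vault HTTP API: look up the service's own token and alert when the remaining TTL gets short.
```python
# Sketch: GET /v1/auth/token/lookup-self and check the remaining TTL.
import os
import requests

VAULT_ADDR = os.environ.get("VAULT_ADDR", "http://vault:8200")
TOKEN = os.environ["VAULT_TOKEN"]
MIN_TTL_SECONDS = 24 * 3600

resp = requests.get(f"{VAULT_ADDR}/v1/auth/token/lookup-self",
                    headers={"X-Vault-Token": TOKEN}, timeout=10)
resp.raise_for_status()
data = resp.json()["data"]
if 0 < data["ttl"] < MIN_TTL_SECONDS:
    print(f"ALERT: Vault token expires in {data['ttl']}s "
          f"(renewable: {data.get('renewable', False)})")
```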
fail2ban bans sources by a per-IP threshold — but a campaign hits from thousands of IPs at one attempt each. No single IP crosses the threshold, so nothing gets banned, yet the overall noise on the SSH port is huge.
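A minimal sketch that counts total failed SSH logins and unique sources over the last hour from the journal, independent of any per-IP threshold (the unit is `ssh` on Debian-family hosts, `sshd` elsewhere):
```python
# Sketch: aggregate failed-password lines from journald over the last hour.
import re
import subprocess

out = subprocess.run(
    ["journalctl", "-u", "ssh", "--since", "-1h", "--no-pager", "-o", "cat"],
    capture_output=True, text=True).stdout

ips = re.findall(r"Failed password for .* from (\S+)", out)
if len(ips) > 500:
    print(f"ALERT: {len(ips)} failed SSH logins from {len(set(ips))} unique IPs in 1h")
```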
nginx proxy_cache hit ratio drops — the backend starts burning CPU. Usually someone forgot proxy_cache_valid in a new location block, the cache was wiped, or the TTL is too short.
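A minimal sketch, assuming the access log_format includes $upstream_cache_status (path and threshold are placeholders): compute the hit ratio from the log.
```python
# Sketch: rough hit-ratio from the access log; the substring match assumes
# $upstream_cache_status appears as its own field in the log_format.
from collections import Counter

LOG = "/var/log/nginx/access.log"   # placeholder
MIN_HIT_RATIO = 0.7

counts = Counter()
with open(LOG) as f:
    for line in f:
        for status in ("HIT", "MISS", "EXPIRED", "BYPASS"):
            if f" {status} " in line:
                counts[status] += 1
                break

total = sum(counts.values())
if total and counts["HIT"] / total < MIN_HIT_RATIO:
    print(f"ALERT: cache hit ratio {counts['HIT'] / total:.0%} ({dict(counts)})")
```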
Connection to a database / partner API loses 5–10 % of packets — the app sees timeouts, but `ping -c 4` says 'all good'. TCP retransmits silently chop throughput.
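A minimal sketch: a longer ping burst exposes the loss that `ping -c 4` hides (host and threshold are placeholders).
```python
# Sketch: 100 pings at 0.2s spacing, then parse the loss percentage.
import re
import subprocess

HOST = "db.internal.example.com"   # placeholder
MAX_LOSS_PCT = 2.0

out = subprocess.run(["ping", "-c", "100", "-i", "0.2", HOST],
                     capture_output=True, text=True, timeout=60).stdout
m = re.search(r"([\d.]+)% packet loss", out)
if m and float(m.group(1)) > MAX_LOSS_PCT:
    print(f"ALERT: {m.group(1)}% packet loss to {HOST}")
```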
A BGP session with an upstream or cloud peering drops — half of your routes are gone. The peer does not notify you, and your network monitoring (if any) is often not wired to BGP state.
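A minimal, heavily environment-specific sketch for FRR via vtysh; the JSON key names can differ between FRR versions, so verify against your own `show bgp summary json` output, and note that other stacks (BIRD, vendor routers) need their own query.
```python
# Sketch: flag BGP peers that are not Established, per FRR's JSON summary.
# Assumes FRR with vtysh on the local host; key names are an assumption.
import json
import subprocess

out = subprocess.run(["vtysh", "-c", "show bgp summary json"],
                     capture_output=True, text=True, check=True).stdout
summary = json.loads(out)
peers = summary.get("ipv4Unicast", {}).get("peers", {})
for peer, info in peers.items():
    if info.get("state") != "Established":
        print(f"ALERT: BGP session with {peer} is {info.get('state')}")
```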
Have a recipe we missed?
Tell us which stack to cover next — drop a line to support@enterno.io and we'll add the recipe (and credit you on the page).
Start monitoring — free →