Check every minute that your site returns a 2xx status, and alert Slack or Telegram on failure.
Monitoring cookbook
Hand-written recipes for the monitoring problems we see most often. Each recipe shows a minimal DIY script and the one-click Enterno.io monitor that covers the same concern without extra infrastructure.
Minimal script that checks an SSL certificate and alerts 14 days before expiry.
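A minimal sketch of the expiry check, using only the Python standard library. The function names and the 14-day default are illustrative; wire `needs_renewal` to whatever alert channel you already use.

```python
import socket
import ssl

WARN_DAYS = 14  # alert threshold taken from the recipe description


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Open a TLS connection and return the certificate's notAfter string."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_left(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and the notAfter timestamp."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def needs_renewal(not_after: str, now: float, warn_days: int = WARN_DAYS) -> bool:
    """True once the certificate is within `warn_days` of expiry."""
    return days_left(not_after, now) <= warn_days
```

A cron entry could call `needs_renewal(fetch_not_after("example.com"), time.time())` once a day; `example.com` is a placeholder host.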
Detect the moment a replica falls behind the primary by more than 10 seconds.
Your cron job silently stopped running. You need an alert when the script misses its window.
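One common DIY approach is a heartbeat file: the job touches a file when it finishes, and a separate watcher alerts when the file gets too old. The sketch below assumes that pattern; the file path and window are yours to choose.

```python
import os
import time
from typing import Optional


def touch_heartbeat(path: str) -> None:
    """Call at the end of every successful cron run."""
    with open(path, "a"):
        os.utime(path, None)  # set mtime to the current time


def missed_window(path: str, max_age_s: float, now: Optional[float] = None) -> bool:
    """True if the heartbeat is missing or older than `max_age_s` seconds."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except FileNotFoundError:
        return True  # the job has never reported in
    return age > max_age_s
```

The watcher has to run on a different schedule (ideally a different host) than the job it watches, otherwise both can stop together.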
Readiness probes pass inside the pod, but no one sees that the LB refused to route traffic to the new deploy.
A Redis replica falls behind its master, so read-after-write returns stale data. There is no native alert, so you need an external one.
The server starts returning 503/504 — but a plain uptime check misses it because the homepage is 200 while the API path is on fire.
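A sketch of checking several paths instead of only the homepage, using the standard library. The path list is an assumption; substitute the endpoints that matter for your service.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code, or 0 if the request never got a response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses arrive as exceptions
    except OSError:
        return 0  # DNS failure, refused connection, timeout


def failing_paths(base_url: str, paths, fetch=fetch_status) -> dict:
    """Map each path that did not return 2xx to the status it returned."""
    failures = {}
    for path in paths:
        status = fetch(base_url.rstrip("/") + path)
        if not 200 <= status < 300:
            failures[path] = status
    return failures
```

For example, `failing_paths("https://example.com", ["/", "/api/health"])` returns an empty dict when everything is healthy; both the host and `/api/health` are placeholders.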
A container gets OOM-killed and the restart policy revives it, so there is no external signal until users complain.
A junior marketer flips DMARC from p=quarantine to p=none "to fix bounces", and Gmail starts marking everything as spam an hour later.
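A sketch of the comparison step: given the previous and current `_dmarc` TXT record values, flag any move to a weaker policy. Fetching the TXT record is left out because it needs a DNS library or a call to `dig`; the function names here are illustrative.

```python
from typing import Optional

# Policies ordered from weakest to strongest enforcement.
STRENGTH = {"none": 0, "quarantine": 1, "reject": 2}


def dmarc_policy(txt: str) -> Optional[str]:
    """Extract the p= tag from a DMARC record, or None if it is not valid."""
    tags = {}
    for part in txt.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        return None
    policy = tags.get("p", "").lower()
    return policy if policy in STRENGTH else None


def is_downgrade(old_txt: str, new_txt: str) -> bool:
    """True if the policy got weaker or the record stopped being valid."""
    old, new = dmarc_policy(old_txt), dmarc_policy(new_txt)
    if old is None:
        return False  # nothing to downgrade from
    if new is None:
        return True  # record removed or broken
    return STRENGTH[new] < STRENGTH[old]
```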
One public DNS resolver (1.1.1.1, 8.8.8.8) degrades for a region. Your site "is up" but half the users see "server not found" — the uptime monitor stays silent.
Your Prometheus + Alertmanager setup only ships alerts to email or PagerDuty. The team lives in Telegram and you need a bridge without spinning up another service.
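A sketch of the two pieces such a bridge needs: turning an Alertmanager webhook payload into text, and building a Telegram Bot API `sendMessage` request. It assumes the payload carries an `alerts` list with `status`, `labels` and `annotations`, and that alerts set a `summary` annotation; the bot token and chat ID are placeholders you supply.

```python
import urllib.parse
import urllib.request


def format_alerts(payload: dict) -> str:
    """Render one line per alert from an Alertmanager webhook payload."""
    lines = []
    for alert in payload.get("alerts", []):
        status = alert.get("status", "unknown").upper()
        name = alert.get("labels", {}).get("alertname", "unnamed")
        summary = alert.get("annotations", {}).get("summary", "")
        line = f"[{status}] {name}"
        if summary:
            line += f": {summary}"
        lines.append(line)
    return "\n".join(lines)


def build_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the Telegram sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(url, data=data)
```

Sending is then `urllib.request.urlopen(build_request(...), timeout=10)` from whatever receives the webhook.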
With long_query_time = 1 and slow_query_log enabled, you need to know when the slow-query rate per minute suddenly jumps (a deploy broke an index, or the ORM went N+1).
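A sketch of the jump detection, assuming you sample a cumulative slow-query counter once a minute (for example the MySQL `Slow_queries` status variable). The factor and floor are illustrative defaults, not tuned values.

```python
from statistics import median


def per_minute_rates(counter_samples):
    """Convert cumulative counter samples into per-interval deltas."""
    return [b - a for a, b in zip(counter_samples, counter_samples[1:])]


def is_spike(rates, factor: float = 3.0, floor: int = 5) -> bool:
    """True if the latest rate is well above the median of earlier rates.

    `floor` suppresses alerts when absolute numbers are too small to matter.
    """
    if len(rates) < 2:
        return False
    *history, latest = rates
    return latest >= floor and latest > factor * median(history)
```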
The CDN cache_status header (cf-cache-status or x-cache) suddenly returns MISS on more than 30% of requests — origin load + bandwidth bills both spike.
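A sketch of the ratio check over a batch of sampled header values. It treats any value containing "MISS" as a miss, which is an assumption that covers forms like `MISS` and `Miss from cloudfront`; adjust it for your CDN.

```python
MISS_THRESHOLD = 0.30  # 30% from the recipe description


def miss_ratio(statuses) -> float:
    """Share of sampled cache-status values that report a miss."""
    counted = [s.strip().upper() for s in statuses if s]
    if not counted:
        return 0.0
    misses = sum(1 for s in counted if "MISS" in s)
    return misses / len(counted)


def too_many_misses(statuses, threshold: float = MISS_THRESHOLD) -> bool:
    return miss_ratio(statuses) > threshold
```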
Stripe, GitHub, Twilio return X-RateLimit-Remaining in response headers. If the backend does not track the floor, you get a sudden 429 and billing stops.
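A sketch of tracking the floor from response headers. The header lookup is case-insensitive because HTTP libraries differ in how they normalise names; the floor value is whatever margin your traffic needs.

```python
from typing import Mapping, Optional


def remaining_quota(headers: Mapping[str, str]) -> Optional[int]:
    """Read X-RateLimit-Remaining, or None if absent or not a number."""
    for name, value in headers.items():
        if name.lower() == "x-ratelimit-remaining":
            try:
                return int(value)
            except ValueError:
                return None
    return None


def below_floor(headers: Mapping[str, str], floor: int) -> bool:
    """True when the API reports fewer than `floor` requests left."""
    remaining = remaining_quota(headers)
    return remaining is not None and remaining < floor
```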
Logs or backup files eat /var; in 24 hours the server falls over. A basic df check every 10 minutes saves a 2 AM incident.
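A sketch of the basic check with `shutil.disk_usage`. The 85% threshold is an assumption; note that the percentage is computed as used/total, which can differ slightly from the figure `df` prints.

```python
import shutil


def used_percent(path: str = "/var") -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total * 100


def is_filling(path: str = "/var", threshold: float = 85.0) -> bool:
    """True when usage is above `threshold` percent."""
    return used_percent(path) > threshold
```

Run it from cron every 10 minutes and send an alert when `is_filling()` returns True.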
A consumer group lags behind the producer and messages pile up. You need a lag threshold that triggers an alert.
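A sketch of the lag arithmetic, independent of any client library. It assumes you have already collected, per partition, the log end offset and the group's committed offset (for instance from your Kafka client or admin tooling); both dict shapes are illustrative.

```python
def total_lag(end_offsets: dict, committed: dict) -> int:
    """Sum of (end offset - committed offset) across partitions.

    A partition with no committed offset is counted from 0, which is an
    assumption; it overstates lag on topics with expired old segments.
    """
    lag = 0
    for partition, end in end_offsets.items():
        done = committed.get(partition, 0)
        lag += max(end - done, 0)
    return lag


def lag_exceeded(end_offsets: dict, committed: dict, threshold: int) -> bool:
    return total_lag(end_offsets, committed) > threshold
```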
The production Elasticsearch cluster goes yellow. You need an alert now, not 30 minutes later via Kibana.
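A sketch of interpreting the response body of the cluster health endpoint (`GET /_cluster/health`), which reports a `status` of green, yellow or red. Fetching the body is left to your HTTP client; the function name is illustrative.

```python
import json
from typing import Optional


def cluster_problem(health_json: str) -> Optional[str]:
    """Return an alert message unless the cluster reports green."""
    health = json.loads(health_json)
    status = health.get("status")
    if status == "green":
        return None
    unassigned = health.get("unassigned_shards")
    message = f"cluster status is {status}"
    if unassigned is not None:
        message += f" ({unassigned} unassigned shards)"
    return message
```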
The consumer cannot keep up, the queue grows, and the disk eventually fills. You need an alert on the messages-ready count.
HAProxy balances over 5 backends; one starts erroring and goes DOWN. Alert before users notice.
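A sketch of parsing the HAProxy stats CSV export and listing servers whose status starts with DOWN. It relies on the `pxname`, `svname` and `status` columns and on the header line beginning with `# `; how you fetch the CSV (stats URL or socket) depends on your setup.

```python
import csv
import io


def down_servers(stats_csv: str):
    """Return (proxy, server) pairs that HAProxy reports as DOWN."""
    text = stats_csv.lstrip("# ")  # header line starts with "# pxname,..."
    down = []
    for row in csv.DictReader(io.StringIO(text)):
        if row.get("svname") in ("FRONTEND", "BACKEND"):
            continue  # aggregate rows, not individual servers
        if (row.get("status") or "").startswith("DOWN"):
            down.append((row["pxname"], row["svname"]))
    return down
```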
The backup cron job failed silently and nobody noticed, so the gap surfaces only at the next incident. You need an alert when the newest backup file is older than 30 hours.
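A sketch of the freshness check: find the newest file in the backup directory and compare its modification time with the 30-hour limit. It assumes backups land as regular files in a single directory; the directory path is yours to supply.

```python
import os
import time
from typing import Optional


def newest_mtime(directory: str) -> Optional[float]:
    """Modification time of the most recent regular file, or None if empty."""
    with os.scandir(directory) as entries:
        mtimes = [e.stat().st_mtime for e in entries if e.is_file()]
    return max(mtimes) if mtimes else None


def backup_is_stale(directory: str, max_age_hours: float = 30,
                    now: Optional[float] = None) -> bool:
    """True if there is no backup at all or the newest one is too old."""
    now = time.time() if now is None else now
    newest = newest_mtime(directory)
    return newest is None or (now - newest) > max_age_hours * 3600
```

An empty directory counts as stale on purpose, so a backup job that never produced a file still triggers the alert.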
Have a recipe we missed?
Tell us which stack to cover next — drop a line to support@enterno.io and we'll add the recipe (and credit you on the page).