Alerting Best Practices for Website Monitoring
The Alert Fatigue Problem
Alert fatigue is the number one enemy of effective monitoring. When a team receives hundreds of notifications per day, important alerts get lost in the noise. Research shows that up to 70% of alerts are ignored in teams with poorly configured monitoring.
The goal of alerting is to notify the right person about the right problem at the right time. Nothing more, nothing less.
Principles of Effective Alerting
Every Alert Should Require Action
Each alert should imply a specific action. If an alert doesn't require immediate response, it's not an alert — it's an informational notification. Move it to a dashboard or log.
Ask yourself: "What should I do when I receive this alert?" If there's no answer, delete the alert.
Avoid Duplication
One incident = one alert. If the database is down, you shouldn't receive 50 alerts from all dependent services. Configure dependencies and suppress cascading alerts.
Prioritize
Not all problems are equally urgent:
- Critical — site is down, data loss, security breach. Immediate response, phone call.
- Warning — performance degradation, disk space running low. Response within an hour.
- Info — planned maintenance, certificate expiring soon. Response during business hours.
Setting Thresholds
Static Thresholds
The simplest approach: "if response time exceeds 2 seconds, alert." Works for stable metrics but adapts poorly to changes.
Recommendations:
- Set thresholds based on historical data, not intuition
- Use two levels: warning (85% of critical) and critical
- Review thresholds quarterly
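The two-level scheme above can be sketched as a small classifier. This is a minimal illustration, not a specific monitoring API; the 2-second critical threshold and 85% warning ratio are the example values from this section.

```python
def classify(value_ms, critical_ms=2000, warning_ratio=0.85):
    """Classify a response-time sample against static thresholds.

    The warning level is derived from the critical one (85% by default),
    so the two thresholds stay consistent when you tune the critical value.
    """
    warning_ms = critical_ms * warning_ratio
    if value_ms >= critical_ms:
        return "critical"
    if value_ms >= warning_ms:
        return "warning"
    return "ok"
```

Deriving the warning level from the critical one means a quarterly threshold review only has one number to adjust per metric.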
Dynamic Thresholds
Thresholds are calculated automatically from historical data. If typical response time is 100ms and it's currently 300ms, that's an anomaly even if 300ms seems acceptable in absolute terms.
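A simple way to implement this idea is to compare the current value against a baseline built from recent history. The sketch below uses a mean-plus-standard-deviation rule as an illustration; real dynamic thresholding also has to account for seasonality and trend.

```python
from statistics import mean, stdev

def is_anomaly(history, current, sigmas=3.0):
    """Flag `current` as anomalous when it deviates from the historical
    baseline by more than `sigmas` standard deviations.

    `history` is a list of recent samples for the same metric.
    """
    mu = mean(history)
    sd = stdev(history)
    return abs(current - mu) > sigmas * sd
```

With a baseline hovering around 100 ms, a 300 ms sample is flagged even though 300 ms would pass a generous static threshold.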
Suppressing Brief Spikes
Don't alert on single threshold violations. Use conditions like "metric exceeds threshold for more than 3 consecutive minutes" or "5 out of 10 last checks failed." This filters out brief network glitches.
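The "5 out of 10 last checks failed" rule can be implemented with a fixed-size window of recent results. The class name and parameters below are illustrative.

```python
from collections import deque

class FlapFilter:
    """Suppress single-check blips: fire only when at least
    `min_failures` of the last `window` checks have failed."""

    def __init__(self, window=10, min_failures=5):
        self.results = deque(maxlen=window)  # oldest results drop off automatically
        self.min_failures = min_failures

    def record(self, ok: bool) -> bool:
        """Record one check result; return True when an alert should fire."""
        self.results.append(ok)
        return sum(1 for r in self.results if not r) >= self.min_failures
```

A single failed check among successes stays below the threshold, so brief network glitches never page anyone.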
Notification Channels
Choose Channel by Severity
- SMS/phone call — only for critical alerts requiring immediate response
- Messengers (Telegram, Slack) — for warning alerts requiring response within an hour
- Email — for informational notifications not requiring urgent response
- Dashboard — for metrics that are important to see but don't need active notification
Don't Duplicate Channels
Sending the same alert via SMS, Telegram, email, and Slack simultaneously is a surefire way to get all channels ignored. One priority level = one channel.
Escalation
Set up escalation chains for critical alerts:
- 0 min — notify on-call engineer (Telegram)
- 15 min — if unacknowledged, SMS to on-call
- 30 min — if unacknowledged, call team lead
- 1 hour — notify the entire team
Without escalation, a critical alert may go unnoticed if the on-call engineer is asleep or unreachable.
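The chain above is just data: a list of (delay, channel, recipient) steps. A scheduler that re-checks an unacknowledged alert can compute which steps are due, as in this sketch (step contents mirror the example chain; names are illustrative).

```python
# Escalation chain from the steps above; delays are in minutes.
ESCALATION_STEPS = [
    (0, "telegram", "on-call"),
    (15, "sms", "on-call"),
    (30, "call", "team-lead"),
    (60, "telegram", "whole-team"),
]

def due_steps(minutes_unacknowledged):
    """Return every escalation step that should have fired by now
    for an alert that is still unacknowledged."""
    return [s for s in ESCALATION_STEPS if s[0] <= minutes_unacknowledged]
```

Keeping the chain as data makes it easy to review and change without touching the scheduler itself.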
Grouping and Correlation
Link alerts to a single root cause:
- If a server doesn't respond to Ping, don't send separate alerts for HTTP, DNS, and SSL
- Group repeated alerts — 100 identical notifications in 10 minutes become one with a "recurring" note
- Add context: not "HTTP 500" but "HTTP 500 on /api/checkout, 15 errors in 5 minutes"
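Collapsing repeats is the simplest of these techniques to implement. The sketch below counts identical alerts and marks repeats as recurring; a real system would also group by time window and by root cause, not just by identity.

```python
from collections import Counter

def group_alerts(alerts):
    """Collapse repeated identical alerts into one entry with a count,
    so 100 identical notifications become a single 'recurring' alert.

    Each alert here is a (check, message) tuple — an illustrative schema.
    """
    counts = Counter(alerts)
    return [
        {"alert": alert, "count": n, "recurring": n > 1}
        for alert, n in counts.items()
    ]
```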
Alert Content
A good alert contains:
- What happened — "Response time for /api/checkout exceeded 3 seconds"
- When — "14:32 UTC, ongoing for 5 minutes"
- Scope — "Affecting 15% of requests"
- Context — "Last deployment was 30 minutes ago"
- Link — link to dashboard or runbook
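Treating these five fields as required arguments guarantees no alert ships without them. A minimal sketch (the field names and message layout are illustrative):

```python
def format_alert(what, when, scope, context, link):
    """Assemble an alert message containing all five required fields."""
    return (
        f"{what}\n"
        f"When: {when}\n"
        f"Scope: {scope}\n"
        f"Context: {context}\n"
        f"Runbook: {link}"
    )
```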
Regular Alert Audits
Conduct a quarterly review:
- Which alerts fired most frequently? Can you fix the root cause?
- Which alerts never fired? Is the threshold too high or does the problem not exist?
- Which incidents were not caught by alerts? Do you need new checks?
- How many alerts were false positives? Do thresholds need adjustment?
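Two of these questions (the noisiest alert and the false-positive rate) can be answered mechanically from an alert log. The sketch below assumes each log entry is a dict with `name` and `false_positive` keys, which is an illustrative schema, not a specific tool's format.

```python
from collections import Counter

def audit(alert_log):
    """Summarize an alert log for a quarterly review: the alert that
    fired most often, and the share of alerts marked false positive."""
    by_name = Counter(entry["name"] for entry in alert_log)
    false_positives = sum(1 for entry in alert_log if entry["false_positive"])
    return {
        "noisiest": by_name.most_common(1)[0] if by_name else None,
        "false_positive_rate": false_positives / len(alert_log) if alert_log else 0.0,
    }
```

The noisiest alert is the first candidate for a root-cause fix; a high false-positive rate points at thresholds that need loosening.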
Practical Setup with Enterno.io
Set up uptime monitoring with Enterno.io for your key pages. Use the monitors dashboard to track availability across all services. Start with 2-3 critical checks and gradually expand coverage.
Summary
Effective alerting is a balance between coverage and noise. Every alert should require action, have the right priority, and use the appropriate delivery channel. Regularly review thresholds and remove useless alerts. Five precise alerts are better than five hundred noisy ones.