Short answer. An error budget is the amount of unreliability an SLO allows. If your target is 99.9%, the service may be unavailable 0.1% of the time, and that slice is the budget. It is computed as (1 − SLO) × period. The budget turns "reliability" from an argument into a number: while budget remains, the team ships features; once it runs out, releases freeze and reliability work takes over.
Why an error budget matters
Without an error budget, two teams argue forever: developers want to ship faster, operations want stability. The budget gives them a shared language — a number both sides agree on up front.
One hundred percent reliability is not a goal but a trap: it is prohibitively expensive and stalls progress. The error budget states honestly how much failure you can afford.
The calculation
The base formula falls straight out of the SLO:
Error budget (%) = 100% − SLO
Error budget (time) = (1 − SLO) × period length
Example: SLO = 99.9%, period = 30 days
30 days = 43,200 minutes
Budget = (1 − 0.999) × 43200 = 0.001 × 43200 = 43.2 minutes
That means: up to 43.2 minutes of downtime per month is acceptable.
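The arithmetic above can be sketched in a few lines of Python. The function name `error_budget_minutes` and its parameters are illustrative and do not come from any library; the SLO is passed as a fraction (0.999 for 99.9%) and the period defaults to the 30 days used in the example.

```python
def error_budget_minutes(slo: float, period_days: float = 30) -> float:
    """Return the allowed downtime in minutes for an SLO given as a fraction."""
    if not 0 < slo <= 1:
        raise ValueError("slo must be a fraction in (0, 1], e.g. 0.999")
    period_minutes = period_days * 24 * 60  # 30 days -> 43,200 minutes
    return (1 - slo) * period_minutes


# Reproduces the worked example: SLO = 99.9%, period = 30 days.
print(round(error_budget_minutes(0.999), 1))  # 43.2
```

Passing `period_days=7` gives the weekly budget with the same formula.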
How much budget each SLO grants
| SLO | Budget per month (30 days) | Budget per week |
|---|---|---|
| 99% | ~7 h 12 min | ~1 h 41 min |
| 99.9% | ~43.2 min | ~10.1 min |
| 99.95% | ~21.6 min | ~5.0 min |
| 99.99% | ~4.3 min | ~1.0 min |
Spending the budget by request
You can also measure the budget in requests rather than time, which is more accurate for APIs:
Total requests this month = 10,000,000
SLO = 99.9% successful
Allowed errors = (1 − 0.999) × 10,000,000 = 10,000 errors
If 7,000 errors already happened in two weeks:
remaining = 10,000 − 7,000 = 3,000 errors for the rest of the month
That is 70% of the budget spent in about 47% of the period, a burn rate of roughly 1.5: time to slow risky releases.
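The request-based example above can be checked with a small helper. This is a minimal sketch: `request_budget` and its field names are hypothetical, and it assumes the monthly request total is known or forecast in advance, which is rarely exact mid-month.

```python
def request_budget(total_requests: int, slo: float, errors_so_far: int,
                   elapsed_days: float, period_days: float = 30) -> dict:
    """Summarise a request-based error budget (illustrative helper)."""
    if elapsed_days <= 0 or period_days <= 0:
        raise ValueError("elapsed_days and period_days must be positive")
    allowed = round((1 - slo) * total_requests)  # errors the SLO tolerates
    if allowed <= 0:
        raise ValueError("an SLO of 100% leaves no budget to track")
    spent_fraction = errors_so_far / allowed
    elapsed_fraction = elapsed_days / period_days
    return {
        "allowed": allowed,
        "remaining": allowed - errors_so_far,
        "spent_fraction": spent_fraction,
        # Above 1 means the budget runs out before the period ends.
        "burn_rate": spent_fraction / elapsed_fraction,
    }


status = request_budget(10_000_000, 0.999, 7_000, elapsed_days=14)
print(status["allowed"], status["remaining"])  # 10000 3000
```

Dividing the spent fraction by the elapsed fraction is one simple way to judge pace; a value near 1 means the budget would last about the whole period.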
The error budget policy
A number without rules is useless. Agree on actions in advance:
- Budget healthy — ship features at normal pace.
- Budget running low (< 25%) — add tests, slow risky changes.
- Budget exhausted — feature freeze, reliability fixes only until recovery.
- Chronic overspend — revisit the SLO or invest seriously in infrastructure.
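The policy tiers above can be written down as a single function so that the rules are unambiguous. This is only a sketch: `policy_action` is a hypothetical name, the 25% threshold comes from the list above, and how a team detects "chronic overspend" (for example, several exhausted periods in a row) is an assumption left to the caller.

```python
def policy_action(remaining_fraction: float, chronic_overspend: bool = False) -> str:
    """Map the share of budget left (1.0 = untouched, 0 = exhausted) to an action."""
    if chronic_overspend:
        # Assumed to be decided elsewhere, e.g. from past periods.
        return "revisit the SLO or invest in infrastructure"
    if remaining_fraction <= 0:
        return "feature freeze: reliability fixes only"
    if remaining_fraction < 0.25:
        return "add tests, slow risky changes"
    return "ship features at normal pace"


print(policy_action(0.30))  # ship features at normal pace
print(policy_action(0.10))  # add tests, slow risky changes
```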
Burn rate
Burn rate shows how many times faster than the sustainable pace the budget is being consumed; at a burn rate of 1 the budget lasts exactly one period. If a single hour spends a day's worth of budget, the burn rate is 24, which is grounds for an immediate alert. Setting up such alerts is covered in our alerting best practices.
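As a rough illustration, burn rate can be computed as the share of budget spent in a window divided by the share of the period that window represents. The names `burn_rate` and `hours_to_exhaustion` are hypothetical, and a 30-day period is assumed.

```python
PERIOD_HOURS = 30 * 24  # assumed 30-day measurement period


def burn_rate(budget_fraction_spent: float, window_hours: float,
              period_hours: float = PERIOD_HOURS) -> float:
    """How many times faster than the sustainable pace the budget is burning."""
    if window_hours <= 0 or period_hours <= 0:
        raise ValueError("window_hours and period_hours must be positive")
    sustainable_fraction = window_hours / period_hours
    return budget_fraction_spent / sustainable_fraction


def hours_to_exhaustion(rate: float, period_hours: float = PERIOD_HOURS) -> float:
    """Hours until a full budget is gone if the current rate holds."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return period_hours / rate


# One hour consumed a day's worth of budget (1/30 of a 30-day budget).
rate = burn_rate(1 / 30, window_hours=1)
print(round(rate))                       # 24
print(round(hours_to_exhaustion(rate)))  # 30
```

At a sustained burn rate of 24, a full 30-day budget would be gone in about 30 hours, which is why such a rate warrants paging rather than a ticket.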
How enterno.io helps track the budget
As external monitoring, enterno.io records every outage episode and accumulates downtime history. On paid plans checks run every minute or every 30 seconds, which reflects real budget spend far better than sparse five-minute probes. Alerts via Telegram, Slack, email, webhook, PagerDuty and Jira warn you when an incident starts eating into the budget.
Incident history and availability are easy to show on a status page, while the monitors themselves provide the data. For cron and bulk URL check jobs, heartbeat monitoring comes in handy.
FAQ
How is an error budget different from an SLO?
An SLO is an availability target; the error budget is its mirror image — exactly the share of unreliability the SLO permits. Together they always sum to 100%.
What if the budget is spent mid-month?
Per policy, freeze new features and direct effort to reliability. With a calendar-based period the budget resets at the start of the next period; with a rolling window it recovers gradually as old incidents age out.
Can I roll over unspent budget?
Usually no: the budget is tied to a window (for example a rolling 30 days). With a calendar window it resets at the start of each period; with a rolling window, old errors simply age out. Either way, the focus stays on recent reliability.
How often should I compute the budget?
Continuously over a rolling window. Many teams review spend daily and run burn-rate alerts in real time.
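A rolling-window calculation might look like the sketch below. It assumes downtime is available as a list of `(start, end)` timestamps, for instance exported from a monitoring tool's incident history; the function name and data shape are illustrative, not any product's API.

```python
from datetime import datetime, timedelta


def rolling_budget_remaining(outages, now, slo, window=timedelta(days=30)):
    """Return the minutes of budget left over the window ending at `now`."""
    window_start = now - window
    downtime = timedelta()
    for start, end in outages:
        # Clip each outage to the window so only the overlapping part counts.
        overlap_start = max(start, window_start)
        overlap_end = min(end, now)
        if overlap_end > overlap_start:
            downtime += overlap_end - overlap_start
    budget = (1 - slo) * window  # a timedelta can be scaled by a float
    return (budget - downtime).total_seconds() / 60


now = datetime(2024, 1, 31)
outages = [
    (datetime(2023, 12, 31, 23, 50), datetime(2024, 1, 1, 0, 10)),  # half inside the window
    (datetime(2024, 1, 10, 12, 0), datetime(2024, 1, 10, 12, 10)),
]
print(round(rolling_budget_remaining(outages, now, 0.999), 1))  # 23.2
```

Because the window slides with `now`, the first outage stops counting entirely ten minutes later, which is how a rolling budget recovers without an explicit reset.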
Start measuring budget spend. Spin up monitors at enterno.io/monitors and collect downtime history for an accurate error budget.