SLA Monitoring for SaaS: Measuring and Holding Uptime

Anatoly Oshmanovsky

Monitoring


Published: 18.06.2026 · ~4 min read · 4 views


Short answer. A SaaS SLA is a public availability promise expressed as an uptime percentage over a period. 99.9% sounds solid, yet it allows almost 9 hours of downtime per year. To hold an SLA you must measure real availability with independent, short-interval monitoring, compute uptime with a formula, and record incidents automatically. Below: the nines table, the formula and a practical setup.

What an SLA is made of

An SLA (Service Level Agreement) fixes a target availability level and penalties for breaching it — usually credits or partial refunds. Key elements: the target uptime percentage, the measurement window (month/quarter), what counts as downtime and who measures it. If only the provider measures, the customer has no leverage. That is why SaaS teams run independent monitoring.

The nines table: how much downtime is allowed

SLA	Downtime/month	Downtime/year
99% ("two nines")	~7.3 hours	~3.65 days
99.9% ("three nines")	~43.8 minutes	~8.76 hours
99.95%	~21.9 minutes	~4.38 hours
99.99% ("four nines")	~4.38 minutes	~52.6 minutes
99.999% ("five nines")	~26 seconds	~5.26 minutes

More nines means costlier infrastructure and a shorter monitoring interval — otherwise you simply won't catch a short incident.
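The figures in the table can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name `allowed_downtime_minutes` is made up for this example, and it assumes a 365-day year with a month taken as one twelfth of a year (about 30.4 days), which is the convention that matches the monthly column.

```python
# Sketch: allowed downtime ("error budget") for a given SLA target.
# Assumption: year = 365 days, month = 1/12 of a year (~30.42 days).

MINUTES_PER_YEAR = 365 * 24 * 60            # 525,600 minutes
MINUTES_PER_MONTH = MINUTES_PER_YEAR / 12   # 43,800 minutes


def allowed_downtime_minutes(sla_percent: float, period_minutes: float) -> float:
    """Return the downtime budget, in minutes, for the given period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return period_minutes * (100 - sla_percent) / 100


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.95, 99.99, 99.999):
        per_month = allowed_downtime_minutes(target, MINUTES_PER_MONTH)
        per_year = allowed_downtime_minutes(target, MINUTES_PER_YEAR)
        print(f"{target}%: {per_month:.2f} min/month, {per_year:.2f} min/year")
```

If your contract defines the month as exactly 30 days, pass 43,200 minutes as the period instead; the budget shrinks slightly (43.2 rather than 43.8 minutes at 99.9%).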

The uptime formula

Uptime is the share of time the service was available out of the total period:

uptime_% = (total_time - downtime) / total_time * 100

# Example: month = 30 days = 43,200 minutes
# 25 minutes of downtime recorded
uptime_% = (43200 - 25) / 43200 * 100 = 99.942%
# This meets a 99.9% SLA but breaches 99.95%

For the formula to reflect reality, the downtime source must be recorded incidents from independent monitoring, not a rough estimate.
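As a rough illustration of feeding recorded incidents into the formula, here is a minimal Python sketch. The function name `uptime_percent` and the incident format (a list of start/end timestamps) are assumptions for this example, not the export format of any particular monitoring tool. Incidents are clipped to the measurement window and overlapping ones are merged so that no minute is counted twice.

```python
from datetime import datetime, timedelta


def uptime_percent(period_start, period_end, incidents):
    """Uptime in percent for a window, given (start, end) incident pairs."""
    total = (period_end - period_start).total_seconds()
    if total <= 0:
        raise ValueError("period_end must be after period_start")

    # Keep only incidents that touch the window, clipped to its edges.
    clipped = sorted(
        (max(start, period_start), min(end, period_end))
        for start, end in incidents
        if end > period_start and start < period_end
    )

    # Merge overlapping incidents and sum their durations.
    downtime = 0.0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                downtime += (cur_end - cur_start).total_seconds()
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        downtime += (cur_end - cur_start).total_seconds()

    return (total - downtime) / total * 100


if __name__ == "__main__":
    start = datetime(2026, 6, 1)
    end = start + timedelta(days=30)  # 43,200 minutes
    incidents = [
        (datetime(2026, 6, 3, 10, 0), datetime(2026, 6, 3, 10, 15)),  # 15 min
        (datetime(2026, 6, 20, 2, 0), datetime(2026, 6, 20, 2, 10)),  # 10 min
    ]
    print(f"{uptime_percent(start, end, incidents):.3f}%")  # 99.942%
```

The example reuses the numbers from the formula above: 25 minutes of downtime in a 30-day month.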

The health-check endpoint

Reliable SLA monitoring relies on a dedicated health endpoint that checks not only that the web server is alive, but that dependencies (DB, cache, queue) are reachable. Example check:

# Hit the health endpoint and measure response time
curl -s -o /dev/null -w "HTTP %{http_code}, %{time_total}s\n" \
  https://api.example.com/health

# Expected output for a healthy service:
# HTTP 200, 0.142s

In enterno.io you add this URL as an HTTP monitor with an expected 200 code and a 1-minute interval — and uptime is computed automatically.
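On the service side, the logic behind such a health endpoint might look like the sketch below. It is framework-agnostic and hypothetical: `health_response` and the check callables are names invented for this example, and returning 503 when a dependency fails is an assumption (the article only specifies that a healthy service returns 200). In a real service each check would ping the database, cache or queue with a short timeout.

```python
import json
from typing import Callable, Dict, Tuple


def health_response(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run every dependency check and build an HTTP status code and JSON body."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises (refused connection, timeout) counts as failed.
            results[name] = False

    healthy = all(results.values())
    status = 200 if healthy else 503  # 503 is an assumed convention
    body = json.dumps({"status": "ok" if healthy else "degraded", "checks": results})
    return status, body


if __name__ == "__main__":
    # Stub checks standing in for real DB / cache / queue pings.
    status, body = health_response({
        "db": lambda: True,
        "cache": lambda: True,
        "queue": lambda: False,
    })
    print(status, body)
```

Because the external monitor only looks at the status code, a single failed dependency is enough to make the check fail, which is the behavior the endpoint is meant to provide.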

Holding the SLA in practice

Short interval. A 99.99% SLA needs a 30-second interval; a 5-minute one can miss a 4-minute incident entirely.
Multi-region. Probes from Russia, the EU and the US separate a network glitch from a real failure.
Incident threshold. Don't open an incident on a single failed check — require several consecutive failures to filter noise.
SSL control. An expired certificate means 100% downtime for users, so alerts at 14 and 3 days before expiry are mandatory.
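The multi-region and incident-threshold rules above can be combined in a small sketch. Everything here is an assumption made for illustration: the function names, the region keys, the threshold of 3 consecutive failed rounds and the rule that at least 2 regions must fail before a round counts as down. Tune these to your own noise level.

```python
from typing import Dict, Iterable, List, Tuple


def round_is_down(results: Dict[str, bool], min_failed_regions: int = 2) -> bool:
    """results maps region -> True if the probe succeeded.

    A single failing region is treated as a network glitch, not an outage.
    """
    failed = sum(1 for ok in results.values() if not ok)
    return failed >= min_failed_regions


def detect_incidents(
    rounds: Iterable[Dict[str, bool]],
    threshold: int = 3,
    min_failed_regions: int = 2,
) -> List[Tuple[int, int]]:
    """Return (first_round, last_round) index pairs of confirmed incidents.

    An incident is confirmed after `threshold` consecutive down rounds,
    but its start is backdated to the first failed round of the streak.
    """
    incidents = []
    streak_start, streak_len, last_index = 0, 0, -1
    for i, results in enumerate(rounds):
        last_index = i
        if round_is_down(results, min_failed_regions):
            if streak_len == 0:
                streak_start = i
            streak_len += 1
        else:
            if streak_len >= threshold:
                incidents.append((streak_start, i - 1))
            streak_len = 0
    if streak_len >= threshold:  # incident still open at the end of the data
        incidents.append((streak_start, last_index))
    return incidents


if __name__ == "__main__":
    ok = {"ru": True, "eu": True, "us": True}
    eu_glitch = {"ru": True, "eu": False, "us": True}
    down = {"ru": False, "eu": False, "us": True}
    rounds = [ok, eu_glitch, down, ok, down, down, down, ok]
    print(detect_incidents(rounds))  # [(4, 6)]
```

In the example, the single-region failure in round 1 and the one-off failure in round 2 are filtered out; only the three consecutive failed rounds 4 to 6 become an incident. Backdating the start to the first failed round keeps the downtime figure honest even though the alert fires later.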

Reporting and transparency

A public status page and incident history turn an SLA from a promise into a verifiable fact. Customers see real uptime, and your team gets an accurate basis for SLA-credit math.

FAQ

How does 99.9% differ from 99.99% in practice?

99.9% allows ~8.76 hours of downtime per year; 99.99% only ~52.6 minutes. The infrastructure and cost gap is large.

What interval does a 99.99% SLA need?

30 seconds. At longer intervals, short incidents can go undetected and measured uptime will be overstated.
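A back-of-the-envelope model shows why the interval matters. The sketch below assumes instantaneous checks on a fixed schedule and an outage that starts at a random moment relative to that schedule; both function names are invented for this example.

```python
def detection_probability(outage_seconds: float, interval_seconds: float) -> float:
    """Chance that at least one check lands inside the outage."""
    if outage_seconds < 0 or interval_seconds <= 0:
        raise ValueError("outage must be >= 0 and interval must be > 0")
    return min(1.0, outage_seconds / interval_seconds)


def guaranteed_failed_checks(outage_seconds: float, interval_seconds: float) -> int:
    """Minimum number of checks that fall inside the outage, whatever its timing."""
    if outage_seconds < 0 or interval_seconds <= 0:
        raise ValueError("outage must be >= 0 and interval must be > 0")
    return int(outage_seconds // interval_seconds)


if __name__ == "__main__":
    outage = 4 * 60  # a 4-minute incident
    for interval in (300, 60, 30):
        print(
            f"interval {interval}s: "
            f"p(detect)={detection_probability(outage, interval):.2f}, "
            f"guaranteed failed checks={guaranteed_failed_checks(outage, interval)}"
        )
```

Under these assumptions a 5-minute interval catches a 4-minute outage only about 80% of the time and never guarantees a single failed check, while a 30-second interval guarantees 8 failed checks, enough to confirm the incident even when several consecutive failures are required.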

Should I measure uptime myself or trust the provider?

Independently. Provider-only measurement leaves you without leverage in an SLA-credit dispute.

What counts as downtime?

Any time the service is unavailable to a user: 5xx errors, timeouts, DNS unreachability, an expired SSL certificate.

Set up a health monitor and compute SLA automatically on the uptime monitoring page. Also: SLA and uptime math, monitoring guide, status page best practices and the online website checker.
