The Four Golden Signals of Monitoring

Anatoly Oshmanovsky

Published: 22.06.2026 · ~4 min read

Short answer. The four golden signals from SRE practice are latency (how fast the service responds), traffic (how many requests arrive), errors (the share of failed responses) and saturation (how full your resources are). If you only have time for a handful of dashboards, start here: together these four catch most user-facing failures.

Where the golden signals come from

The concept was popularised by Google's SRE team in the book Site Reliability Engineering. The idea is simple: instead of tracking a hundred metrics, focus on four that directly reflect service health from the user's point of view.

If you can only measure four metrics of a user-facing system, measure latency, traffic, errors and saturation.

1. Latency

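Latency is the time the service takes to respond. Track successful and failed requests separately: failed requests often return quickly, so a burst of fast 500 errors can pull the overall figure down and mask a problem. Look at percentiles, not the average.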

p50 — the typical user.
p95 / p99 — the tail that hurts the experience most.

# PromQL: p99 latency over 5 minutes
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
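To see why percentiles matter more than the average, here is a minimal Python sketch. The sample data is hypothetical (980 requests at 100 ms and 20 at 10 s), and the nearest-rank percentile used here is a simplification: Prometheus estimates quantiles from histogram buckets, so its numbers will differ slightly.

```python
import math
import statistics


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample such that
    at least q percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in seconds: 98% fast, 2% very slow.
latencies = [0.1] * 980 + [10.0] * 20

mean = statistics.fmean(latencies)  # about 0.3 s, looks acceptable
p50 = percentile(latencies, 50)     # the typical request
p99 = percentile(latencies, 99)     # the slow tail

print(f"mean={mean:.3f}s p50={p50}s p99={p99}s")
```

The mean stays near 0.3 seconds, while p99 lands on the 10-second requests that the average smooths over.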

2. Traffic

How much demand the service sees — requests per second, transactions, queued messages. Traffic provides context: an error spike during a 10x traffic surge is a different story than one at normal load.

# PromQL: requests per second
sum(rate(http_requests_total[5m]))
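The query above relies on rate() to turn an ever-growing counter into requests per second. The Python sketch below illustrates the idea under simplifying assumptions: the function name and the sample format are made up for this example, and it skips the window extrapolation that Prometheus performs.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value), ordered by time.
    A drop in the value is treated as a counter reset (for example,
    a process restart), so counting continues from zero.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        increase += value - previous if value >= previous else value
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical scrapes taken one minute apart.
steady = [(0, 100), (60, 700), (120, 1300)]
print(per_second_rate(steady))  # 1200 requests over 120 s
```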

3. Errors

The share of failed requests. Count not only 5xx but also "silent" errors: a 200 response with a wrong body is still a failure.

# PromQL: 5xx error ratio
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))
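A status-code query like the one above cannot see "silent" errors. One way to count them is to validate the response body in the application or in a check and treat a failed validation as an error. The sketch below assumes a hypothetical Response record with a body_ok flag; how that flag gets computed depends on your service.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    body_ok: bool  # hypothetical result of an application-level body check


def is_failure(response):
    """A request failed if it returned 5xx or its body did not validate."""
    return response.status >= 500 or not response.body_ok


def error_ratio(responses):
    """Share of failed requests, between 0.0 and 1.0."""
    if not responses:
        return 0.0
    return sum(is_failure(r) for r in responses) / len(responses)


# Hypothetical window: 96 good responses, 3 server errors, 1 silent error.
window = (
    [Response(200, True)] * 96
    + [Response(500, False)] * 3
    + [Response(200, False)]
)
print(error_ratio(window))
```

Counting only 5xx responses would report 3% for this window; including the broken 200 response raises it to 4%.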

4. Saturation

How loaded your resources are — CPU, memory, disk, connection pool. Saturation predicts future trouble: the service still answers, but a resource is nearly exhausted.

# PromQL: CPU utilisation in percent
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
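Because saturation is about predicting trouble, a trend is often more useful than the current value. The Python sketch below fits a straight line to recent usage samples and estimates how long until a resource is full. The function name and sample format are assumptions for illustration, and a linear trend is only a rough model of real growth.

```python
def seconds_until_full(samples, capacity):
    """Estimate seconds from the last sample until usage reaches capacity.

    samples: list of (timestamp_seconds, used) pairs, ordered by time.
    Returns None if there is too little data or usage is not growing.
    """
    if len(samples) < 2:
        return None
    times = [t for t, _ in samples]
    used = [u for _, u in samples]
    t_mean = sum(times) / len(times)
    u_mean = sum(used) / len(used)
    spread = sum((t - t_mean) ** 2 for t in times)
    if spread == 0:
        return None
    # Least-squares slope and intercept of used = slope * t + intercept.
    slope = sum((t - t_mean) * (u - u_mean) for t, u in samples) / spread
    if slope <= 0:
        return None
    intercept = u_mean - slope * t_mean
    t_full = (capacity - intercept) / slope
    return max(0.0, t_full - times[-1])


# Hypothetical disk usage in GB, sampled hourly, on a 100 GB volume.
disk = [(0, 50), (3600, 60), (7200, 70)]
print(seconds_until_full(disk, capacity=100))  # roughly three hours
```

An alert on "less than N hours until full" fires while the service still answers, which is the point of watching saturation.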

Signal summary table

Signal	What it measures	Example metric
Latency	Response time	p99 request duration
Traffic	Volume of demand	Requests per second
Errors	Share of failures	% of 5xx responses
Saturation	Resource load	% CPU, memory

External and internal signals

Saturation, traffic and server-side latency are measured from the inside, on the servers. Latency and errors as the user actually experiences them are best seen from the outside, via synthetic monitoring.

Internal metrics (Prometheus) capture saturation and root causes.
External checks capture the real user experience.
Together they give a complete picture of an incident.
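To make the external side concrete, here is a minimal sketch of a synthetic check in Python using only the standard library. The probe function, its return format and the rule that any status below 400 counts as healthy are assumptions for illustration; a real monitoring service adds scheduling, multiple regions and alerting on top.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Send one request and record the externally visible latency and status.

    opener is injectable so the function can be exercised without a network.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            response.read()
            status = response.status
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        status = exc.code
    except OSError:
        # Covers URLError, DNS failures, refused connections and timeouts.
        status = None
    latency = time.monotonic() - start
    return {
        "status": status,
        "latency_s": latency,
        "ok": status is not None and status < 400,
    }
```

Calling probe("https://example.com/") on a schedule from a machine outside your own infrastructure yields latency and error samples from the user's side of the network.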

What enterno.io covers

As external (synthetic) monitoring, enterno.io measures response latency, errors (HTTP status codes and SSL failures) and availability from vantage points around the world. This complements internal Prometheus metrics with the user-side view. HTTP, SSL, Ping and DNS checks run every minute, or every 30 seconds on paid plans, from multiple regions in Russia, Europe and the US. Alerts arrive via Telegram, Slack, email, webhook, PagerDuty and Jira.

Spin up monitors for the external signals, show availability on a status page, and use heartbeat checks for queues and cron jobs. For the response side, see our incident response plan.

FAQ

Can I start with just the golden signals?

Yes. It is the recommended starting point: four signals cover most user-facing failures, and detailed metrics are added later as needed.

Why is average response time a poor metric?

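The average hides the tail. If 99% of requests take 100 ms and 1% take 10 seconds, the average is about 200 ms and looks fine, even though one user in a hundred waits 10 seconds. Tail percentiles (p95, p99 and higher) are what expose slow requests like these.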

How do golden signals differ from the USE method?

USE (Utilization, Saturation, Errors) focuses on resources, while golden signals focus on the user-facing service. They are often used together: USE for infrastructure, golden signals for applications.

Do I need Prometheus to apply golden signals?

No. The signals are a concept, not a tool. Latency and errors can be collected by external synthetic monitoring without your own metrics stack. Traffic and saturation still need data from inside your system.

Cover the external golden signals. Create checks at enterno.io/monitors and measure latency, errors and availability through the user's eyes.
