Skip to content

P95 latency

Key idea:

P95 latency is the response time within which 95% of requests complete. Use it instead of average because the mean hides the tail (one 30 s outlier dissolves in a sea of fast requests). Web API target P95 < 500 ms, LLM API < 5 s, DB queries < 100 ms. Compute on a 1-5 min rolling window.

Below: details, example, related terms, FAQ.

Try it now — free →

Details

  • P50 (median) — half the requests are faster, half slower. A good "typical" indicator
  • P95 — the critical signal: the slowest 5% = real user pain
  • P99 — the extreme tail; matters for high-SLA services (payments, auth)
  • Formula: sort N values, take element at index ⌈N × 0.95⌉ - 1
  • Rolling window: per-minute / per-5-min, otherwise spikes dissolve in hourly stats

Example

# Computing P95 in SQL
SELECT
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_ms) AS p95
FROM monitor_checks
WHERE created_at > NOW() - INTERVAL 5 MINUTE;

# In Python (numpy)
import numpy as np
p95 = np.percentile(latencies_ms, 95)

# In an enterno.io monitor: alert when p95_5min > threshold AND error_rate < 1%
# (with many errors, P95 loses meaning — alert on error_rate separately)

Related

What is P95 Latency?

P95 latency, or the 95th percentile latency, is a performance metric that indicates the maximum response time experienced by 95% of the requests in a given dataset. This measure is crucial for understanding the performance of web applications and services, as it highlights the upper limit of latency that users might experience, thereby allowing organizations to identify potential performance bottlenecks.

How to Measure P95 Latency

Measuring P95 latency involves collecting response time data and calculating the 95th percentile from this data set. Here’s a practical approach to measuring P95 latency using common tools:

  1. Collect Data: Utilize monitoring tools like Prometheus or Grafana to gather response times for your application. You can instrument your application to log response times using OpenTelemetry.
  2. Data Storage: Store the collected data in a time-series database (e.g., InfluxDB or TimescaleDB) for easy querying.
  3. Calculate P95: Use a query to calculate the 95th percentile. For example, in InfluxDB, you could use:
SELECT percentile(response_time, 95) FROM requests WHERE time > now() - 1h

This query retrieves the P95 latency over the last hour from your recorded response times.

In addition to using databases, you can also perform calculations directly in code. For example, in Python, you can calculate P95 using the NumPy library:

import numpy as np
response_times = [200, 300, 250, 400, 100, 150, 200]
p95 = np.percentile(response_times, 95)
print(f'P95 Latency: {p95} ms')

This snippet calculates the P95 latency from a sample list of response times.

Practical Example of P95 Latency in Action

To illustrate how P95 latency can affect user experience, consider a web application that handles API requests. Assume we measure the response times of 1,000 requests and find the following times (in milliseconds):

[100, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1000]

When calculating the P95 of this dataset:

  1. Sort the response times in ascending order.
  2. Identify the 95th percentile. With 1,000 data points, the 95th percentile corresponds to the 950th value in the sorted list.
  3. For this example, if the sorted times are:
[100, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1000]

The P95 latency would be the 950th value, which is 900 ms. This indicates that 95% of users experience latency below 900 ms.

Understanding P95 latency allows teams to set performance benchmarks and identify areas for improvement. If the P95 latency consistently exceeds a predefined threshold (e.g., 500 ms), it may indicate the need for optimization in the application’s architecture, such as improving database queries or optimizing network requests.

In summary, P95 latency is a crucial metric for performance monitoring. By calculating and analyzing this data, organizations can ensure a better user experience and maintain optimal performance levels.

Learn more

Frequently Asked Questions

Why not the average?

Average hides the tail: 9 requests at 100 ms + 1 request at 10 s = avg 1100 ms, P95 = 10 000 ms. Avg looks "tolerable", P95 shows the real pain.

P95 vs P99?

P95 — operational metric (monitoring). P99 — SLA metric for high-stakes services (payments). For most web APIs P95 is enough.

How much data is needed?

Minimum 100 points for a stable P95. With 20 points the value jitters. A 5-min window at ≥20 req/min gives a reliable P95.

Try the live tool that powered this guide

Free plan — 20 monitors, 5-minute checks, no card required. Upgrade for 1-minute interval and multi-region monitoring.