P95 latency is the response time within which 95% of requests complete. Use it instead of average because the mean hides the tail (one 30 s outlier dissolves in a sea of fast requests). Web API target P95 < 500 ms, LLM API < 5 s, DB queries < 100 ms. Compute on a 1-5 min rolling window.
Below: details, example, related terms, FAQ.
Free online tool — HTTP header checker: instant results, no signup.
# Computing P95 in SQL
SELECT
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_ms) AS p95
FROM monitor_checks
WHERE created_at > NOW() - INTERVAL 5 MINUTE;
# In Python (numpy)
import numpy as np
p95 = np.percentile(latencies_ms, 95)
# In an enterno.io monitor: alert when p95_5min > threshold AND error_rate < 1%
# (with many errors, P95 loses meaning — alert on error_rate separately)P95 latency, or the 95th percentile latency, is a performance metric that indicates the maximum response time experienced by 95% of the requests in a given dataset. This measure is crucial for understanding the performance of web applications and services, as it highlights the upper limit of latency that users might experience, thereby allowing organizations to identify potential performance bottlenecks.
Measuring P95 latency involves collecting response time data and calculating the 95th percentile from this data set. Here’s a practical approach to measuring P95 latency using common tools:
SELECT percentile(response_time, 95) FROM requests WHERE time > now() - 1hThis query retrieves the P95 latency over the last hour from your recorded response times.
In addition to using databases, you can also perform calculations directly in code. For example, in Python, you can calculate P95 using the NumPy library:
import numpy as np
response_times = [200, 300, 250, 400, 100, 150, 200]
p95 = np.percentile(response_times, 95)
print(f'P95 Latency: {p95} ms')This snippet calculates the P95 latency from a sample list of response times.
To illustrate how P95 latency can affect user experience, consider a web application that handles API requests. Assume we measure the response times of 1,000 requests and find the following times (in milliseconds):
[100, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1000]When calculating the P95 of this dataset:
[100, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1000]The P95 latency would be the 950th value, which is 900 ms. This indicates that 95% of users experience latency below 900 ms.
Understanding P95 latency allows teams to set performance benchmarks and identify areas for improvement. If the P95 latency consistently exceeds a predefined threshold (e.g., 500 ms), it may indicate the need for optimization in the application’s architecture, such as improving database queries or optimizing network requests.
In summary, P95 latency is a crucial metric for performance monitoring. By calculating and analyzing this data, organizations can ensure a better user experience and maintain optimal performance levels.
Average hides the tail: 9 requests at 100 ms + 1 request at 10 s = avg 1100 ms, P95 = 10 000 ms. Avg looks "tolerable", P95 shows the real pain.
P95 — operational metric (monitoring). P99 — SLA metric for high-stakes services (payments). For most web APIs P95 is enough.
Minimum 100 points for a stable P95. With 20 points the value jitters. A 5-min window at ≥20 req/min gives a reliable P95.
Free plan — 20 monitors, 5-minute checks, no card required. Upgrade for 1-minute interval and multi-region monitoring.