API Performance Metrics and Optimization
Why Measure API Performance
APIs are the backbone of modern web applications. A slow API means a slow website, poor user experience, and lost customers. According to Amazon, every 100ms of latency reduces sales by 1%. For APIs serving frontends, this is critical.
Without metrics, you don't know how fast your API performs, where bottlenecks are, or when degradation begins.
Key Metrics
Latency
Time from sending a request to receiving a response. The primary user experience metric.
How to measure:
- Average — misleading. Can hide problems: if 99 requests take 100ms and 1 takes 10 seconds, the average is 199ms
- Median (P50) — 50% of requests are faster. Represents typical experience
- P95/P99 — 95%/99% of requests are faster. Shows the tail — worst-case experience
- P99.9 — for high-traffic APIs, critical for SLA
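The gap between the average and the percentiles is easy to see in a small sketch using the 99-fast-plus-one-slow example above (nearest-rank percentiles; helper names are illustrative):

```python
def percentile(samples, p):
    """Nearest-rank percentile: value below which ~p% of samples fall."""
    ordered = sorted(samples)
    k = round(p / 100 * len(ordered)) - 1
    return ordered[max(0, min(len(ordered) - 1, k))]

# 99 requests at 100 ms plus one 10-second outlier
latencies = [100] * 99 + [10_000]
avg = sum(latencies) / len(latencies)  # 199 ms — the average hides the outlier
```

Here the average is 199 ms even though the typical (P50) request takes 100 ms, which is exactly why percentiles represent user experience better.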
Target values:
- P50: under 100ms for internal APIs, under 200ms for public
- P95: under 500ms
- P99: under 1 second
Throughput
Number of requests processed per unit of time (RPS — Requests Per Second). Shows system capacity.
Monitor throughput alongside latency. Increasing RPS often causes latency to rise — knowing the degradation point is crucial.
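A minimal sliding-window RPS counter (a sketch, not a production metrics client, and not thread-safe) could look like this:

```python
from collections import deque
import time

class RpsCounter:
    """Sliding-window requests-per-second counter (illustrative sketch)."""
    def __init__(self, window=1.0):
        self.window = window
        self.events = deque()  # timestamps of recent requests

    def record(self, now=None):
        now = time.monotonic() if now is None else now
        self.events.append(now)
        self._evict(now)

    def rate(self, now=None):
        now = time.monotonic() if now is None else now
        self._evict(now)
        return len(self.events) / self.window

    def _evict(self, now):
        # Drop events that fell out of the window
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
```

In practice a metrics library (Prometheus client, StatsD) does this for you; the point is that rate is always measured over a window, not per single request.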
Error Rate
Percentage of requests that end in error (HTTP 4xx, 5xx). Target: less than 0.1% for 5xx errors.
Distinguish between:
- 4xx — client errors (validation, authorization). Not always a server problem
- 5xx — server errors. Always require attention
- Timeout — request received no response. Often the most damaging error
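Splitting responses into 4xx and 5xx rates is simple arithmetic over status codes (helper names here are made up for illustration):

```python
from collections import Counter

def error_rates(status_codes):
    """Fraction of responses in the 4xx and 5xx classes (illustrative)."""
    total = len(status_codes)
    classes = Counter(code // 100 for code in status_codes)
    return {"4xx": classes[4] / total, "5xx": classes[5] / total}

# 1000 requests: 995 OK, 4 client errors, 1 server error
codes = [200] * 995 + [404] * 4 + [500]
rates = error_rates(codes)  # 5xx rate is 0.1% — right at the target boundary
```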
Availability
Percentage of time the API responds correctly. SLA typically defines target uptime:
- 99.9% — no more than 8.8 hours of downtime per year
- 99.99% — no more than 52.6 minutes per year
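These downtime budgets follow directly from the SLA percentage; a one-line helper makes the arithmetic explicit:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_budget_minutes(sla_percent):
    """Maximum allowed downtime per year for a given SLA percentage."""
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)

# 99.9%  -> ~525.6 minutes (~8.8 hours) per year
# 99.99% -> ~52.6 minutes per year
```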
Saturation
How loaded resources are: CPU, memory, DB connections, disk I/O. When saturation approaches 100%, latency spikes dramatically.
The RED Methodology
RED (Rate, Errors, Duration) is a simple methodology for monitoring microservices:
- Rate — requests per second
- Errors — failed requests per second
- Duration — response time distribution (histogram)
These three metrics cover 80% of API monitoring needs.
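A toy in-process RED tracker (a hypothetical class, not a real Prometheus client) might record all three signals like this:

```python
from collections import defaultdict

class RedMetrics:
    """Minimal RED tracker: Rate, Errors, Duration (illustrative only)."""
    BOUNDS = (50, 100, 250, 500, 1000, float("inf"))  # histogram bucket upper bounds, ms

    def __init__(self):
        self.requests = 0                # Rate: total requests (divide by window for RPS)
        self.errors = 0                  # Errors: 5xx responses
        self.buckets = defaultdict(int)  # Duration: histogram

    def observe(self, duration_ms, status):
        self.requests += 1
        if status >= 500:
            self.errors += 1
        # Place the observation in the first bucket that covers it
        for bound in self.BOUNDS:
            if duration_ms <= bound:
                self.buckets[bound] += 1
                break

m = RedMetrics()
m.observe(40, 200)
m.observe(120, 200)
m.observe(800, 503)
```

A real system would export the histogram so percentiles can be computed at query time, but the shape of the data is the same.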
API Performance Optimization
Database Queries
- Use indexes for frequent queries
- Avoid the N+1 problem (1 query for list + N queries for details)
- Cache frequently repeated query results in Redis
- Use pagination instead of loading all records
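The N+1 problem from the list above can be sketched with a fake database that counts round trips (all names and queries here are illustrative):

```python
class FakeDb:
    """Counts queries so the round-trip difference is visible."""
    def __init__(self):
        self.calls = 0

    def query(self, sql, *params):
        self.calls += 1
        return list(params)

def fetch_authors_naive(db, posts):
    # N+1: one extra query per post on top of the original list query
    return [db.query("SELECT * FROM authors WHERE id = ?", p["author_id"])
            for p in posts]

def fetch_authors_batched(db, posts):
    # One round trip: collect distinct ids and fetch them with a single IN query
    ids = sorted({p["author_id"] for p in posts})
    placeholders = ",".join("?" * len(ids))
    return db.query("SELECT * FROM authors WHERE id IN (%s)" % placeholders, *ids)

posts = [{"author_id": i % 3} for i in range(9)]
naive_db, batched_db = FakeDb(), FakeDb()
fetch_authors_naive(naive_db, posts)      # 9 queries
fetch_authors_batched(batched_db, posts)  # 1 query
```

ORMs solve the same problem with eager loading (e.g. join or prefetch options); the principle is identical: batch the lookups.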
Caching
- Redis/Memcached for hot data
- HTTP caching (Cache-Control, ETag) for public endpoints
- In-request memoization of repeated computations
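HTTP caching with ETag and Cache-Control can be sketched framework-agnostically, assuming a handler that returns a (status, body, headers) tuple (all names illustrative):

```python
import hashlib
import json

def make_etag(payload):
    """Strong ETag derived from the canonical JSON body (illustrative)."""
    body = json.dumps(payload, sort_keys=True).encode()
    return '"%s"' % hashlib.sha256(body).hexdigest()[:16]

def respond(payload, if_none_match=None):
    """Return (status, body, headers); 304 when the client copy is fresh."""
    etag = make_etag(payload)
    if if_none_match == etag:
        # Client already has this version — skip the body entirely
        return 304, None, {"ETag": etag}
    return 200, payload, {"ETag": etag, "Cache-Control": "public, max-age=60"}
```

The 304 path saves both bandwidth and serialization time: the client sends back the ETag in If-None-Match, and the server answers with headers only.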
Compression
Enable gzip/brotli for JSON responses. For large JSON responses, compression can reduce size by 80-90%.
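Compression gains on repetitive JSON are easy to demonstrate with Python's standard gzip module (the exact ratio depends on the payload):

```python
import gzip
import json

# Repetitive list payload, typical of collection endpoints
payload = json.dumps(
    [{"id": i, "status": "active"} for i in range(1000)]
).encode()

compressed = gzip.compress(payload)
ratio = 1 - len(compressed) / len(payload)  # fraction of bytes saved
```

In real services this is usually handled by the reverse proxy (nginx, CDN) rather than application code; enabling it there costs nothing on the app side.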
Pagination and Filtering
Don't return 10,000 records in a single response. Use cursor-based or offset-based pagination. Let clients filter data server-side.
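Cursor-based pagination can be sketched in a few lines, with an opaque base64 cursor encoding the last-seen id (all names here are illustrative):

```python
import base64

def paginate(rows, cursor=None, limit=3):
    """Cursor-based pagination over rows sorted by ascending id (sketch)."""
    after_id = int(base64.b64decode(cursor)) if cursor else -1
    page = [r for r in rows if r["id"] > after_id][:limit]
    # Hand back an opaque cursor only when there may be more rows
    next_cursor = None
    if len(page) == limit:
        next_cursor = base64.b64encode(str(page[-1]["id"]).encode()).decode()
    return page, next_cursor
```

Unlike offset pagination, a cursor keyed on an indexed column stays fast regardless of how deep the client pages, and it is stable when rows are inserted mid-scan.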
Monitoring and Tools
Use the Enterno.io HTTP Checker to verify response times and headers of your API endpoints. Set up uptime monitoring for key endpoints to receive notifications on degradation.
Summary
Monitor latency (percentiles, not averages), throughput, error rate, and saturation. Use the RED methodology for microservices. Optimize database queries, cache hot data, compress responses. Set SLAs and monitor compliance.