
Latency vs Throughput: Understanding Network Performance Metrics

What Are Latency and Throughput?

Latency and throughput are two fundamental metrics for measuring network and application performance. While they are related, they measure different aspects of how data moves through a system. Confusing the two — or optimizing one at the expense of the other — is a common mistake that leads to poor user experience and wasted engineering effort.

Latency is the time it takes for a single unit of data (a packet, a request, a message) to travel from source to destination. It is measured in milliseconds (ms). Think of it as the delay before something happens.

Throughput is the amount of data successfully transferred per unit of time. It is measured in requests per second (RPS), megabits per second (Mbps), or transactions per second (TPS). Think of it as capacity — how much work the system can handle.

The Highway Analogy

Imagine a highway connecting two cities. Latency is how long it takes one car to drive from city A to city B. Throughput is how many cars arrive at city B per hour. A highway can have low latency (fast speed limit) but low throughput (one lane). Or high throughput (eight lanes) but high latency (slow speed limit, many traffic lights).

This analogy reveals a key insight: improving one does not automatically improve the other. Adding lanes (bandwidth) does not make individual cars go faster. Raising the speed limit does not increase the number of lanes.

Measuring Latency

Latency has several components that add up to the total round-trip time (RTT):

- Propagation delay: the time a signal takes to physically travel the distance (in fiber, roughly 5 ms per 1,000 km, bounded by the speed of light)
- Transmission delay: the time to push all of a packet's bits onto the link
- Processing delay: the time routers and hosts spend inspecting and forwarding the packet
- Queuing delay: the time packets wait in buffers along the path

For web applications, key latency metrics include:

| Metric | What It Measures | Good Target |
|---|---|---|
| TTFB (Time to First Byte) | Time from request to first byte of response | < 200 ms |
| DNS Lookup | Time to resolve domain to IP | < 50 ms |
| TCP Handshake | Time to establish TCP connection | < 50 ms |
| TLS Handshake | Time to negotiate encryption | < 100 ms |
| P99 Latency | 99th percentile response time | < 1 s |
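These targets also show why connection reuse matters so much: a first request to a new HTTPS origin pays for DNS, TCP, and TLS before any data flows, while a request on a kept-alive connection pays only the TTFB. A quick sketch using the "good target" ceilings from the table above:

```python
# Worst-case latency budget for a first request to a new HTTPS origin,
# using the "good target" ceilings from the table above (milliseconds).
budget_ms = {
    "dns_lookup": 50,
    "tcp_handshake": 50,
    "tls_handshake": 100,
    "ttfb": 200,
}

first_request = sum(budget_ms.values())  # cold connection pays everything
reused_connection = budget_ms["ttfb"]    # keep-alive skips DNS/TCP/TLS

print(first_request)      # 400
print(reused_connection)  # 200
```

Half of the cold-connection budget is pure handshake overhead, which is exactly what HTTP keep-alive and connection pooling eliminate.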

Measuring Throughput

Throughput measurement depends on the context:

- Network level: bits per second (Mbps, Gbps) moved across a link
- Application level: requests per second (RPS) an API or web server handles
- Database level: transactions or queries per second (TPS, QPS)

Important: bandwidth is not throughput. Bandwidth is the theoretical maximum capacity of a link. Throughput is the actual observed transfer rate, which is always lower due to protocol overhead, congestion, and packet loss.
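The gap between bandwidth and throughput can be sketched with a deliberately simplified model: every packet carries protocol headers that don't count as application data, and lost packets must be retransmitted. (Real TCP behavior, with congestion control and window dynamics, is considerably more complex; the numbers below are illustrative.)

```python
# Simplified model of why observed throughput is always below link bandwidth:
# headers consume bytes, and lost packets must be resent.

def effective_throughput_mbps(bandwidth_mbps, payload_bytes=1460,
                              packet_bytes=1500, loss_rate=0.0):
    # 1460/1500 reflects ~40 bytes of TCP/IP headers per standard Ethernet frame
    payload_fraction = payload_bytes / packet_bytes
    return bandwidth_mbps * payload_fraction * (1 - loss_rate)

# A "1 Gbps" link never delivers 1000 Mbps of application data:
print(round(effective_throughput_mbps(1000), 1))                  # 973.3
print(round(effective_throughput_mbps(1000, loss_rate=0.02), 1))  # 953.9
```

Even before congestion, header overhead alone shaves a few percent off the nominal rate, and loss compounds the gap.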

# Network-level throughput with iperf3 (30-second test against an iperf3 server)
iperf3 -c server.example.com -t 30

# HTTP throughput with wrk: 12 threads, 400 open connections, 30 seconds
wrk -t12 -c400 -d30s https://example.com/api/health

# HTTP throughput with Apache Bench: 10,000 requests, 100 concurrent
ab -n 10000 -c 100 https://example.com/api/endpoint

The Relationship Between Latency and Throughput

Latency and throughput are not independent under load. As throughput approaches the system's maximum capacity, latency rises sharply, often nonlinearly. This is described by queuing theory, specifically Little's Law:

L = λ × W

Where:
L = average number of items in the system
λ = average arrival rate (throughput)
W = average time an item spends in the system (latency)

This means: as throughput (λ) increases, either the system grows (L increases — more items queued) or latency (W) increases, or both. In practice, as your server approaches maximum RPS, response times spike dramatically.
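Plugging in concrete numbers makes the law tangible. The figures below (500 RPS, 200 ms) are hypothetical, chosen only to illustrate the arithmetic:

```python
# Little's Law: L = lambda * W. A service handling 500 requests/second
# with a 200 ms average response time has, on average, 100 requests
# in flight at any moment.
arrival_rate_rps = 500  # lambda: throughput
latency_s = 0.200       # W: average time in the system

in_flight = arrival_rate_rps * latency_s  # L
print(in_flight)  # 100.0

# Rearranged, the same law caps throughput for fixed concurrency:
# with 100 workers and 200 ms per request, the ceiling is 500 RPS.
max_rps = 100 / latency_s
print(max_rps)  # 500.0
```

The rearranged form is useful for capacity planning: if you cannot add concurrency, the only way to raise throughput is to cut latency.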

The Knee Point

Every system has a "knee point" — the throughput level where latency begins to rise sharply. Operating beyond this point leads to cascading failures: queues fill up, timeouts trigger retries, retries add more load, and the system collapses. Identifying and staying below the knee point is critical for capacity planning.
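The shape of the knee can be illustrated with the textbook M/M/1 queuing model, where average time in the system is W = 1 / (μ − λ). Real services are not M/M/1 queues (this is a simplifying assumption), but the curve's character, latency exploding as throughput nears capacity, is the same:

```python
# Stylized knee-point illustration using the M/M/1 model: W = 1 / (mu - lam).
SERVICE_RATE = 1000.0  # mu: maximum capacity, requests/second

def avg_latency_ms(arrival_rate_rps):
    assert arrival_rate_rps < SERVICE_RATE, "at/over capacity: queue grows without bound"
    return 1000.0 / (SERVICE_RATE - arrival_rate_rps)

for load in (0.5, 0.8, 0.9, 0.99):
    rps = load * SERVICE_RATE
    print(f"{load:.0%} utilization -> {avg_latency_ms(rps):6.1f} ms")
# 50% -> 2.0 ms, 80% -> 5.0 ms, 90% -> 10.0 ms, 99% -> 100.0 ms
```

Note the asymmetry: going from 50% to 90% utilization multiplies latency by five, and the last few percent before capacity cost far more than everything before them. This is why capacity plans typically target utilization well below 100%.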

Optimizing Latency

Strategies to reduce latency:

- Serve content from a CDN so it is geographically close to users
- Cache at every layer (browser, CDN, application, database) to skip slow work entirely
- Reuse connections (HTTP keep-alive, connection pooling, HTTP/2) to avoid repeated DNS, TCP, and TLS handshakes
- Reduce round trips: batch requests and avoid chatty request/response patterns

Optimizing Throughput

Strategies to increase throughput:

- Scale horizontally: add servers behind a load balancer
- Process work asynchronously: move slow tasks to queues and background workers
- Pool connections to databases and upstream services to avoid per-request setup cost
- Batch operations to amortize per-request overhead across more work

Monitoring Both Metrics

Effective monitoring tracks both metrics together. A dashboard should show:

- Latency percentiles (P50, P95, P99) rather than averages, which hide tail behavior
- Throughput (RPS) plotted alongside latency, so you can see where latency starts climbing as load grows
- Error and timeout rates, since failures typically spike once the system passes its knee point
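Percentiles matter because a mean can look healthy while the tail is not. A minimal nearest-rank percentile computation over a window of latency samples (the sample values here are made up for illustration):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the value at rank ceil(p/100 * n)."""
    ranked = sorted(samples)
    rank = math.ceil(p / 100 * len(ranked))
    return ranked[rank - 1]

samples = [20] * 98 + [800, 1500]   # 98 fast requests, 2 slow ones (ms)
print(sum(samples) / len(samples))  # 42.6 -- the average looks fine
print(percentile(samples, 50))      # 20   -- the median looks even better
print(percentile(samples, 99))      # 800  -- the tail tells the truth
```

Two slow requests out of a hundred barely move the average, but at scale those are the requests your unluckiest users feel, which is why alerting on P99 catches regressions that mean-based alerts miss.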

Tools like Enterno.io provide real-time latency monitoring for your endpoints, alerting you when response times exceed your thresholds. Combined with throughput tracking, you can detect performance regressions before users notice them.

Common Pitfalls

- Confusing bandwidth with throughput: a fast link does not guarantee fast transfers
- Monitoring averages instead of percentiles: a healthy mean can hide a painful P99
- Optimizing one metric at the expense of the other, for example aggressive batching that raises throughput but adds latency
- Load-testing only for peak throughput without checking where latency begins to spike

Key Takeaways

Latency measures delay; throughput measures capacity. Both are essential, and optimizing one can come at the cost of the other. Monitor both metrics with percentiles, identify your system's knee point, and design your architecture to keep latency low even as throughput scales. Use CDNs, caching, and connection reuse for latency; use horizontal scaling, async processing, and connection pooling for throughput.
