TCP Connection Tuning: Keepalive, Window Size, and Nagle's Algorithm
TCP (Transmission Control Protocol) provides reliable, ordered data delivery across networks, but its default settings are often suboptimal for modern applications. Understanding and tuning TCP parameters can significantly improve throughput, reduce latency, and optimize resource utilization for your specific workloads.
TCP Keepalive Configuration
TCP keepalive is a mechanism that detects dead connections by periodically sending probe packets when a connection is idle. Without keepalive, a connection can remain in an established state indefinitely even after the remote host has gone down, consuming resources unnecessarily.
Keepalive Parameters
| Parameter | Linux Sysctl | Default | Description |
|---|---|---|---|
| Keepalive Time | net.ipv4.tcp_keepalive_time | 7200s | Idle time before the first probe is sent |
| Keepalive Interval | net.ipv4.tcp_keepalive_intvl | 75s | Interval between subsequent probes |
| Keepalive Probes | net.ipv4.tcp_keepalive_probes | 9 | Failed probes before the connection is dropped |
Recommended Keepalive Tuning
# Reduce idle detection time for web servers
# Default: 2 hours idle before first probe
# Tuned: 60 seconds idle, 10-second intervals, 6 probes
# Dead connection detected in ~2 minutes instead of ~2.2 hours
sysctl -w net.ipv4.tcp_keepalive_time=60
sysctl -w net.ipv4.tcp_keepalive_intvl=10
sysctl -w net.ipv4.tcp_keepalive_probes=6
# Make persistent across reboots
echo "net.ipv4.tcp_keepalive_time = 60" >> /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_intvl = 10" >> /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_probes = 6" >> /etc/sysctl.conf
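The sysctl settings above apply system-wide. An application can also opt in per socket; as a minimal sketch, on Linux the same three parameters are exposed as socket options (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT), which Python's socket module surfaces directly:

```python
import socket

# Enable keepalive on one socket and mirror the tuned values above
# (Linux-specific TCP_KEEP* options; other platforms differ).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # idle seconds before first probe
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # seconds between probes
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 6)     # failed probes before drop
```

Per-socket values override the system defaults only for that connection, which is useful when one service needs aggressive dead-peer detection without changing behavior host-wide.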
TCP Window Size and Scaling
The TCP receive window determines how much data can be in flight before requiring an acknowledgment. For high-bandwidth, high-latency connections (large BDP), the default window size is often the primary bottleneck.
Bandwidth-Delay Product
The optimal window size equals the bandwidth-delay product (BDP): the bandwidth multiplied by the round-trip time. For a 1 Gbps link with 50ms RTT:
BDP = Bandwidth x RTT
BDP = 1,000,000,000 bits/s x 0.050 s = 50,000,000 bits = 6,250,000 bytes = 6.25 MB
# The receive window must be at least 6.25 MB to fully utilize the link
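The arithmetic above is easy to script for your own links. A small helper (name `bdp_bytes` is illustrative) converting bits per second and RTT into a byte-sized buffer target:

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product in bytes: bits in flight divided by 8."""
    return bandwidth_bps * rtt_seconds / 8

# 1 Gbps link with 50 ms RTT -> 6,250,000 bytes (6.25 MB), matching the worked example
print(bdp_bytes(1_000_000_000, 0.050))
```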
Window Scaling Configuration
# Enable TCP window scaling (allows windows larger than 64KB)
sysctl -w net.ipv4.tcp_window_scaling=1
# Set maximum buffer sizes
sysctl -w net.core.rmem_max=16777216 # 16 MB max receive buffer
sysctl -w net.core.wmem_max=16777216 # 16 MB max send buffer
# Set TCP buffer auto-tuning range (min, default, max)
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
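Individual applications can also request buffer sizes per socket with SO_RCVBUF/SO_SNDBUF. A sketch, with the caveat that on Linux the kernel doubles the requested value for bookkeeping, clamps it to net.core.rmem_max, and disables receive-buffer auto-tuning for that socket:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Request a 128 KB receive buffer; must be set before connect() to
# affect the advertised window during the handshake.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 128 * 1024)
# The kernel reports the effective size (doubled on Linux).
actual = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("effective receive buffer:", actual, "bytes")
```

In most cases the kernel's auto-tuning (tcp_rmem/tcp_wmem above) is preferable; set explicit per-socket buffers only when you have measured a need.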
Buffer Size Guidelines
- Web servers — moderate buffers (256KB-1MB) sufficient for typical HTTP request/response sizes
- File transfer servers — large buffers (4MB-16MB) to maximize throughput on high-BDP paths
- Database connections — moderate buffers with emphasis on low-latency tuning
- Real-time applications — smaller buffers to minimize bufferbloat and reduce latency
Nagle's Algorithm
Nagle's algorithm reduces the number of small packets sent over the network by buffering small writes and combining them into larger segments. While beneficial for reducing overhead on bulk transfers, it can introduce latency for interactive or real-time applications.
How Nagle's Algorithm Works
- When an application writes data smaller than MSS (Maximum Segment Size), TCP buffers it
- The buffered data is sent only when: all previously sent data has been acknowledged, OR enough data accumulates to fill a full MSS-sized segment
- This reduces the number of small packets but adds up to one RTT of latency per write
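The send/buffer decision above can be modeled as a tiny pure function. This is a simplified sketch of the rule, not the kernel implementation (it ignores TCP_CORK, partial retransmits, and other edge cases), and the function name is illustrative:

```python
def nagle_should_send(pending_bytes: int, mss: int, unacked_data: bool) -> bool:
    """Simplified Nagle decision: transmit immediately if a full
    segment is ready or nothing is awaiting an ACK; otherwise buffer."""
    return pending_bytes >= mss or not unacked_data

# A 20-byte write with data already in flight is buffered (returns False);
# the same write on an otherwise-idle connection goes out immediately (True).
nagle_should_send(20, 1460, unacked_data=True)
nagle_should_send(20, 1460, unacked_data=False)
```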
Disabling Nagle's Algorithm
// C/C++ — disable Nagle's per socket
#include <sys/socket.h>   // setsockopt
#include <netinet/in.h>   // IPPROTO_TCP
#include <netinet/tcp.h>  // TCP_NODELAY
int flag = 1;
setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
# Python — disable Nagle's per socket
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
// PHP — disable Nagle's (if sockets extension available)
socket_set_option($sock, SOL_TCP, TCP_NODELAY, 1);
When to Disable Nagle's
- Interactive protocols — SSH, telnet, gaming where keystroke latency matters
- HTTP/2 multiplexing — small frames should not be delayed
- Real-time messaging — chat applications, WebSocket connections
- RPC frameworks — request-response patterns where small messages are common
Congestion Control Algorithms
TCP congestion control determines how aggressively TCP increases its sending rate and how it reacts to loss. Newer model-based algorithms such as BBR can substantially outperform the traditional loss-based Reno and Cubic algorithms, particularly on high-latency or lossy paths.
Comparing Algorithms
| Algorithm | Best For | Key Characteristic |
|---|---|---|
| Cubic | General purpose | Default on most Linux systems, loss-based |
| BBR | High-BDP paths | Model-based, estimates bandwidth and RTT independently |
| BBR2 | Fairness-sensitive | Improved fairness with Cubic flows |
| DCTCP | Data centers | Uses ECN marks for fine-grained control |
# Check available algorithms
sysctl net.ipv4.tcp_available_congestion_control
# Set BBR as default (requires kernel 4.9+)
sysctl -w net.ipv4.tcp_congestion_control=bbr
sysctl -w net.core.default_qdisc=fq
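Linux also allows congestion control to be chosen per socket via the TCP_CONGESTION option, which Python exposes as socket.TCP_CONGESTION on Linux (Python 3.6+). A sketch, assuming a Linux host; setting an algorithm that is not loaded raises OSError:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Read the algorithm this socket is currently using (NUL-padded bytes).
current = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
print(current.rstrip(b"\x00").decode())
# To switch just this socket to BBR (requires the tcp_bbr module):
# sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"bbr")
```

This lets a bulk-transfer service use BBR while the rest of the host keeps the system default.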
Connection Optimization Checklist
- Enable TCP Fast Open for reduced connection setup time on repeat connections
- Tune keepalive parameters to detect dead connections quickly without excessive probing
- Calculate BDP for your primary network paths and size buffers accordingly
- Disable Nagle's algorithm for latency-sensitive applications
- Consider BBR congestion control for high-latency or lossy network paths
- Enable selective acknowledgments (SACK) for efficient loss recovery
- Set appropriate socket timeouts to prevent indefinite blocking
- Monitor TCP retransmission rates and connection states for anomalies
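Several checklist items can be verified by reading the kernel's /proc/sys interface directly. A small audit helper as a sketch (the function name is illustrative; Linux only, returns None elsewhere):

```python
from pathlib import Path
from typing import Optional

def read_tcp_setting(name: str) -> Optional[str]:
    """Read a TCP sysctl value from /proc/sys/net/ipv4; None if unavailable."""
    try:
        return (Path("/proc/sys/net/ipv4") / name).read_text().strip()
    except OSError:
        return None

# Spot-check a few of the settings discussed above:
for name in ("tcp_sack", "tcp_window_scaling",
             "tcp_congestion_control", "tcp_keepalive_time"):
    print(name, "=", read_tcp_setting(name))
```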
Conclusion
TCP tuning is a nuanced discipline that requires understanding your specific workload characteristics and network conditions. There is no universal optimal configuration — the right settings depend on whether you prioritize throughput, latency, or resource efficiency. Start with measuring your current TCP performance, identify bottlenecks using the bandwidth-delay product, and apply targeted tuning while monitoring the results.