Load Balancing Algorithms: Round Robin, Least Connections, and More
Load balancing distributes incoming network traffic across multiple servers to ensure no single server bears too much load. The algorithm used to make distribution decisions has a direct impact on application performance, reliability, and resource utilization. Choosing the right algorithm depends on your traffic patterns, server capabilities, and application architecture.
Why Load Balancing Matters
Without load balancing, a single server handles all requests. This creates a single point of failure and limits your ability to scale. Load balancers solve this by routing requests to a pool of backend servers based on a distribution algorithm. Benefits include:
- High availability — if one server fails, traffic is redirected to healthy ones
- Scalability — add more servers to handle increased load
- Performance — distribute work to reduce response times
- Maintenance — take servers offline for updates without downtime
Static Algorithms
Static algorithms make routing decisions without considering the current state of backend servers. They are simple, predictable, and have minimal overhead.
Round Robin
The simplest algorithm. Requests are distributed sequentially across the server pool. Server 1 gets the first request, Server 2 gets the second, and so on. After the last server, it cycles back to Server 1.
# Nginx round robin (default behavior)
upstream backend {
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
    server 10.0.1.3:8080;
}
Best for: Servers with identical hardware and stateless applications where each request takes roughly the same processing time.
Limitations: Does not account for differences in server capacity or current load. Slow requests can pile up on one server while others sit idle, skewing the effective distribution.
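The rotation itself is trivial to sketch in Python (the backend addresses below are illustrative, not part of any real deployment):

```python
from itertools import cycle

# Hypothetical backend pool; addresses are illustrative.
servers = ["10.0.1.1:8080", "10.0.1.2:8080", "10.0.1.3:8080"]
pool = cycle(servers)

def next_server():
    """Return the next backend in strict rotation."""
    return next(pool)

# Six requests cycle through the pool exactly twice.
assignments = [next_server() for _ in range(6)]
```

Note that the rotation state is shared: a real load balancer keeps one cursor per listener, which is why round robin is cheap but also blind to what each server is actually doing.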
Weighted Round Robin
An extension of round robin where each server is assigned a weight proportional to its capacity. A server with weight 5 receives five times as many requests as a server with weight 1.
# Nginx weighted round robin
upstream backend {
    server 10.0.1.1:8080 weight=5;  # Powerful server
    server 10.0.1.2:8080 weight=3;  # Medium server
    server 10.0.1.3:8080 weight=1;  # Small server
}
Best for: Heterogeneous server pools where machines have different CPU, memory, or network capabilities.
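A naive implementation would send five requests to the big server in a row, then three to the next, which creates bursts. Nginx instead uses a "smooth" weighted round robin that interleaves picks. A minimal Python sketch of that smooth variant (server names and weights are illustrative):

```python
def smooth_wrr(servers, n):
    """Smooth weighted round robin.

    servers: list of (name, weight) pairs.
    Returns n picks; over one full cycle (sum of weights),
    each server is picked exactly `weight` times, interleaved.
    """
    current = {name: 0 for name, _ in servers}
    total = sum(w for _, w in servers)
    picks = []
    for _ in range(n):
        # Every server earns credit equal to its weight...
        for name, w in servers:
            current[name] += w
        # ...the highest-credit server wins and pays back the total.
        best = max(servers, key=lambda s: current[s[0]])[0]
        current[best] -= total
        picks.append(best)
    return picks

servers = [("big", 5), ("mid", 3), ("small", 1)]
picks = smooth_wrr(servers, 9)  # 9 = total weight, one full cycle
```

Over one cycle of 9 picks, `big` appears 5 times, `mid` 3 times, and `small` once, with the picks spread out rather than clustered.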
IP Hash
A hash of the client's IP address determines which server receives the request. The same client IP always maps to the same server (as long as the server pool does not change).
# Nginx IP hash
upstream backend {
    ip_hash;
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
    server 10.0.1.3:8080;
}
Best for: Applications that require session affinity (sticky sessions) without using cookies or tokens. Useful when server-side session state cannot be externalized.
Limitations: Clients behind a NAT or corporate proxy share the same IP, causing uneven distribution. Adding or removing servers reshuffles the mapping.
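The core idea is just a deterministic hash modulo the pool size. A simplified Python sketch (Nginx's actual `ip_hash` hashes only the leading octets of an IPv4 address; this modulo version just illustrates the mechanism):

```python
import hashlib

def pick_server(client_ip, servers):
    """Map a client IP to a server deterministically.

    Simplified illustration: hash the IP, take it modulo the
    pool size. The same IP always maps to the same server as
    long as `servers` does not change.
    """
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

servers = ["10.0.1.1:8080", "10.0.1.2:8080", "10.0.1.3:8080"]

# The same client always lands on the same server...
first = pick_server("203.0.113.7", servers)
second = pick_server("203.0.113.7", servers)
```

Because the mapping depends on `len(servers)`, adding or removing a backend can remap most clients at once; consistent hashing is the usual remedy when that churn is unacceptable.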
Dynamic Algorithms
Dynamic algorithms consider the current state of backend servers when making routing decisions. They adapt to real-time conditions but require more overhead to track server state.
Least Connections
Routes each new request to the server with the fewest active connections. This naturally adapts to differences in request processing time — servers handling slow requests accumulate connections and receive fewer new ones.
# Nginx least connections
upstream backend {
    least_conn;
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
    server 10.0.1.3:8080;
}
Best for: Applications with variable request processing times, such as APIs where some endpoints are fast and others involve database queries or external service calls.
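The selection rule itself is a one-liner once the balancer tracks per-server connection counts. A minimal Python sketch (the counts and addresses are made up for illustration):

```python
def least_connections(active):
    """Return the server with the fewest active connections.

    active: dict mapping server -> current open connection count.
    """
    return min(active, key=active.get)

# Hypothetical snapshot of connection counts.
active = {"10.0.1.1:8080": 12, "10.0.1.2:8080": 4, "10.0.1.3:8080": 9}

target = least_connections(active)  # the server with 4 connections
active[target] += 1                 # the new request opens a connection
```

The bookkeeping is the real cost: the balancer must decrement the count when each connection closes, which is why this is classed as a dynamic algorithm.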
Weighted Least Connections
Combines least connections with server weights. The routing decision considers both the number of active connections and each server's weight, producing a normalized connection-to-capacity ratio.
# HAProxy weighted least connections
backend app_servers
    balance leastconn
    server srv1 10.0.1.1:8080 weight 5 check
    server srv2 10.0.1.2:8080 weight 3 check
    server srv3 10.0.1.3:8080 weight 1 check
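One way to express that normalization, sketched in Python (the connections-to-weight ratio here is a common formulation, not a transcription of HAProxy's internal scoring; the numbers are illustrative):

```python
def weighted_least_conn(stats):
    """Pick the server with the lowest connections-to-weight ratio.

    stats: dict mapping server -> (active_connections, weight).
    A high-weight server is allowed proportionally more connections
    before it stops being the preferred target.
    """
    return min(stats, key=lambda s: stats[s][0] / stats[s][1])

# Hypothetical snapshot: (active connections, weight).
stats = {"srv1": (10, 5), "srv2": (5, 3), "srv3": (2, 1)}
# ratios: srv1 = 2.0, srv2 ~= 1.67, srv3 = 2.0 -> srv2 wins
choice = weighted_least_conn(stats)
```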
Least Response Time
Routes requests to the server with the fastest response time and fewest active connections. This requires the load balancer to actively measure backend response times.
Best for: Latency-sensitive applications where response time consistency is critical.
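Implementations differ in how they combine the two signals; one plausible scoring (an assumption for illustration, not any specific vendor's formula) multiplies measured average latency by current load:

```python
def least_response_time(stats):
    """Pick the server with the best latency-under-load score.

    stats: dict mapping server -> (avg_response_ms, active_connections).
    Score = latency * (connections + 1): a fast but busy server can
    lose to a slightly slower idle one.
    """
    return min(stats, key=lambda s: stats[s][0] * (stats[s][1] + 1))

# Hypothetical measurements: (avg response ms, active connections).
stats = {"srv1": (50.0, 2), "srv2": (80.0, 1), "srv3": (40.0, 4)}
# scores: srv1 = 150, srv2 = 160, srv3 = 200 -> srv1 wins
choice = least_response_time(stats)
```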
Random with Two Choices
Picks two servers at random, then sends the request to the one with fewer connections. This provides near-optimal distribution with very low computational overhead and no shared state — making it ideal for distributed load balancers.
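The entire algorithm fits in a few lines, which is much of its appeal. A Python sketch (connection counts are illustrative):

```python
import random

def two_choices(active, rng=random):
    """Power of two choices: sample two distinct servers uniformly,
    route to whichever has fewer active connections.

    active: dict mapping server -> current connection count.
    """
    a, b = rng.sample(sorted(active), 2)
    return a if active[a] <= active[b] else b

# With only two servers, the sample always covers both,
# so the less-loaded one is always chosen.
active = {"10.0.1.1:8080": 5, "10.0.1.2:8080": 2}
choice = two_choices(active)
```

Because each balancer node only needs its own (possibly stale) view of connection counts, this works well when many load balancer instances run independently.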
Algorithm Comparison
| Algorithm | Type | Session Affinity | Adapts to Load | Complexity |
|---|---|---|---|---|
| Round Robin | Static | No | No | Very Low |
| Weighted Round Robin | Static | No | No | Low |
| IP Hash | Static | Yes | No | Low |
| Least Connections | Dynamic | No | Yes | Medium |
| Weighted Least Conn | Dynamic | No | Yes | Medium |
| Least Response Time | Dynamic | No | Yes | High |
| Random Two Choices | Dynamic | No | Yes | Low |
Health Checks
Regardless of algorithm, health checks are essential. They ensure traffic is only sent to servers that can handle it:
- Passive checks — the load balancer marks a server as down after observing consecutive failures
- Active checks — the load balancer periodically sends probe requests to each server
# HAProxy active health checks
backend app_servers
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    server srv1 10.0.1.1:8080 check inter 5s fall 3 rise 2
    server srv2 10.0.1.2:8080 check inter 5s fall 3 rise 2
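The `fall 3 rise 2` counters above encode a simple state machine: down after three consecutive failures, up again after two consecutive successes. A Python sketch of that logic (a simplified model, not HAProxy's implementation):

```python
class HealthTracker:
    """Mark a server down after `fall` consecutive failures,
    healthy again after `rise` consecutive successes."""

    def __init__(self, fall=3, rise=2):
        self.fall, self.rise = fall, rise
        self.failures = 0
        self.successes = 0
        self.healthy = True

    def record(self, ok):
        if ok:
            self.failures = 0
            if not self.healthy:
                self.successes += 1
                if self.successes >= self.rise:
                    self.healthy = True
        else:
            self.successes = 0
            self.failures += 1
            if self.failures >= self.fall:
                self.healthy = False

t = HealthTracker(fall=3, rise=2)
for _ in range(3):
    t.record(False)   # three straight failures -> marked down
```

Requiring consecutive failures before marking a server down (and consecutive successes before restoring it) prevents a single dropped probe from flapping the server in and out of the pool.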
Choosing the Right Algorithm
- Identical servers, uniform requests → Round Robin
- Different server capacities → Weighted Round Robin
- Variable request duration → Least Connections
- Session stickiness needed → IP Hash or cookie-based affinity
- Latency-critical applications → Least Response Time
- Distributed / multi-region → Random with Two Choices
Summary
Load balancing algorithms range from simple static approaches like round robin to sophisticated dynamic methods that adapt to real-time server conditions. The optimal choice depends on your specific requirements: server homogeneity, request patterns, session requirements, and latency sensitivity. In practice, starting with least connections is a solid default for most web applications, as it naturally adapts to variable workloads without requiring manual weight tuning.