6+ Main Load Balancing Algorithms
Load balancing is the process of distributing incoming requests among available servers. Different load balancing algorithms approach the problem of system overload in different ways.
In this article, we’ll examine the working principles, advantages, disadvantages, and optimal use cases for the most well-known algorithms.
Round Robin Algorithm and its Variations — Quantum, Weighted, and Dynamic
Round Robin is a load balancing algorithm that distributes incoming requests among multiple servers in a circular order. The name comes from an old method of signing public appeals and petitions: signatures were intentionally placed in a circle to prevent identifying the initiator.
The principle of the Round Robin algorithm is extremely simple:
- Assume we have a list of servers.
- The algorithm starts distributing requests from the first server in the list.
- Each new request is directed to the next server in the list.
- When the end of the list is reached, the algorithm returns to the beginning.
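The circular rotation described above can be sketched in a few lines of Python (the server names are hypothetical placeholders):

```python
class RoundRobinBalancer:
    """Cycles through the server list in order, wrapping at the end."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._index = 0

    def next_server(self):
        # Return the current server, then advance (and wrap) the pointer.
        server = self._servers[self._index]
        self._index = (self._index + 1) % len(self._servers)
        return server


balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
print([balancer.next_server() for _ in range(5)])
# → ['server-a', 'server-b', 'server-c', 'server-a', 'server-b']
```

The modulo operation is what sends the pointer back to the first server once the end of the list is reached.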
The main advantages of this approach are simplicity of implementation and fair distribution: each server receives an equal number of requests. However, this simplest version of the algorithm will only work effectively in an ideal environment where all servers have almost identical configurations, and all incoming requests (tasks, processes) have the same priority and duration. Therefore, in practice, the Round Robin algorithm is usually equipped with various improvements:
- Fixed time intervals (quanta) are allocated for each task. If a task doesn’t complete within its quantum, it’s moved to the end of the queue.
- Weighted Round Robin is used, taking into account server capacity. For example, if we have three servers with weights 4, 2, and 1, it means the first server is about twice as powerful as the second and four times as powerful as the third. Instead of evenly distributing requests as in regular Round Robin, WRR distributes requests proportionally to server weights: out of every 7 requests, 4 will go to the first server, 2 to the second, and 1 to the third.
- Dynamic Round Robin is applied, which considers the current load on servers when distributing requests. When a request arrives, the algorithm queries current load information from each server (DRR can use various metrics to assess server load — number of active connections, CPU usage, memory usage, server response time, or a combination of several indicators). Based on the received data, a load indicator is calculated for each server, after which the request is directed to the server with the lowest indicator.
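A minimal sketch of the weighted variant, using the 4/2/1 example above: the simplest (if not the smoothest) implementation just repeats each server in the rotation as many times as its weight.

```python
from collections import Counter


class WeightedRoundRobin:
    """Naive WRR: expand each server into the rotation `weight` times."""

    def __init__(self, weighted_servers):
        # weighted_servers: list of (server, weight) pairs
        self._schedule = [s for s, w in weighted_servers for _ in range(w)]
        self._index = 0

    def next_server(self):
        server = self._schedule[self._index]
        self._index = (self._index + 1) % len(self._schedule)
        return server


wrr = WeightedRoundRobin([("s1", 4), ("s2", 2), ("s3", 1)])
print(Counter(wrr.next_server() for _ in range(7)))
# → Counter({'s1': 4, 's2': 2, 's3': 1})
```

Production balancers typically interleave the weighted picks (e.g. nginx's smooth weighted round robin) rather than sending four requests to the same server back to back, but the proportions per cycle are the same.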
Least Connections and its Weighted Variant
Least Connections is a dynamic load balancing algorithm that directs incoming requests to the server with the fewest active connections at a given moment. Its goal is to distribute requests according to each server's current activity, preventing overload of individual nodes.
Algorithm working principle:
- The load balancer tracks the number of active connections on each server in the pool.
- When a new request arrives, the balancer analyzes the current state of all servers.
- The request is directed to the server with the least number of active connections.
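These steps can be sketched as follows (a minimal in-memory model; a real balancer would track connection counts from live traffic rather than via explicit acquire/release calls):

```python
class LeastConnectionsBalancer:
    """Routes each request to the server with the fewest active connections."""

    def __init__(self, servers):
        self._connections = {s: 0 for s in servers}

    def acquire(self):
        # Pick the server with the fewest active connections right now.
        server = min(self._connections, key=self._connections.get)
        self._connections[server] += 1
        return server

    def release(self, server):
        # Called when a connection to `server` closes.
        self._connections[server] -= 1


lb = LeastConnectionsBalancer(["a", "b", "c"])
first = lb.acquire()   # all servers tied at 0 connections
second = lb.acquire()  # `first` now holds 1 connection, so another server wins
```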
Advantages:
- Adaptability to different server capacities — more powerful servers will naturally handle more requests and thus have more connections. Conversely, less powerful servers will receive fewer requests, preventing their overload.
- Flexibility — if one server starts working slower, it will receive fewer new requests.
- Improved load distribution compared to unoptimized Round Robin, which distributes requests evenly without considering the current state of servers.
Disadvantages:
- The algorithm considers all connections equal, not accounting for some requests being more complex (resource-intensive) than others.
- The number of connections doesn’t always accurately reflect the real load on a server. For example, a server with fewer connections might be performing more resource-intensive tasks.
- A newly added server may receive a disproportionately large number of requests, as it initially has no active connections.
When to use:
- In heterogeneous server infrastructure where servers have different performance capabilities.
- When requests have relatively similar complexity and require approximately the same processing time.
Weighted Least Connections
Weighted Least Connections is an improved load balancing method that combines the principles of the Least Connections and Weighted Round Robin algorithms: it directs requests to the server with the lowest ratio of active connections to its assigned weight.
Working principle:
- Each server is assigned a weight reflecting its power.
- The balancer tracks the number of active connections on each server.
- When a new request arrives, the ratio (Active connections) / (Server weight) is calculated.
- The request is directed to the server with the lowest value of this ratio.
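A sketch of the ratio-based selection, extending the Least Connections model above (server names and weights are illustrative):

```python
class WeightedLeastConnections:
    """Routes to the server with the lowest connections-to-weight ratio."""

    def __init__(self, weights):
        # weights: {server: weight}, where weight reflects relative capacity
        self._weights = dict(weights)
        self._connections = {s: 0 for s in weights}

    def acquire(self):
        # Lowest (active connections) / (server weight) ratio wins.
        server = min(self._weights,
                     key=lambda s: self._connections[s] / self._weights[s])
        self._connections[server] += 1
        return server

    def release(self, server):
        self._connections[server] -= 1


# A server with weight 4 absorbs roughly 4x the requests of a weight-1 server.
wlc = WeightedLeastConnections({"big": 4, "small": 1})
```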
IP Hash
The IP Hash algorithm uses the client’s IP address (and sometimes the server’s) to determine which server to direct the request to. This ensures session persistence and guarantees that requests from the same client always go to the same server.
Working principle:
- When a request arrives, the load balancer extracts the client’s IP address.
- A hash function maps the IP address to a numeric key.
- The key is taken modulo the number of available servers, and the request is sent to the server at the resulting index.
- Because the calculation is deterministic, all subsequent requests from this IP will be directed to the same server.
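The steps above can be sketched in Python; MD5 is an arbitrary choice of hash function here, and the server names are hypothetical:

```python
import hashlib


def pick_server(client_ip, servers):
    """Map a client IP to a server via hash modulo pool size."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]


servers = ["s1", "s2", "s3"]
# The mapping is deterministic: the same IP always lands on the same server.
assert pick_server("203.0.113.7", servers) == pick_server("203.0.113.7", servers)
```

Note the main operational caveat: the modulo depends on the pool size, so adding or removing a server remaps most clients. Consistent hashing is the usual remedy when the pool changes often.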
Advantages of IP Hash:
- Session persistence — the algorithm ensures that requests from one client always go to the same server.
- Efficiency for static IPs — the method works well in environments where client IPs rarely change.
- Facilitates data caching on the server side for specific clients.
Disadvantages:
- If many users come from the same IP range, one server may be overloaded.
- Ineffective in environments where client IPs frequently change (mobile networks).
- Can lead to uneven load if some clients generate more traffic.
Least Response Time
The Least Response Time algorithm balances load by directing requests to servers that demonstrate the best performance at a given moment. It takes into account two key factors: server response time and the number of active connections.
Working principle:
- For each server in the system, the load balancer constantly monitors the time it takes for the server to process a request and send a response.
- The number of active connections on each server is also tracked.
- When a new request arrives, it is directed to the server with the best combination of low response time and few active connections.
- The two factors can be weighted to tune their relative importance: score = (response_time * weight1) + (active_connections * weight2), with the request going to the server with the lowest score.
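The weighted score can be sketched as a simple selection function (the metric values are hypothetical; a real balancer would feed in live measurements):

```python
def choose_server(stats, w_time=1.0, w_conn=1.0):
    """stats: {server: (avg_response_ms, active_connections)}.
    Lower score means a better candidate."""
    def score(server):
        response_time, active = stats[server]
        return response_time * w_time + active * w_conn
    return min(stats, key=score)


stats = {"s1": (120.0, 3), "s2": (80.0, 10), "s3": (200.0, 1)}
print(choose_server(stats))  # s2 scores 90, beating s1 (123) and s3 (201)
```

Tuning w_time versus w_conn is exactly the non-trivial weighting task mentioned in the disadvantages below.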
Advantages:
- Consideration of current server performance and dynamic adaptation provide optimal balance and the best user experience.
- The method works well with servers of different power and applications with different characteristics.
Disadvantages:
- Implementation complexity — the algorithm requires constant monitoring and analysis of server performance.
- The need to process a larger volume of data for decision-making creates an increased load on the balancer.
- Determining and dynamically updating additional weights can become a non-trivial task.
Random Choice and Power of Two Random Choices
Random Choice (Randomized Load Balancing) distributes incoming requests among servers randomly. Each new request is directed to a randomly selected server from the available pool.
Advantages:
- Provides even load distribution in the long term.
- Simple implementation and low computational costs.
- Doesn’t require storing the state of previous requests.
Disadvantages:
- Doesn’t consider the current load on servers and their performance, so it’s best suited for systems where servers have roughly equal performance and tasks have more or less equal complexity.
The Power of Two Random Choices algorithm, introduced in 1996 by Michael Mitzenmacher, is an improved version of random choice: instead of choosing one random server, the algorithm chooses two random servers and directs the request to the one with the lower load.
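Power of Two Random Choices reduces to a few lines; here load is measured as active connection count, though any load metric works:

```python
import random


def power_of_two_choices(connections):
    """connections: {server: active_connection_count}.
    Sample two distinct servers at random, return the less loaded one."""
    a, b = random.sample(list(connections), 2)
    return a if connections[a] <= connections[b] else b


conns = {"s1": 12, "s2": 3, "s3": 7}
target = power_of_two_choices(conns)  # never the busier of the two sampled
```

Comparing just two random candidates avoids both the hotspots of pure random choice and the cost of scanning every server on every request, which is why the technique scales so well.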
Least Bandwidth
Least Bandwidth is a dynamic load balancing algorithm that directs incoming requests to the server transmitting the least amount of data at the current moment. It’s particularly effective in environments where network bandwidth is a critical performance factor.
Working principle:
- The load balancer constantly monitors the bandwidth usage of each server.
- When a new request arrives, the server with the lowest current bandwidth usage is selected.
- The request is directed to the chosen server.
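The selection step itself is a one-liner over externally measured throughput (the Mbps figures here are illustrative):

```python
def least_bandwidth(throughput_mbps):
    """throughput_mbps: {server: current outbound traffic in Mbps,
    measured by the balancer's monitoring loop}."""
    return min(throughput_mbps, key=throughput_mbps.get)


print(least_bandwidth({"s1": 420.0, "s2": 135.5, "s3": 301.2}))  # → s2
```

As with the other dynamic algorithms, the hard part is not the selection but keeping the measurements fresh without overloading the balancer.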
Advantages:
- Optimization of network resource usage — the algorithm effectively prevents overload of individual network channels.
- Performance improvement by avoiding delays associated with network congestion.
- Adaptation to changes in system bandwidth in real-time.
Disadvantages:
- Constant measurement of bandwidth can create additional load on the system.
- Doesn’t consider CPU and memory usage.
In conclusion, there’s no universal solution: depending on the specifics of the system, the Least Connections method or IP Hash might be more suitable. Most large platforms use a combination of methods, often employing multi-level balancing: global distribution based on geographical principle at the top level, followed by the application of specialized algorithms at the network and application levels.
Need help implementing or optimizing your load balancing strategy? Contact Gart today for expert guidance and support to ensure your systems are scalable, reliable, and efficient.