A Cloud Load Balancer (CLB) is a device used in cloud computing to distribute network or application traffic across a server network. A CLB continuously monitors the attributes of network servers to detect server and resource availability, and then distributes traffic across the server network based on real-time conditions.
A Cloud Load Balancer can be referred to by different names, such as an Application Delivery Controller (ADC). A content delivery network (CDN) may also refer to its Cloud Load Balancers by other names, e.g., QUANTIL Global Load Balancer (GLB).
Why do we need Cloud Load Balancers?
Many cloud service providers handle thousands, even millions, of concurrent online requests; they need to respond seamlessly to incoming traffic while allowing users to access information with high speed and availability. Using a CLB ensures that no server is sent an excessive number of requests, which could otherwise result in poor resource performance or the overload of one or more servers.
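As a rough illustration of how a balancer can avoid sending too many requests to one server, the sketch below picks the healthy server with the lowest load ratio. The `Server` fields, the capacity limit, and the server names are all hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool          # result of the most recent health check (assumed)
    active_requests: int   # requests currently in flight
    max_requests: int      # hypothetical per-server capacity limit


def pick_server(servers):
    """Return the healthy server with the lowest load ratio, or None."""
    candidates = [
        s for s in servers
        if s.healthy and s.active_requests < s.max_requests
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.active_requests / s.max_requests)


pool = [
    Server("edge-a", healthy=True, active_requests=80, max_requests=100),
    Server("edge-b", healthy=True, active_requests=20, max_requests=100),
    Server("edge-c", healthy=False, active_requests=0, max_requests=100),
]
chosen = pick_server(pool)
print(chosen.name)  # edge-b
```

Here `edge-c` is idle but fails its health check, so it is skipped, and `edge-b` is preferred over the busier `edge-a`.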
When delivering content to end users, CDNs use Cloud Load Balancers for a variety of reasons, such as:
- Optimizing resource usage
- Maximizing throughput capacity
- Minimizing response time
A CLB is traditionally used on a large network of servers to direct traffic and disperse it across the server network based on content delivery requirements. Without a CLB, there is very limited traffic management, which can create poor user experiences as well as technical difficulties on the origin server.
How does a Cloud Load Balancer work?
Depending on content delivery requirements, a CLB can be used to manage traffic across a server network based on user location (IP), desired delivery speed, and expected or unexpected traffic peaks, e.g., during product launches or software updates.
The number of servers on a network and their locations can vary between different cloud service providers. For instance, the QUANTIL Global Load Balancer manages network traffic for over 40,000 servers distributed across 700 data center locations. The distribution or concentration of servers across certain regions can be advantageous, depending on content delivery needs. For example:
Location-Based Delivery
Content needs to be delivered to a specific region where high levels of traffic are expected. In this case, a large set of servers concentrated in a specific region, e.g., China, would be more advantageous than having the same number of servers dispersed across a larger region, e.g., Asia. Proximity to end users is the most important factor in this scenario.
Location-based delivery is also useful for businesses that want to extend into new markets, where they don’t want to build and maintain the costly infrastructure required to deliver into that region. In this way, a CDN can help a business to scale in new markets without a large investment. A product or service can be tested in small or large regions, and a Cloud Load Balancer is helpful in managing traffic in cases where expected traffic levels are uncertain.
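Proximity-based selection can be sketched as follows. This is a simplified illustration that assumes the user's coordinates are already known (in practice they would be estimated from the client IP) and uses a hypothetical table of three edge locations with approximate coordinates.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Hypothetical edge locations: name -> (latitude, longitude)
EDGE_SERVERS = {
    "beijing": (39.90, 116.40),
    "shanghai": (31.23, 121.47),
    "singapore": (1.35, 103.82),
}


def nearest_edge(user_lat, user_lon, edges=EDGE_SERVERS):
    """Return the name of the edge location closest to the user."""
    return min(
        edges,
        key=lambda name: haversine_km(user_lat, user_lon, *edges[name]),
    )


print(nearest_edge(30.27, 120.15))  # user near Hangzhou -> shanghai
```

Geographic distance is only a proxy for network proximity; a production balancer would typically also take measured latency into account.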
Global Data Transfer
This type of content delivery is based on a wide dispersion of end user IPs. For example, data or content is located on an origin server in the United States and users are accessing it from Europe and Asia on a regular basis. In this scenario, the dispersion of content across the global server network can be planned in advance and the relevant edge servers are “warmed up” using pre-fetching.
Pre-fetching is when a resource is pushed to a pre-defined set of edge servers in anticipation of its usage by end users. This helps to achieve higher levels of performance, speed, and availability. Cloud Load Balancers are used to ensure that content is always available to end users by directing end user requests to the nearest, most up-to-date version of the content.
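The effect of pre-fetching can be shown with a toy cache model. The `EdgeCache` class, the `prefetch` helper, and the file path are hypothetical names used only for illustration; a real CDN would also handle expiry, invalidation, and transfer over the network.

```python
class EdgeCache:
    """A toy edge server that caches objects fetched from the origin."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def get(self, key, origin):
        """Return (value, 'hit' or 'miss'), filling the cache on a miss."""
        if key in self.store:
            return self.store[key], "hit"
        value = origin[key]        # slow path: go back to the origin
        self.store[key] = value
        return value, "miss"


def prefetch(origin, edges, keys):
    """Push the given objects to each edge before any user asks for them."""
    for edge in edges:
        for key in keys:
            edge.store[key] = origin[key]


origin = {"/update-2.0.bin": b"payload"}
frankfurt = EdgeCache("frankfurt")
tokyo = EdgeCache("tokyo")

prefetch(origin, [frankfurt], ["/update-2.0.bin"])   # warm up one edge only

_, warm = frankfurt.get("/update-2.0.bin", origin)
_, cold = tokyo.get("/update-2.0.bin", origin)
print(warm, cold)  # hit miss
```

The first user served by the warmed-up edge gets a cache hit, while the first user served by the cold edge pays the cost of a round trip to the origin.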
Example: Data is transferred between global teams in an organization, where caching locations can be predetermined and edge servers can be warmed up as required. Transfer speed and data availability to end users are the most important factors in this scenario.
Fluctuating/High Traffic Periods
When a product or service is expecting a high level of traffic from multiple locations, e.g., a software update or a new product launch, a Cloud Load Balancer directs traffic to prevent overloading any single resource. In some cases, the content or data can also be pre-fetched to designated locations, which increases the download speed for end users.
In this scenario, high speed is desirable; however, availability is the most important factor. If a new product or service is unavailable due to traffic overload, this can have negative effects on the success of a product launch or software update. Therefore, availability is usually prioritized over speed.
Sequential distribution, commonly known as round robin, is sometimes viewed as the least customized method of delivery: incoming requests are simply handed to each server in turn across the server network.
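A minimal sketch of sequential distribution, assuming a fixed list of three hypothetical servers: `itertools.cycle` hands out servers in order and wraps around to the first one after the last.

```python
from itertools import cycle

# Hypothetical server names; a real pool would hold addresses or handles.
servers = ["edge-1", "edge-2", "edge-3"]
rotation = cycle(servers)


def next_server():
    """Return the next server in the rotation, wrapping around at the end."""
    return next(rotation)


assigned = [next_server() for _ in range(5)]
print(assigned)  # ['edge-1', 'edge-2', 'edge-3', 'edge-1', 'edge-2']
```

This approach ignores server load, location, and health, which is why it involves so little customization.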
How does a Cloud Load Balancer distribute traffic across a server network?
A CLB uses a reverse proxy to pass incoming requests from an end user to one or more servers in a network. The reverse proxy acts as a gateway to the server network and distributes the request load evenly across multiple servers. To decide where to send each request, the reverse proxy first determines the client's IP address.
A key function of reverse proxies in load balancing is to provide TCP multiplexing, also known as muxing. This allows active TCP connections to be reused to serve new client requests. One example is when a cached version of static website content is provided to an end user, as shown in the diagram below.
TCP multiplexing reduces the number of resources required on a server, improving response times and server capacity.
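The saving from connection reuse can be illustrated without any real networking. In this simplified model, `BackendConnection` and `MultiplexingProxy` are hypothetical classes: the proxy keeps idle backend connections in a pool and only opens a new one when none is free. Requests here arrive one at a time, which is an assumption made to keep the sketch short.

```python
class BackendConnection:
    """Stand-in for a TCP connection from the proxy to a backend server."""

    opened = 0  # counts how many backend connections were ever created

    def __init__(self, backend):
        BackendConnection.opened += 1
        self.backend = backend

    def send(self, request):
        return f"{self.backend} handled {request}"


class MultiplexingProxy:
    """Reuses idle backend connections instead of opening one per request."""

    def __init__(self):
        self.idle = {}  # backend name -> list of idle connections

    def forward(self, backend, request):
        pool = self.idle.setdefault(backend, [])
        conn = pool.pop() if pool else BackendConnection(backend)
        try:
            return conn.send(request)
        finally:
            pool.append(conn)  # return the connection for later reuse


proxy = MultiplexingProxy()
for i in range(100):
    proxy.forward("origin-1", f"GET /page/{i}")

print(BackendConnection.opened)  # 1
```

One hundred client requests are served over a single backend connection, so the server avoids the setup and teardown cost of ninety-nine additional connections.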
Reverse proxies also provide a level of security on a Cloud Load Balancer. Just as they manage legitimate incoming requests, reverse proxies help to mitigate DDoS attacks by distributing the incoming malicious requests across a large server network. This leverages network bandwidth to absorb malicious traffic and also protects an origin server from direct attacks.
How does a load balancer know where to distribute incoming requests?
In order to understand how a Cloud Load Balancer distributes incoming requests across a network, we first need to understand how a Domain Name System (DNS) works and how it is used.
The DNS is a public directory of domain names, e.g., www.example.com, and their associated IP addresses, e.g., 192.0.2.1. Domain names exist mainly because they are easier for end users to remember and share than long strings of numbers.
When an end user sends a request for a particular domain, the domain name is resolved to its associated IP address(es), i.e., domain name resolution. In some cases, there is only one IP address associated with a domain name, which means the request will be resolved by a single server. However, it is common to map a domain to a network of servers with different IP addresses in order to prevent overloading a single server and/or downtime in the event of server failure. This is where load balancing comes into play.
The load balancer monitors the server network for incoming end user requests. The end user’s location is determined by the DNS server based on their IP address. Using the DNS map, the load balancer selects the most appropriate edge server associated with that domain to serve the content to the end user.
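A location-aware lookup of this kind might be sketched as below. Both tables are hypothetical and use address ranges reserved for documentation; a real DNS server would rely on a geolocation database rather than a hard-coded mapping.

```python
import ipaddress

# Hypothetical mapping of client networks to regions.
CLIENT_REGIONS = {
    ipaddress.ip_network("198.51.100.0/24"): "eu",
    ipaddress.ip_network("203.0.113.0/24"): "asia",
}

# Hypothetical DNS map: domain -> region -> edge server IP.
EDGE_BY_REGION = {
    "www.example.com": {
        "eu": "192.0.2.10",
        "asia": "192.0.2.20",
        "default": "192.0.2.1",
    },
}


def resolve(domain, client_ip):
    """Return the edge server IP for this domain and client location."""
    addr = ipaddress.ip_address(client_ip)
    region = next(
        (name for net, name in CLIENT_REGIONS.items() if addr in net),
        "default",
    )
    return EDGE_BY_REGION[domain][region]


print(resolve("www.example.com", "203.0.113.7"))   # 192.0.2.20
print(resolve("www.example.com", "198.51.100.9"))  # 192.0.2.10
```

Clients whose address matches no known network fall back to the `default` entry.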
As mentioned before, selecting an appropriate edge server can be based on a number of criteria, depending on the priorities of the Internet Content Provider. For instance, the DNS server can return an IP address for an edge server that is more cost-effective to deliver the content, even if it is not physically closest to the end user. In this way, DNS requests work with reverse proxies to determine the best IP address to return based on user device, location, network conditions, and even the time of day.
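One way to picture this trade-off is a weighted score per edge server. The fields, weights, and values below are invented for illustration; the source does not specify how any particular provider combines these criteria.

```python
def best_edge(edges, latency_weight, cost_weight):
    """Return the name of the edge with the lowest weighted score."""
    def score(edge):
        return (latency_weight * edge["latency_ms"]
                + cost_weight * edge["cost_cents_per_gb"])
    return min(edges, key=score)["name"]


# Hypothetical candidates for one end user.
edges = [
    {"name": "near", "latency_ms": 20, "cost_cents_per_gb": 8},
    {"name": "far", "latency_ms": 60, "cost_cents_per_gb": 2},
]

# A provider that only cares about speed picks the closest server.
print(best_edge(edges, latency_weight=1, cost_weight=0))   # near

# A provider that weights delivery cost heavily picks the cheaper one.
print(best_edge(edges, latency_weight=0, cost_weight=1))   # far
```

Changing the weights changes the answer for the same user, which is how a cheaper server can be returned even when it is not the closest.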
Bin Ni is the VP of Engineering at QUANTIL. He and his team work on all technical aspects of all QUANTIL products.