Content-aware request distribution is a technique employed in cluster-based network servers, where the request distribution strategy takes into account the service/content requested when deciding which back-end node should serve a given request. In contrast, purely load-based schemes like weighted round-robin (WRR), used in most commercial high-performance cluster servers [11,21], distribute incoming requests in round-robin fashion, weighted by some measure of load on the different back-end nodes.
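As a point of comparison, a WRR dispatcher can be sketched in a few lines. This is an illustrative sketch, not the mechanism of any particular commercial product: each back-end is expanded into a number of dispatch slots proportional to its weight, and requests are assigned by cycling over the slots. The node names and weights below are hypothetical.

```python
from itertools import cycle

def weighted_round_robin(backends):
    """Expand each (node, weight) pair into `weight` slots and cycle
    over them, so a node with weight w receives w of every sum-of-weights
    requests. Note that the decision ignores the request's content."""
    slots = [node for node, weight in backends for _ in range(weight)]
    return cycle(slots)

# Hypothetical cluster: node A is three times as capable as node B.
dispatch = weighted_round_robin([("A", 3), ("B", 1)])
order = [next(dispatch) for _ in range(8)]
# order == ["A", "A", "A", "B", "A", "A", "A", "B"]
```

In practice the weights would be derived from a dynamic load measure rather than fixed capacities, but the key point stands: the assignment depends only on load, never on what content the request asks for.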
The potential advantages of content-aware request distribution are: (1) increased performance due to improved hit rates in the back-ends' main memory caches, (2) increased secondary storage scalability due to the ability to partition the server's database over the different back-end nodes, and (3) the ability to employ back-end nodes that are specialized for certain types of requests (e.g., audio and video). Locality-aware request distribution (LARD) is a specific strategy for content-aware request distribution that focuses on the first of the advantages cited above, namely improved cache hit rates in the back-ends [6,26]. LARD improves cluster performance by simultaneously achieving load balancing and high cache hit rates at the back-ends.
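The core of basic LARD can be sketched as follows. This is a simplified illustration of the idea, not the authors' implementation: each target is pinned to the node that served it last (preserving cache locality), and is reassigned to the least-loaded node only when its current node becomes overloaded relative to the rest of the cluster. The threshold values `T_low` and `T_high` are hypothetical placeholders.

```python
def lard_assign(target, nodes, load, server_for, T_low=25, T_high=50):
    """Assign `target` to a node, favoring the node that already caches it.

    load:       dict mapping node -> current load (e.g., active connections)
    server_for: dict mapping target -> node currently assigned to serve it
    Reassignment happens only when the assigned node is loaded above T_high
    while some node sits below T_low, or when it exceeds 2 * T_high.
    """
    least = min(nodes, key=lambda n: load[n])
    n = server_for.get(target)
    if n is None:
        n = least                       # first request for this target
        server_for[target] = n
    elif (load[n] > T_high and load[least] < T_low) or load[n] >= 2 * T_high:
        n = least                       # load imbalance overrides locality
        server_for[target] = n
    return n
```

This captures the stated trade-off: repeated requests for the same target hit the same node's warm cache, while the threshold test prevents a popular target from overwhelming one node.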
In order to inspect the content of the requests, a TCP connection must be established with the client prior to assigning the request to a back-end node. This is because, with content-aware request distribution, the nature and target of the client's request influences the assignment. Therefore, a mechanism is required that allows a chosen back-end node to serve a request on a TCP connection that was established elsewhere in the cluster. For reasons of performance, security, and interoperability, it is desirable that this mechanism be transparent to the client. We discuss such mechanisms in the next subsection.
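The ordering constraint above can be made concrete with a minimal sketch of a front-end's accept path, assuming HTTP over TCP; the function and helper names are hypothetical. The request's target is only available after the TCP handshake completes and the client sends its request, so the content-aware choice of back-end necessarily happens on a connection the front-end already owns.

```python
import socket

def accept_and_dispatch(listen_sock, choose_backend):
    """Accept one client connection, read its HTTP request line, and make
    a content-aware back-end choice. The choice cannot be made earlier:
    the target URL does not exist until the connection is established and
    the request arrives."""
    conn, addr = listen_sock.accept()            # TCP handshake with the client
    request = conn.recv(4096).decode("latin-1")  # e.g. "GET /index.html HTTP/1.0..."
    target = request.split(" ", 2)[1]            # the requested content
    backend = choose_backend(target)             # content-aware decision, made only now
    # The open `conn` must somehow be served by `backend`, a node other
    # than the one that accepted it -- hence the need for a handoff or
    # forwarding mechanism that is transparent to the client.
    return conn, backend
```

The comment at the end marks exactly the gap that the mechanisms discussed next must fill: transferring or splicing an established connection to the chosen back-end.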