We describe nine cache replacement algorithms proposed in recent studies,
which attempt to minimize various cost metrics, such as
miss ratio, byte miss ratio, average latency, and total cost.
In describing the various algorithms,
it is convenient to view each request for a document
as being satisfied in the following way:
the algorithm brings the
newly requested document into the cache
and then evicts documents until
the capacity of the cache is no longer exceeded.
Algorithms are then distinguished by how they choose
which documents to evict.
This view allows for the possibility that the requested document
itself is evicted upon arrival, in which case it
replaces no other document in the cache.
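As a minimal sketch of this fetch-then-evict view (the class and names are illustrative, not from the cited studies), with LRU as the example eviction rule:

```python
from collections import OrderedDict

class FetchThenEvictCache:
    """Bring each requested document into the cache, then evict documents
    until the capacity is no longer exceeded.  Replacement algorithms
    differ only in which victim they pick; LRU (evict the least recently
    requested document) is used here as the example policy."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.docs = OrderedDict()   # doc id -> size, oldest request first
        self.used = 0

    def request(self, doc, size):
        """Return True on a cache hit, False on a miss."""
        if doc in self.docs:
            self.docs.move_to_end(doc)        # refresh recency on a hit
            return True
        self.docs[doc] = size                 # bring the new document in
        self.used += size
        while self.used > self.capacity:      # evict until we fit again
            victim, victim_size = self.docs.popitem(last=False)
            self.used -= victim_size
        return False
```

With an empty 100-byte cache, requesting a 500-byte document admits it and then immediately evicts it in the same loop, which is the case noted above where the new document replaces nothing.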
- Least-Recently-Used (LRU) evicts the document which was requested
the least recently.
- Least-Frequently-Used (LFU) evicts the document which
has been accessed least frequently.
- SIZE [WASAF96] evicts the largest document.
- LRU-Threshold [ASAWF95] is the same as LRU, except that documents larger than a
certain threshold size are never cached.
- Log(Size)+LRU [ASAWF95] evicts the least-recently-used document
among those with the largest value of log(size).
- Hyper-G [WASAF96] is a refinement of LFU that also considers
last access time and size.
- Pitkow/Recker [WASAF96] removes the least-recently-used
document,
unless all documents in the cache were accessed today, in which case the largest one is
removed.
- Lowest-Latency-First [WA97] tries to minimize
average latency by removing the document with the lowest download latency
first.
- Hybrid, introduced in [WA97],
is aimed at reducing the total latency.
A function is computed for each document which is designed to
capture the utility of retaining a given document in the cache.
The document
with the smallest function value is then evicted. The function for a document p
located at server s depends on
the following parameters:
c_s, the time to connect with server s;
b_s, the bandwidth to server s;
n_p, the number of times p has been requested
since it was brought into the cache; and
z_p, the size (in bytes)
of document p.
The function is defined as:

    (c_s + W_B / b_s) * n_p^(W_N) / z_p

where W_B and W_N are constants.
Estimates for c_s and b_s are based on the times to fetch documents from server s
in the recent past.
- Lowest Relative Value (LRV), introduced in [LRV97],
includes the cost and size of a document in the calculation of a value
that estimates the utility of keeping a document in the cache. The
algorithm evicts the document with the lowest value.
The calculation of the value is based on
extensive empirical analysis of trace data.
For a given i, let P_i
denote the probability that a document is
requested i+1 times given that it is requested i times.
P_i is estimated in an online manner by
taking the ratio D_{i+1} / D_i, where D_i is the total number
of documents seen so far which have been requested at least i times in the trace.
P_i(s) is the same as P_i except that the counts are
restricted to documents of size s.
Furthermore, let 1 - D(t) be the probability that a page is requested again
as a function of the time t (in seconds) since its last request; D(t) is
estimated by a function fitted empirically to the trace data.
Then for a particular document d of size s and cost c, if the last
request to d is the i-th request to it, and the last request was made
t seconds ago, d's value in LRV is calculated as:

    V(i, t, s) = P_1(s) * (1 - D(t)) * c / s   if i = 1,
    V(i, t, s) = P_i * (1 - D(t)) * c / s      otherwise.

Among all documents, LRV evicts the one with the lowest value.
Thus, LRV takes into account locality, cost and size of a document.
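As a sketch of the two scoring rules just described (these helper functions and default constants are illustrative; W_B, W_N, the P_i ratios, and D(t) must be estimated as the papers describe):

```python
def hybrid_utility(conn_time, bandwidth, n_requests, size,
                   W_B=8192.0, W_N=0.9):
    """Hybrid's retention utility (c_s + W_B/b_s) * n_p**W_N / z_p for a
    document of `size` bytes on a server with connection time `conn_time`
    and bandwidth `bandwidth`.  The document with the smallest utility is
    evicted.  W_B and W_N here are placeholder constants, not tuned values."""
    return (conn_time + W_B / bandwidth) * (n_requests ** W_N) / size

def lrv_value(i, D_t, P, P1_of_s, cost, size):
    """LRV's value of keeping a document whose last request was its i-th.
    `D_t` is the estimate of D(t) for the time since that request, `P`
    maps i to the estimated P_i ratio, and `P1_of_s` is the
    size-conditioned estimate P_1(s).  The document with the lowest
    value is evicted."""
    p = P1_of_s if i == 1 else P[i]
    return p * (1.0 - D_t) * cost / size
```

Under both rules, a document that has been requested often and recently (large n_p, small D(t)) scores high and is therefore retained.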
Existing studies using actual Web proxy traces have narrowed the choice of proxy replacement algorithms
to LRU, SIZE, Hybrid and LRV.
Results in [WASAF96, ASAWF95] show that SIZE performs better than
LFU, LRU-Threshold,
Log(Size)+LRU, Hyper-G and Pitkow/Recker.
Results in [WASAF96] also show that SIZE outperforms LRU in most
situations. However, a different study [LRV97] shows that LRU
outperforms SIZE in terms of byte hit rate.
Comparing LFU and LRU, our experiments show that though LFU can outperform LRU
slightly when the cache size is very small, in most cases LFU performs
worse than LRU.
In terms of minimizing latency, [WA97] show that Hybrid performs
better than Lowest-Latency-First.
Finally, [LRV97] shows that LRV outperforms both LRU and SIZE
in terms of hit ratio and byte hit ratio.
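The distinction between hit ratio and byte hit ratio that drives these conflicting results can be made concrete (a small illustrative computation, not data from the cited studies):

```python
def hit_ratios(requests, hit_indices):
    """requests: list of (doc, size) pairs; hit_indices: indices of the
    requests served from the cache.  Hit ratio weights every request
    equally, while byte hit ratio weights each request by its size --
    so a policy like SIZE, which keeps many small documents, can score
    well on the first metric and poorly on the second."""
    total_bytes = sum(size for _, size in requests)
    hit_bytes = sum(requests[i][1] for i in hit_indices)
    return len(hit_indices) / len(requests), hit_bytes / total_bytes
```

For example, serving three 1-byte documents from the cache while missing one 97-byte document yields a 75% hit ratio but only a 3% byte hit ratio.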
One disadvantage of both Hybrid and LRV is their heavy parameterization, which
leaves their performance across different access streams uncertain.
Moreover, the studies offer no conclusion on which algorithm a proxy should
use. Essentially, the problem is to find an algorithm that combines
the observed access pattern with cost and size considerations.
Pei Cao
Thu Oct 23 18:04:42 CDT 1997