We will charge each algorithm for the documents they evict instead of the documents they bring in. The difference between the two cost measures is at most an additive constant.
Each request for a document happens in a series of steps. First the optimal algorithm serves the request. Then each of the steps of GreedyDual-Size is executed in a separate step. Each step of each request happens at a different point in time. It is straightforward to show by induction on time that for every document q in the cache
Let Lfinal be the last value of L. Let smin denote the size of the smallest document. Let scache be the total size of the cache. Note that k =scache / smin . We will first prove that the total cost of all documents which OPT evicts is at least smin Lfinal. Then we will show that the total cost of all documents evicted by GreedyDual-Size is at most scache Lfinal.
Every time L increases, there is some document which GreedyDual-Size has in the cache which the optimal does not have in the cache. This is because L only increases when GreedyDual-Size has exceeded the size of its cache and must evict a document. Since the optimal algorithm has already satisfied the request, it has the requested document in the cache. Since the newly requested document does not fit in GreedyDual-Size's cache, GreedyDual-Size must have some document in the cache which the optimal does not have in the cache.
Consider a period of time in which GreedyDual-Size has p in its cache and the optimal does not. Such a period begins with the optimal evicting p from the cache and ends when either we evict p from the cache or the optimal brings p back in to the cache. We will attribute any increase in L which occurs during this period to the cost the optimal incurred in evicting p at the beginning of the period. The cost of evicting p is c(p). The only thing we have to prove in establishing that the optimal cost is at least smin Lfinal is that L increases by at most c(p)/s(p) < c(p)/smin during the period.
Let L1 be the value of L when the period begins. We know that at this time, H(p) < L1 + c(p)/s(p). Furthermore, H(p) does not change during this period. This is because H(p) only increases when p is requested. p can only be requested on the last request of the period (because the period is defined to the period of time in which GreedyDual-Size has p in its cache and the optimal does not). If the last request of the period is to document p, then the optimal brings p into its cache, and hence the period ends before H(p) increases. If the period ends by p's eviction, H(p) remains the same until p is evicted. Since H(p) is an upper bound for L, L can not increase to more than L1 + c(p)/s(p) during the entire period.
Now we must establish that the total cost of all documents evicted by GreedyDual-Size is at most scache Lfinal. Consider an eviction that GreedyDual-Size performs. Let p be the document that is evicted and let L1 and L2 be the values for L when it is brought in and evicted from the cache, respectively. The value of H(p) when p is brought in to the cache is L1 + c(p)/s(p). p can only be evicted if L equals H(p). Since H(p) can only increase during the time that p is in the cache, we know that L2 - L1 > c(p)/s(p) .
Draw an interval on the real line from L1 to L2. that is closed on the left end and open on the right end. Assign the interval a weight of s(p). If we draw an interval for every such eviction, the cost of GreedyDual-Size is upper bounded by the sum over all intervals of their length times their weight. By definition, all intervals lie in the range from 0 to Lfinal.
The final observation is that the total weight of all the intervals which contain any single point on the real line is at most scache. Consider a point L' on the line where an interval begins or ends. The total weight of the intervals that cover this point is the sum of the sizes of the documents which are in the cache when L reaches a value of L'. Since the size of the cache is at most smin Lfinal, the sum of the weights of the intervals which cover L' is at most scache.