The Internet, and particularly the World Wide Web (WWW), consists of an ever-increasing number of servers, networks, and personal machines with dramatically varying qualities of service. Individuals often access the WWW via modems with bandwidths of 14.4-28.8 Kbps, while their provider might have a T1 link or better to the rest of the Internet; yet the page being accessed may reside at a site with low bandwidth, high server load, or both. Thus the latency of responding to the user's request for a page is unpredictable: often fairly low, but sometimes extremely high. This unpredictability and generally slow response may be exacerbated in environments with even poorer quality of service, such as cellular telephony or wide-area wireless networks.
A number of techniques have been implemented or proposed to deal with HTTP latency. A browser can direct its requests to a proxy-caching server (henceforth referred to as a server proxy) on the other end of the low-speed connection; the latency of retrieving pages from elsewhere in the Internet is then eliminated whenever someone else has retrieved those pages in the recent past. (Recency is a function of the size of the cache, any expiration dates in the pages, and any constraints passed from the browser to the cache [5, 12]. Also, some pages are flagged as uncacheable, and the proxy-caching server is obliged to pass those requests through to the content provider. Caching is discussed further in Section 2.) America Online (AOL) uses a proprietary protocol between the browser on a user's machine and an enormous cache of WWW pages within the AOL server cluster. Prefetching pages during periods when the modem would otherwise be idle can reduce or eliminate the latency of following a link, provided the prefetch is accurate and the user pauses between clicks long enough for prefetching to complete [17, 21]. Padmanabhan and Mogul's study of persistent HTTP connections [16] indicates that a persistent connection between a client and proxy, or between either of them and a content provider, eliminates TCP connection setup and slow-start overhead.
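To make the cacheability and freshness constraints concrete, the sketch below shows, in Python, the kind of check a proxy cache might apply to response headers. It is a minimal illustration under simplified assumptions, not the logic of any particular cache: real proxies also weigh cache size, client directives, and heuristics for pages that lack expiration dates.

```python
# Minimal sketch of the checks a proxy cache might apply to response
# headers. The header names are standard HTTP, but the policy shown
# here is deliberately simplified and illustrative.
import time
from email.utils import parsedate_to_datetime

def is_cacheable(headers):
    """Responses flagged uncacheable must be passed through to the origin."""
    pragma = headers.get("Pragma", "")
    cache_control = headers.get("Cache-Control", "")
    return "no-cache" not in pragma and "no-cache" not in cache_control

def is_fresh(headers, now=None):
    """A cached copy is served without revalidation until it expires."""
    now = now if now is not None else time.time()
    expires = headers.get("Expires")
    if expires is None:
        return False  # no expiration information: revalidate with the origin
    return parsedate_to_datetime(expires).timestamp() > now
```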
We look at the problem of latency from another perspective: using computation to improve end-to-end network latency. The idea of trading computation for I/O bandwidth has appeared numerous times in past systems. Examples include application-specific deltas and compression, such as Low-Bandwidth X [14]; compressed network or disk I/O [3, 6, 7]; replicated file systems [2]; shared memory [15]; and checkpointing [9, 18]. The same tradeoffs apply in the domain of the WWW.
We address the issue of latency from the perspective of sending the differences between versions of a page, or deltas, rather than entire pages. Briefly, deltas are used in two ways. First, if both ends of the slow link store the same version of a page, and the server proxy obtains a new copy of the page, it can send a delta to the user's machine, which should reduce the transfer time on the slow link. Second, if the user's machine does not store the older version, the server proxy can send a potentially obsolete version of the page immediately while requesting the new version in parallel, following up with a delta against the obsolete version if necessary. Thus, the time the slow link would otherwise sit idle while the server proxy waits for a response from the end server is not wasted.
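The sketch below illustrates the two paths in Python. It is a schematic rendering of the idea, not the paper's implementation: the names (diff, fetch, send) are hypothetical placeholders, and the trivial diff stands in for a real delta encoder.

```python
# A minimal sketch of the server proxy's two delta paths. All names
# here are illustrative placeholders, and diff() is a stand-in for a
# real byte-level delta encoder.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

def diff(old: bytes, new: bytes) -> bytes:
    """Stand-in delta encoder: empty delta if the versions are identical."""
    return b"" if old == new else new

def serve(url: str,
          client_version: Optional[bytes],
          cached: Optional[bytes],
          fetch: Callable[[str], bytes],
          send: Callable[[bytes], None]) -> None:
    if client_version is not None:
        # Simple delta: both ends hold the same old version, so only the
        # difference between old and new need cross the slow link.
        send(diff(client_version, fetch(url)))
    elif cached is not None:
        # Optimistic delta: ship the possibly stale cached copy while the
        # current version is fetched in parallel, then patch it if needed.
        with ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(fetch, url)
            send(cached)                  # uses otherwise idle link time
            current = future.result()
        if current != cached:
            send(diff(cached, current))
    else:
        send(fetch(url))                  # nothing to compute a delta against
```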
The delta may turn out to be large, in which case the server proxy may have to abort sending the stale version (if one is being sent) and simply send the current page it received. In that case, the work of sending the stale data and computing the delta is wasted. Worse, because the server proxy does not pipeline the data from the content provider to the client but instead waits until the whole page has been received (it needs the complete page to compute a delta), the end-to-end latency perceived by the client may be somewhat larger. Our approach optimistically assumes that this case is uncommon; hence we refer to the case where stale data is transferred as an optimistic delta. The experiments described in this paper support this assumption. In contrast, we refer to the case where both sides share a cached version as a simple delta.
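The fallback decision can be expressed as a simple size comparison, sketched below. The paper does not prescribe a particular cutoff, so the threshold here is an assumed, illustrative value, and abort_stale and send are hypothetical callbacks.

```python
# Sketch of the abort heuristic. DELTA_RATIO_LIMIT is an assumed,
# illustrative threshold; the real policy could differ.
from typing import Callable

DELTA_RATIO_LIMIT = 0.5   # hypothetical: abort if delta exceeds half the page

def finish_response(delta: bytes,
                    current: bytes,
                    abort_stale: Callable[[], None],
                    send: Callable[[bytes], None]) -> None:
    if len(delta) > DELTA_RATIO_LIMIT * len(current):
        abort_stale()     # stop the in-progress stale transfer: wasted work
        send(current)     # fall back to sending the full current page
    else:
        send(delta)       # the optimistic bet paid off
```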
Our work bears some similarity to that of Dingle and Partl [5], who recently proposed that a hierarchy of proxy-caching servers could send stale data as a MIME multipart document, causing the browser to display the stale data immediately and replace it with more recent versions as they become available. The main difference from our work is that Dingle and Partl do not address the latency of obtaining the final version of the data; our approach attempts to reduce this latency by sending the current version as a delta. In addition, we do not display the stale page: it is intercepted by a client proxy [19], typically co-located with the browser on the same machine, and passed to the browser once it is updated or known to be current. In the case where the ``stale'' page is actually current, our system behaves like the proposal in [5]. Finally, intercepting the stale data with a client-side proxy permits the transmission of the stale data to be aborted transparently once it becomes clear that simply sending the current version is more efficient (e.g., if the delta turns out to be too large relative to the page, or if the current page becomes available very quickly).
The rest of this paper is organized as follows. Section 2 provides background on caching in the HTTP protocol and an analysis of HTTP latency. Section 3 presents data analysis supporting our hypotheses that delta sizes will be small and that there will be sufficient idle time to transfer stale copies. Section 4 describes the design of our system, and Section 5 covers experimental results. Finally, Section 6 discusses the status of the system and future work, and Section 7 offers conclusions.