We used several heuristics to improve the efficiency of
pre-fetching. While many of the heuristics listed below are simple
and intuitive, we found that in combination they provided a
remarkable performance improvement in daily operation. This section
lists the heuristics we used; a short illustrative sketch of each
one follows the list.
- 1.
- Web Sessions: The notion of a Web session had one of the
greatest performance impacts. Without it, the pre-fetch engine
re-issued requests for the same documents whenever a user revisited
pages later in the session. With it, documents are fetched only once
during a session; subsequent requests are satisfied from the cache,
unless the user explicitly requests a fresh copy with the RELOAD
button (which issues an HTTP ``Pragma: no-cache'' header). At the
start of a session, the documents with the highest node weights are
hoarded. The idea of Web sessions is not new; current browsers all
have a notion of a Web session.
- 2.
- Hoard walking: Hoard-walking [14] periodically refreshes the
pages with the highest node weights. Since hoard-walking pre-fetches
the pages in the user-defined hoard file as well as the heaviest
nodes in the learnt database, a large fraction of typical user
accesses can be satisfied locally during disconnection.
- 3.
- Issue of Pre-Fetch Requests: Pre-fetches are performed only
when the network is ``idle''. In our system, at most four pre-fetches
may be in progress at any time at the local proxy server, and at most
eight at the backbone proxy server. Furthermore, on the local proxy
server, pre-fetches are not started until there are no pending user
requests. We observed performance improvements when we amortized
pre-fetch requests in this way. Of course, withholding pre-fetch
requests while a user request is pending might actually lower
performance, especially for requests with high delays. However,
giving priority to user requests eliminates the harmful effect of
pre-fetches tying up bandwidth when it is needed.
- 4.
- Weighting edges: Using weighted edges, as opposed to node
weights alone, for pre-fetch decisions ensures that the actual usage
pattern is captured. When a user visits a URL, we should follow the
outgoing edge with the heaviest weight rather than pre-fetch the
adjacent node with the heaviest node weight. For example, consider
the following scenario: C was visited 2 times from A and 100 times
from B, and B was visited 10 times from A. When a user is at A, B
should be pre-fetched, even though its node weight is smaller than
C's.
- 5.
- Dynamically determining a document's dependents:
We distinguish a document from the images linked to it, which
constitute its dependents. Dependents do not appear in the user
profile, because they are accessed automatically whenever the
document is accessed, and also because they change frequently over
time. The original implementation stored the URLs of the dependents;
however, because HTML documents change frequently, requests were
being made for documents that no longer existed. Determining the
dependents dynamically, by parsing the document each time it is
fetched, removed all those redundant pre-fetch requests while keeping
the user profile graph small. Furthermore, pre-fetch requests for
dependents are issued (at both the local proxy server and the
backbone proxy server) before the browser issues them.
- 6.
- Continued download of document: Even after the user presses
``stop'', the local proxy server continues downloading the document.
This allows a user to click on another page while the previous page
finishes downloading in the background. It also serves as a crude
form of short-term, user-specified hoarding, or ``user-driven
pre-fetch''. Although this effect is not captured in the performance
tests, we found it extremely useful when reading a page with links we
knew we wanted to visit: we would click on a link and quickly press
stop, which issued the request but kept the browser on the current
page. Later, when we eventually clicked on the link, the page came up
instantly.
- 7.
- CGI scripts: CGI and other dynamic pages are pre-fetched and
retained in the cache for a short period of time. Ordinarily, CGI
pages are not cached by either browsers or proxies, so CGI latency is
not hidden. With this heuristic, we pre-fetch CGI pages and cache
them for short periods of time in anticipation of an access. A CGI
page is deleted upon timeout, or after it is read once.
- 8.
- HTTP Redirections: HTTP response codes 301 and 302 indicate
that a particular URL has moved. When the local proxy server detects
such a response, it stores the redirection and provides the correct
response to the browser directly. This saves the browser from having
to reconnect to the server.
- 9.
- Thresholds: Since many documents are accessed only once from a
parent document (e.g., a news item from a topic page), we impose a
minimum edge frequency that must be reached before a node is
pre-fetched. We also impose a threshold on document size: documents
larger than 1 megabyte are not pre-fetched in the current
implementation. However, we anticipate that once size is included as
a weight parameter, this heuristic will be subsumed by the
weighted-edge heuristic.
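To make the heuristics concrete, the sketches below illustrate one
possible implementation of each. They are minimal Python sketches,
not the system's actual code; all class names, helper functions, and
parameter values in them are assumptions. The first sketch shows
session-scoped caching (heuristic 1): each URL is fetched at most
once per session, except when the request carries the ``Pragma:
no-cache'' header issued by the RELOAD button.

    class SessionCache:
        # Serve each URL from the cache after its first fetch in a
        # session, unless the request carries ``Pragma: no-cache''
        # (the RELOAD button).
        def __init__(self, fetch_from_network):
            self.fetch = fetch_from_network  # assumed callable: url -> document
            self.cache = {}                  # url -> document, session-scoped

        def new_session(self):
            self.cache.clear()               # a new session starts empty

        def get(self, url, headers=None):
            headers = headers or {}
            if headers.get("Pragma") == "no-cache" or url not in self.cache:
                self.cache[url] = self.fetch(url)  # first access or reload
            return self.cache[url]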
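Hoard walking (heuristic 2) can be sketched as a periodic task that
refreshes the user-defined hoard file together with the heaviest
nodes in the learnt database. The period and the number of nodes
refreshed are assumed values; the text gives no figures.

    import heapq
    import threading

    def hoard_walk(hoard_urls, node_weights, prefetch, top_n=20, period_s=3600):
        # Refresh the hoard file plus the top_n heaviest profile nodes,
        # then reschedule; top_n and period_s are assumptions.
        heaviest = heapq.nlargest(top_n, node_weights, key=node_weights.get)
        for url in set(hoard_urls) | set(heaviest):
            prefetch(url)
        threading.Timer(period_s, hoard_walk,
                        args=(hoard_urls, node_weights, prefetch,
                              top_n, period_s)).start()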
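The pre-fetch issue policy (heuristic 3) amounts to two constraints:
a cap on concurrent pre-fetches (four at the local proxy, eight at
the backbone proxy) and, at the local proxy, no pre-fetching while a
user request is pending. A minimal thread-based sketch:

    import threading

    class PrefetchScheduler:
        def __init__(self, fetch, max_concurrent=4):  # 4 local, 8 backbone
            self.fetch = fetch
            self.slots = threading.Semaphore(max_concurrent)
            self.pending = 0                 # outstanding user requests
            self.lock = threading.Lock()
            self.idle = threading.Event()    # set when no user request pending
            self.idle.set()

        def user_request(self, url):
            with self.lock:
                self.pending += 1
                self.idle.clear()            # user traffic takes priority
            try:
                return self.fetch(url)
            finally:
                with self.lock:
                    self.pending -= 1
                    if self.pending == 0:
                        self.idle.set()

        def prefetch(self, url):
            self.idle.wait()                 # only when the network is ``idle''
            with self.slots:                 # at most max_concurrent ongoing
                self.fetch(url)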
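Edge weighting (heuristic 4) selects the heaviest outgoing edge from
the current node rather than the heaviest adjacent node. The sketch
below encodes the A/B/C scenario from the text; the graph
representation is assumed.

    def next_prefetch(current, edge_weight):
        # edge_weight: node -> {neighbour: traversal count}; pick the
        # heaviest outgoing edge, not the heaviest neighbouring node.
        out = edge_weight.get(current, {})
        return max(out, key=out.get) if out else None

    # The scenario from the text: C reached 2 times from A and 100
    # times from B; B reached 10 times from A.  C's node weight (102)
    # exceeds B's (10), yet from A the heaviest edge points to B.
    edges = {"A": {"B": 10, "C": 2}, "B": {"C": 100}}
    assert next_prefetch("A", edges) == "B"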
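Dynamically determining dependents (heuristic 5) can be sketched as
re-parsing each fetched page for its inline images instead of storing
their URLs in the profile; the use of Python's html.parser is an
assumption.

    from html.parser import HTMLParser

    class DependentFinder(HTMLParser):
        # Collect inline-image URLs from a just-fetched page, so
        # dependents are re-derived on every fetch rather than stored.
        def __init__(self):
            super().__init__()
            self.dependents = []

        def handle_starttag(self, tag, attrs):
            if tag == "img":
                src = dict(attrs).get("src")
                if src:
                    self.dependents.append(src)

    def prefetch_dependents(html_text, prefetch):
        finder = DependentFinder()
        finder.feed(html_text)
        for url in finder.dependents:   # issue these before the browser does
            prefetch(url)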
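Continued download (heuristic 6) means the proxy finishes fetching a
document into its cache even after the browser aborts the connection.
The helper names below are invented for illustration.

    def relay(url, origin_chunks, client_send, cache_store):
        # origin_chunks: assumed iterator over chunks from the origin;
        # client_send: assumed callable raising ConnectionError on abort.
        body = b""
        client_alive = True
        for chunk in origin_chunks:
            body += chunk
            if client_alive:
                try:
                    client_send(chunk)
                except ConnectionError:
                    client_alive = False   # user pressed ``stop''
        cache_store(url, body)             # finish the download regardless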
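Short-lived caching of dynamic pages (heuristic 7) deletes an entry
on timeout or after a single read; the time-to-live value below is an
assumption.

    import time

    class CGICache:
        def __init__(self, ttl_s=30):     # ttl_s assumed; the text gives none
            self.ttl_s = ttl_s
            self.entries = {}             # url -> (document, expiry time)

        def put(self, url, document):
            self.entries[url] = (document, time.time() + self.ttl_s)

        def get(self, url):
            entry = self.entries.pop(url, None)  # read-once: always evict
            if entry is None:
                return None
            document, expiry = entry
            return document if time.time() < expiry else None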
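Caching HTTP redirections (heuristic 8) is a small mapping from moved
URLs to their new locations, consulted before contacting the server.
The cycle guard is an added precaution, not from the text.

    class RedirectCache:
        def __init__(self):
            self.moved = {}               # old url -> new url

        def record(self, url, status, location):
            if status in (301, 302):      # ``moved'' response codes
                self.moved[url] = location

        def resolve(self, url):
            seen = set()
            while url in self.moved and url not in seen:
                seen.add(url)             # guard against redirect cycles
                url = self.moved[url]
            return url                    # answer the browser directly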
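Finally, the thresholds (heuristic 9) reduce to a simple predicate
applied before pre-fetching. The 1-megabyte size cap is from the
text; the minimum edge frequency is an assumed value, and the
document size would have to come from a prior fetch or a HEAD
request.

    MAX_PREFETCH_BYTES = 1 << 20          # 1 megabyte, as in the text

    def should_prefetch(edge_count, size_bytes, min_edge_count=2):
        # Skip links followed too rarely (assumed minimum) and
        # documents too large to be worth fetching speculatively.
        return edge_count >= min_edge_count and size_bytes <= MAX_PREFETCH_BYTES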