USENIX Symposium on Internet Technologies and Systems, 1997
Rate of Change and other Metrics: a Live Study of the World Wide Web
Fred Douglis, Anja Feldmann, and Balachander Krishnamurthy
AT&T Labs - Research
Jeffrey Mogul
Digital Equipment Corporation
Abstract
Caching in the World Wide Web is based on two critical assumptions:
that a significant fraction of requests reaccess resources that have
already been retrieved; and that those resources do not change
between accesses.
We tested the validity of these assumptions, and their
dependence on characteristics of Web resources, including
access rate, age at time of reference, content type, resource
size, and Internet top-level domain. We also measured the rate
at which resources change, and the prevalence of duplicate
copies in the Web.
We quantified the potential benefit of a
shared proxy-caching server in a large environment by using traces
that were collected at the Internet connection points for two large
corporations, representing significant numbers of references.
Only 22% of the resources referenced in the traces we analyzed
were accessed more than once, but about half of the references
were to those multiply-referenced resources. Of this half,
13% were to a resource that had been modified since the previous
traced reference to it.
We found that the content type and rate of access have a strong
influence on these metrics, the domain has a moderate influence, and
size has little effect.
In addition, we studied other aspects of the rate of change,
including semantic differences such as the insertion or deletion of
anchors, phone numbers, and email addresses.
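
As a rough illustration of the three headline figures quoted above (the share of resources accessed more than once, the share of references that go to those resources, and the share of those references that find the resource modified), the following minimal sketch computes them from a toy trace. It is not the authors' analysis code. The function name `trace_metrics`, the `(url, version)` trace format, and the use of an opaque `version` value (standing in for something like a last-modified timestamp or content checksum) are all assumptions made for illustration.

```python
from collections import Counter


def trace_metrics(trace):
    """Compute reuse and change ratios for a request trace.

    `trace` is a list of (url, version) pairs in request order. `version`
    is a hypothetical stand-in for any value that differs when the
    resource has changed (e.g. a timestamp or checksum).
    """
    if not trace:
        raise ValueError("trace must contain at least one reference")

    # Number of references per distinct resource.
    counts = Counter(url for url, _ in trace)
    multi = {url for url, n in counts.items() if n > 1}
    refs_to_multi = sum(counts[url] for url in multi)

    # Count references whose version differs from the previous traced
    # reference to the same resource.
    last_seen = {}
    changed = 0
    for url, version in trace:
        if url in last_seen and last_seen[url] != version:
            changed += 1
        last_seen[url] = version

    return {
        # Fraction of distinct resources referenced more than once.
        "resources_multiply_referenced": len(multi) / len(counts),
        # Fraction of all references that go to those resources.
        "references_to_multi": refs_to_multi / len(trace),
        # Of those references, the fraction that saw a modified resource.
        "changed_among_multi_refs": (
            changed / refs_to_multi if refs_to_multi else 0.0
        ),
    }


if __name__ == "__main__":
    example = [("a", 1), ("b", 1), ("a", 1), ("a", 2), ("c", 1)]
    print(trace_metrics(example))
```

In this sketch the first reference to a multiply-referenced resource is counted in the denominator of the last ratio even though it has no predecessor to compare against; whether the paper counts it the same way is not stated in the abstract.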