GLIMPSE: A Tool to Search Through Entire File Systems
Udi Manber
Department of Computer Science
University of Arizona
Tucson, AZ 85721
udi@cs.arizona.edu
Supported in part by an NSF Presidential Young Investigator Award
(grant DCR-8451397), with matching funds from AT&T, by NSF grants
CCR-9002351 and CCR-9301129, and by the Advanced Research Projects
Agency under contract number DABT63-93-C-0052. Part of this work was
done while the author was visiting the University of Washington.
The information contained in this paper does not necessarily reflect
the position or the policy of the U.S. Government or other sponsors of
this research. No official endorsement should be inferred.
Sun Wu
Department of Computer Science
National Chung-Cheng University
Ming-Shong, Chia-Yi, Taiwan
sw@cs.ccu.edu.tw
Abstract
GLIMPSE, which stands for GLobal IMPlicit SEarch, provides indexing
and query schemes for file systems. The novelty of glimpse is that it
uses a very small index - in most cases 2-4% of the size of the text -
and still allows very flexible full-text retrieval including Boolean
queries, approximate matching (i.e., allowing misspelling), and even
searching for regular expressions. In a sense, glimpse extends agrep
to entire file systems, while preserving most of its functionality and
simplicity. Query times are typically slower than with inverted
indexes, but they are still fast enough for many applications. For
example, it took 5 seconds of CPU time to find all 19 occurrences of
Usenix AND Winter in a file system containing 69MB of text spanning
4300 files. Glimpse is particularly designed for personal
information, such as one's own file system. The main characteristic
of personal information is that it is non-uniform and includes many
types of documents. An information retrieval system for personal
information should support many types of queries, flexible
interaction, low overhead, and customization. All these are important
features of glimpse.
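The abstract does not spell out how a small index can still answer
full-text Boolean queries. One way to picture it is a two-level search:
the index records only which block of files each word appears in, and a
sequential scan of the candidate blocks finds the actual matches. The
Python sketch below illustrates that idea under stated assumptions. It
is not glimpse's implementation; the names build_index and search_and,
the round-robin assignment of files to blocks, and the rule that all
AND terms must appear on the same line are hypothetical choices made
for this example.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer for this sketch: runs of letters and digits.
WORD = re.compile(r"[A-Za-z0-9]+")


def build_index(files, num_blocks=4):
    """Map each lowercased word to the set of block numbers containing it.

    `files` is a dict of {name: text}. Files are assigned to blocks
    round-robin (an assumption of this sketch). The index stores block
    numbers only, not exact positions, which is what keeps it small.
    """
    blocks = defaultdict(list)
    index = defaultdict(set)
    for i, name in enumerate(sorted(files)):
        block = i % num_blocks
        blocks[block].append(name)
        for word in WORD.findall(files[name].lower()):
            index[word].add(block)
    return index, blocks


def search_and(files, index, blocks, terms):
    """Return sorted (file, line number, line) for lines holding every term."""
    terms = [t.lower() for t in terms]
    # Step 1: the index narrows the search to blocks that contain all terms.
    candidates = set(blocks)
    for term in terms:
        candidates &= index.get(term, set())
    # Step 2: scan only the files in those blocks, line by line.
    hits = []
    for block in sorted(candidates):
        for name in blocks[block]:
            for number, line in enumerate(files[name].splitlines(), 1):
                words = set(WORD.findall(line.lower()))
                if all(term in words for term in terms):
                    hits.append((name, number, line))
    return sorted(hits)


if __name__ == "__main__":
    sample = {
        "a.txt": "Usenix Winter 1994\nsummer only",
        "b.txt": "winter sports",
        "c.txt": "nothing here",
    }
    index, blocks = build_index(sample, num_blocks=2)
    for hit in search_and(sample, index, blocks, ["Usenix", "Winter"]):
        print(hit)
```

Using fewer, larger blocks shrinks the index but leaves more text to
scan per query, which matches the trade-off the abstract describes:
slower queries than an inverted index in exchange for a much smaller
index.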