Check out the new USENIX Web site. next up previous
Next: Accessing Remote File and Up: The Design of the Previous: Data Consistency

   
Using Existing Results in New Queries

In addition to dependencies based on the hierarchy, HAC allow users to define arbitrary dependencies by adding directories names to queries. This gives users the power to combine query-language expressions (searching) with edited query results (browsing) in a very powerful way by specifying path names of existing directories (syntactic or semantic) as part of their queries. If the query of a semantic directory new contains the name of another semantic directory old, then we say that new depends on old, or new refers to old. Notice that dependencies are transitive. That is, if old depends on ancient, then new depends on ancient. When the CBA mechanism evaluates such a query, it can use HAC's API to determine which parts of the query contain path names of directories, and which parts contain search expressions. When it encounters the path name of a directory, it can use HAC's API to obtain the existing query-result (set of ``pointers'' to files) stored in that directory. Then, the CBA mechanism can operate on this set of pointers exactly as if it is the result of evaluating a search expression. In this way, users can easily augment the search capabilities of the CBA mechanism with their ability to customize information to their tastes.

One complication that arises here is that path-names can change when users rename directories. In the above example, if old is renamed as old', new's query is inconsistent since it refers to the old path name old. To solve this problem, HAC maintains a global mapping of unique identifiers to directory path-names, and stores only the identifers in actual queries. So, instead of updating the queries of all directories like new that depend on old, HAC simply updates the global map when old is renamed as old'.

When directory names are parts of queries, scope consistency becomes harder and trickier. For example, we must re-evaluate new's query whenever the scope provided by old changes, even if new is not in the subtree rooted at old. We start with the definition of the dependency graph - it is the directed graph of dependencies between semantic directories. We do not allow cycles to exist in this graph for obvious reasons. Updating the query-results cannot be done in an arbitrary order. We must use the order obtained from a topological sort of the dependency graph. There is always a valid topological sort since this is a Directed Acyclic Graph (DAG). The root of a HAC file system always occurs first in this order since all directories depend on it and the root does not depend on any other directory (the root does not have a query associated with it). Also note that there is no need for HAC to explicitly restrict the query result of a child directory to the scope provided by its parent. If users want this behavior, all they need to do is modify the query of the child directory to be: ``<old query> AND <path-name of parent>" -- HAC takes care of everything else. In fact, underneath the covers, this is exactly how we implement parent-child dependencies in the strict hierarchical scope consistency algorithm above. HAC gives users the freedom to chose whether they want strict hierarchical dependencies, DAG based dependencies, or both, interleaved arbitrarily.

Since we allow explicit scope consistency definitions along with implicit hierarchical scope consistency, one may argue that there is no need for the latter; we could leave it up to the user to specify the parent of a directory in its query. However, we feel that query-refinement based on path names is important for two reasons. First, tree-based classification of information is intuitive, and is sufficient for many real world scenarios. And second, most users may find tree-based classification simpler to understand because they do not have to worry about two structures -- one based on path-names and one based on the dependency graph -- at the same time.


next up previous
Next: Accessing Remote File and Up: The Design of the Previous: Data Consistency
Burra Gopal
1999-01-04