Vitter, Krishnan and Curewitz were the first to examine the use of compression modeling techniques to track events and prefetch items [19]. They prove that such techniques converge to an optimal online algorithm. They go on to test this work for memory access patterns [4] in an object oriented database and a CAD system. They deal with the large model size by paging portions of the model to secondary memory, and show that this can be done with negligible effect on performance. Additionally they suggest that such methods could have great success within a variety of other applications such as hyper-text. Our work adapts PPM in a different manner. We avoid the use of vine pointers [2,10] and instead keep an array of the current contexts. Their method of selection for prefetching (choosing the n most probable items, where n is a parameter of their method) differs from the threshold based method we use. Lastly, the problem domain we examine (file systems access patterns) differs from that which they have worked under (virtual memory access patterns).
Within the domain of file systems, Griffioen and Appleton [6] have worked to developed a predictive model that for each file accumulates frequency counts of all files that are accessed in a window after the first. These frequency counts are then used to drive a prefetching cache. Their prediction model differs from ours in that they look at more than just the next event. Additionally, they only consider a first order model. Nevertheless, the method of prefetch selection based on a probability threshold was first presented in their work.
Kuenning, Popek, and Reiher [12] have done extensive work analyzing the behavior of file system requests for various mobile environments with the intent of developing a prefetching system that would predict needed files and cache them locally. Their work has concluded that such a predictive caching system has promise to be effective for a wide variety of environments. Kuenning has extended on this work [11] developing the concept of a Semantic Distance, and using this to determine groupings of file that should be kept on local disks for mobile computers.
Patterson, et al. [16,15] have modified a compiler and file system to implement a method called Transparent Informed Prefetching where applications inform the file system which files to prefetch. While this method can offers significant improvements in throughput, it is dependent on the applications ability to know its future actions. For example cc would only know which header files it would need once it had read in the line #include stdio.h, while our our predictive model could notice that every accesses to program.c causes an access to stdio.h. Finally, such an application specific method would not be able to make use of any relationships that exist across applications (e.g. between make and cc).
Cao et al. [3] have approached this problem from a unique perspective, by examining what characteristics an off-line prefetching technique would require to be successful.