Efficient Cooperative Caching using Hints
Prasenjit Sarkar and John Hartman
Algorithm           | N-chance             | GMS                  | Hint-based
Cache Consistency   | Block-based          | None                 | File-based
Block Location      | Manager-based        | Manager-based        | Hint-based
Replacement Policy  | Random client        | Manager-based LRU    | Best-guess LRU
Duplicate Avoidance | Non-singlets deleted | Non-singlets deleted | Master copy
Server Caching      | Traditional          | Traditional          | Discard
Table 1: Algorithm Comparison. This table lists the key features of the N-chance, GMS, and hint-based algorithms.
The N-chance algorithm dynamically partitions the cache of each client between blocks needed by the local client and the cooperative cache. Managers are responsible for maintaining consistency and block location information.
The replacement policy in N-chance uses a combination of local LRU information and duplicate avoidance to decide the best block to replace. Clients always replace the oldest block on their LRU lists. Whether the client forwards or discards the oldest block depends on the number of copies of the block. Blocks with more than one copy are discarded, while blocks with only one copy, or singlets, are forwarded to another client at random. If a client does not know whether or not one of its blocks is a singlet, it simply asks the manager.
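To make the decision concrete, the following minimal Python sketch restates the N-chance eviction logic described above. The client, manager, and block objects and their methods are hypothetical stand-ins for illustration, not the actual N-chance interfaces.

```python
import random

def n_chance_evict(client, manager, clients):
    """Evict the LRU block; forward singlets, discard duplicates (sketch)."""
    block = client.lru_list.pop_oldest()        # clients always replace their oldest block
    if block.copy_count is None:                # singlet status unknown:
        block.copy_count = manager.copy_count(block.id)   # ask the manager
    if block.copy_count > 1:
        return None                             # non-singlets are simply discarded
    target = random.choice([c for c in clients if c is not client])
    target.receive(block)                       # singlets go to a randomly chosen client
    return target
```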
GMS is more general than N-chance in that it is a distributed shared-memory system, for which cooperative caching is only one possible use. GMS is similar to N-chance in that it uses managers to locate blocks in the client caches. One difference is that it does not provide a consistency mechanism.
The replacement policy in GMS uses duplicate avoidance to determine whether a block should be discarded or forwarded to the cooperative cache. A local block is forwarded to the cooperative cache if no other copies of the block exist (or in N-chance terminology, the block is a singlet). GMS differs from N-chance in that each copy of a block is always tagged as to whether or not it is a singlet. The manager keeps track of the number of copies of each block and notifies the appropriate client when a block becomes a singlet.
GMS also differs from N-chance in that a centralized algorithm implements the choice of the target client in the replacement policy. The algorithm collects age information about the blocks in the client caches from all clients at the end of dynamically controlled time intervals called epochs. The manager then distributes the locations of the oldest blocks in the system back to all the clients, which use this information to determine the target client in the replacement policy. The algorithm is designed so that it reasonably approximates the global LRU policy over all the client caches.
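A rough Python sketch of this epoch mechanism follows, under the simplifying assumption that the manager can poll every client directly at the epoch boundary; the object model is invented for illustration:

```python
def end_of_epoch(clients, k):
    """Collect block ages and redistribute the k oldest locations (sketch)."""
    ages = []                                   # (age, owning client id) for every cached block
    for c in clients:
        ages.extend((b.age, c.id) for b in c.cached_blocks())
    ages.sort(reverse=True)                     # oldest (largest age) first
    oldest = [owner for _, owner in ages[:k]]   # owners of the k oldest blocks
    for c in clients:
        c.replacement_targets = oldest          # used to pick forwarding targets this epoch
```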
The previous cooperative caching algorithms rely in part on exact information, or facts, to manage the cache. Although facts allow these algorithms to make optimal decisions, they increase the latency of block accesses and the load on the managers. Our goal in designing a cooperative caching algorithm is to remove the reliance on centralized control of the cooperative cache. Clients should be able to access and replace blocks in the cooperative cache without involving a manager.
Reducing the dependence of clients on managers is achieved through the use of hints, information that only approximates the global state of the system. The decisions made by a hint-based system may not be optimal, but managing hints is less expensive than managing facts. Hints do not need to be consistent throughout the system, eliminating the need for centralized coordination of the information flow. As long as the overhead eliminated by not using facts more than offsets the effect of making mistakes, the gamble of using hints will pay off.
The remainder of this section describes the components of a hint-based algorithm. Section 4.1 describes the hint-based block lookup algorithm. Section 4.2 describes how the replacement policy decides whether or not to forward a block to the cooperative cache. Section 4.3 discusses how the replacement policy chooses the target client for forwarding a block. Section 4.4 discusses the use of the server cache to mask replacement mistakes. Finally, Section 4.5 describes the effect of the cache consistency protocol on the use of hints.
When a client suffers a local cache miss, a lookup must be performed on the cooperative cache to determine if and where the block is cached. The manager performs this lookup in the previous algorithms, both increasing the block access time and incurring load on the manager and network.
An alternative approach is to let the client itself perform the lookup, using its own hints about the locations of blocks within the cooperative cache. These hints allow the client to access the cooperative cache directly, avoiding the need to contact the manager on every local cache miss.
Based on the above, we can identify two principal functions for a hint-based lookup algorithm: maintaining hints that are accurate enough to be useful, and using the hints to locate blocks in the cooperative cache.
Each of these functions is discussed in detail in the following sections.
To make sure that hints are reasonably accurate, our strategy is to change hints only when necessary. In other words, correct hints are left untouched and incorrect hints are changed when correct information is made available.
To keep hints as correct as possible, we introduce the concept of a master copy of a block. The first copy of a block to be cached by any client is called the master copy. The master copy of a block is distinct from the block's other copies because the master copy is obtained from the server.
Based on this concept of the master copy, we enumerate two simple rules for hint maintenance: hints record only the probable location of a block's master copy, and a hint is changed only when the master copy moves.
Because the hints contain only the probable location of the master copy, changes to the locations of the block's other copies can be ignored. This simplifies the task of keeping the hints accurate.
Given hints about the probable location of the master copy of a block, the lookup mechanism must ensure that a block lookup is successful regardless of whether the hints are right or wrong. Fortunately, since all block writes go through to the server, the server always has a valid copy of a block and can satisfy requests for the block should the hints prove false. This simplifies the lookup mechanism, which is outlined below:
- On a local cache miss, the client sends the block request to the client that its hint indicates holds the master copy.
- A client that receives a request returns the block if it holds the master copy; otherwise it forwards the request according to its own hint.
- If a client has no hint for a requested block, the request is sent to the server, which always has a valid copy.
The general idea is that each client keeps track of the probable location of the master copy of each block, and uses this information to look up blocks in the cooperative cache.
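The following Python sketch illustrates one plausible rendering of this lookup chain; the hint table, client objects, and hop limit are assumptions made for the example, not details taken from the paper:

```python
def lookup(client, block_id, server, max_hops=8):
    """Chase master-copy hints; fall back to the server (sketch)."""
    node = client
    for _ in range(max_hops):                   # bound the forwarding chain
        hint = node.hints.get(block_id)         # probable master-copy location
        if hint is None:
            break                               # no hint: let the server satisfy the request
        node = hint
        if node.has_master_copy(block_id):
            return node.read(block_id)          # the hint chain reached the master copy
    return server.read(block_id)                # all writes go through, so this always works
```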
Simulations show this algorithm works well in a distributed UNIX environment, as described in Section 6. However, the algorithm will not be effective in the following scenario. If several clients share a working set of blocks larger than the cooperative cache, the locations of the master copies will change rapidly as blocks move in and out of the client caches. This will cause the probable master copy locations to be inaccurate, leading to excessive forwarding of requests.
When a block is ejected from the local cache of a client, the cooperative cache replacement policy decides whether or not the block should be forwarded to the cooperative cache. As discussed earlier, one of the motivations of the replacement policy is to ensure that only one copy of a block is stored in the cooperative cache; otherwise, the cooperative cache would contain unnecessary duplicate copies of the same block.
The previous algorithms rely on the manager to determine whether or not a block should be forwarded to the cooperative cache. A block is forwarded if it is the only copy of the block stored in either the local caches or the cooperative cache. Maintaining this invariant is expensive, however, requiring an N-chance client to contact the manager whenever it wishes to discard a block that is not known to be a singlet, and the GMS manager to contact a client whenever a block becomes a singlet.
To avoid these overheads, we propose a forwarding mechanism in which the copy to be forwarded to the cooperative cache is predetermined and does not require communication between the clients and the manager. In particular, only the master copy of a block is forwarded to the cooperative cache, while all other copies are discarded. Since only master copies are forwarded, and each block has only one master copy, there can be at most one copy of a block in the cooperative cache.
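As a sketch, the entire forwarding decision reduces to a master-copy test; the helper names below are hypothetical, and the final line shows how the forwarder can update its own hint, since it knows the master copy's new location:

```python
def on_local_eviction(client, block):
    """Forward master copies to the cooperative cache; discard the rest (sketch)."""
    if not block.is_master:
        return                                  # non-master copies are simply discarded
    target = client.pick_target()               # best-guess target, see Section 4.3
    target.receive_master(block)
    client.hints[block.id] = target             # the forwarder knows the new location
```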
A potential drawback of the master copy algorithm is that it has a different forwarding behavior than the previous algorithms. Instead of forwarding the last local copy of a block as in GMS or N-chance, the master copy algorithm forwards the first, or master, copy. In some cases, this may lead to unnecessary forwardings: a block that is deleted before it is ever down to its last copy need not have been forwarded to the cooperative cache. The existing algorithms avoid this, while the master copy algorithm will potentially forward the block. Fortunately, our measurements show that few of the master copy forwardings are unnecessary, as described in Section 6.3.
Once the replacement policy has decided to forward a block to the cooperative cache, it must decide the target client of this forwarding. Forwarding a block to this target client will replace a block that is either in the client's local cache or in the cooperative cache. Note that this replaced block can be a master copy, providing a means for removing master copies from the cooperative cache.
Previous algorithms either choose the client at random, or rely on information from the manager to select the target. An alternative based on hints, however, can provide highly accurate replacements without requiring a centralized manager. We refer to this as best-guess replacement because each client chooses a target client that it believes has the system's oldest block. The objective is to approximate global LRU, without requiring a centralized manager or excessive communication between the clients. The challenge is that the block age information is distributed among all the clients, making it expensive to determine the current globally LRU block.
In best-guess replacement, each client maintains an oldest block list that contains what the client believes to be the oldest block on each client along with its age. This list is sorted by age. A block is forwarded to the client that has the oldest block in the oldest block list.
The high accuracy of best-guess replacement comes from exchanging information about the status of each client. When a block is forwarded from one client to another, both clients exchange the age of their current oldest block, allowing each client to update its oldest block list. The exchange of block age information during replacement allows both active clients (clients that are accessing the cooperative cache) and idle clients (clients that are not) to maintain accurate oldest block lists. Active clients have accurate lists because they frequently forward blocks. Idle clients will be the targets of the forwardings, keeping their lists up-to-date as well. Active clients will also tend to have young blocks, preventing other clients from forwarding blocks to them. In contrast, idle clients will tend to accumulate old blocks and therefore be the target of most forwarding requests.
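A minimal sketch of the oldest block list and the age exchange that occurs on each forwarding, assuming age grows with time since last access (so the oldest block has the largest age); the class and method names are illustrative:

```python
class OldestBlockList:
    """Per-client beliefs about each peer's oldest block (sketch)."""
    def __init__(self):
        self.ages = {}                          # peer id -> believed age of that peer's oldest block

    def pick_target(self):
        # Forward to the peer believed to hold the system's oldest block.
        return max(self.ages, key=self.ages.get)

    def record(self, peer_id, oldest_age):
        self.ages[peer_id] = oldest_age

def forward_block(sender, receiver, block):
    """Forward a block and exchange oldest-block ages in the same message pair."""
    receiver.replace_oldest_with(block)         # the receiver evicts its current oldest block
    sender.oldest_list.record(receiver.id, receiver.oldest_age())
    receiver.oldest_list.record(sender.id, sender.oldest_age())
```

Note that the age exchange piggybacks on the forwarding itself, so keeping the lists current costs no extra messages.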
Changes in the behavior of a client may cause the oldest block lists to become temporarily inaccurate. An active client that becomes idle will initially not be forwarded blocks, but its oldest block will age relative to the other blocks in the system. Eventually this block will be the oldest in the oldest block lists on other clients and be used for replacement. On the other hand, an idle client that becomes active will initially have an up-to-date list because of the blocks it was forwarded while idle. This allows it to accurately forward blocks. Other clients may erroneously forward blocks to the newly-active client but once they do, their updated oldest block lists will prevent them from making the same mistake twice.
Although trace-driven simulation has shown this simple algorithm to work well, there are several potential problems, including the effect of replacing a block that is not the globally LRU block and also the problem of overloading a client with simultaneous replacements.
First, since global state information is not maintained, it is possible for a client to replace a block that is not the globally LRU block. However, if the replaced block is close to the globally LRU block, the performance impact of not choosing the globally LRU block is minimal. In addition, the next section discusses a mechanism for masking any deviations from the globally LRU block.
Second, if several clients believe that the same client has the oldest block, they will all forward their blocks to that client, potentially overloading it. Fortunately, it can be shown that it is highly unlikely that the clients using the cooperative cache would forward their blocks to the same target. This is because clients that do forward their blocks to the same target will receive different ages for the oldest block on the target, since each forwarded block replaces a different oldest block. Over time, the clients' oldest block lists will contain different block age information even if they start out identical, reducing the probability of always choosing the same forwarding targets.
One drawback of best-guess replacement is that erroneous replacements will occur. A block may be forwarded to a client that does not have the oldest block; indeed, a block may be forwarded to a client whose oldest block is actually younger than the forwarded block.
To offset these mistakes we introduce the notion of a discard cache, a cache used to hold possible replacement mistakes and thus increase the overall cache hit rate of the system. The simple algorithm used to determine whether a block has been mistakenly replaced and should be sent to the discard cache is shown in Table 2. As is evident, non-master copies are always discarded because only master copies are accessed in the cooperative cache.
Replacements are considered to be in error when the target client of a replacement decides that the block is too young to be replaced. A client chooses to replace a block on a particular target client because it believes that client contains the oldest block. The target client considers the replacement to be in error if it does not agree with this assessment. The target determines this by comparing the replaced block's age with the ages of the blocks on its oldest block list. If the block is younger than any of the blocks on the list, then the replacement is deemed an error and the block is forwarded to the discard cache. Otherwise, the block is discarded.
The blocks in the discard cache are replaced in global LRU order. Thus the discard cache serves as a buffer to hold potential replacement mistakes. This extends the lifetimes of the blocks and reduces the number of erroneous replacements that result in an expensive disk access.
Type of block     | Action
Non-master copy   | Discard
Old master copy   | Discard
Young master copy | Send to discard cache
Table 2: Discard Cache Policy. This table lists how the hint-based replacement policy decides which blocks to send to the discard cache. A master copy is old if it is older than all blocks in the oldest block list, else it is considered young. The oldest block list is the per-client list that contains what the client believes to be the oldest block on each client along with its age.
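The policy of Table 2 transcribes directly into a short Python sketch, run by the target of a replacement; the block and oldest-block-list attributes are the same hypothetical names used in the earlier sketches:

```python
def classify_replacement(target, block):
    """Apply the Table 2 policy at the target of a replacement (sketch)."""
    if not block.is_master:
        return "discard"                        # only master copies live in the cooperative cache
    if all(block.age >= a for a in target.oldest_list.ages.values()):
        return "discard"                        # old master copy: the replacement looks correct
    return "discard cache"                      # young master copy: likely a mistake, send to the server
```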
The use of hints for block lookup raises the issue of maintaining cache consistency. One solution is to use block-based consistency, but this would require contacting the manager on every local cache miss to locate an up-to-date copy, making it pointless to use hints for block lookup or replacement. For this reason, we propose the use of a file-based consistency mechanism. Clients must acquire a token from the manager prior to accessing a file. The manager controls the file tokens, revoking them as necessary to ensure consistency. The token includes version numbers for all the file's blocks, allowing copies of the blocks to be validated individually. Once a client has the file's token it may access the file's blocks without involving the manager, enabling the use of hints to locate and replace blocks in the cooperative cache.
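A minimal sketch of such a token, assuming a simple mapping from block index to version number; the field and function names are inventions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class FileToken:
    """Per-file consistency token handed out by the manager (sketch)."""
    file_id: int
    block_versions: dict = field(default_factory=dict)  # block index -> version number

def block_valid(cached_block, token):
    # Each cached copy is validated individually against the token's version numbers.
    return cached_block.version == token.block_versions.get(cached_block.index)
```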
This section describes the methodology used to compare the performance of the N-chance, GMS, hint-based, and ideal algorithms in detail and analyze the design tradeoffs. We describe the simulation environment, the criteria for evaluating the algorithms, and the ideal algorithms against which the other algorithms are compared.
Trace Parameter        | Period 1 | Period 2  | Period 3 | Period 4
Block reads            | 276,628  | 2,011,915 | 261,023  | 343,189
Unique blocks accessed | 53,349   | 13,108    | 33,063   | 75,273
Active clients         | 32       | 24        | 38       | 34
Table 3: Trace Period Statistics. This table contains statistics for the four trace periods. Active clients refers to the number of clients that actually used the cooperative cache at any point during the period.
Clients                   | 42
Servers                   | 1
Managers                  | 1
Client Cache Size         | 16 MB
Server Cache Size         | 128 MB
Block Size                | 8 KB
Local Memory Access Time  | 0.25 ms
Remote Memory Access Time | 1.25 ms
Disk Access Time          | 15.85 ms
Write Policy              | write-through
Warm-up Block Accesses    | 400,000
Forwarding Cache Entries  | 100
Network                   | ATM
Message Latency           | 0.2 ms
Table 4: Simulation Parameters. This table describes the environment used to evaluate the various cooperative caching algorithms.
The algorithms were evaluated using trace-driven simulation. The traces of the Sprite distributed file system [Baker91] were used to drive the simulator. These traces cover four two-day periods and record file system accesses by user programs, such as opening and closing files, and seeking on file descriptors. Actual read and write events were not recorded, but can be inferred from file offsets in other records. The traces record application-level behavior and thus are not Sprite-specific. We restricted our use of the traces to the main file server allspice. Table 3 shows statistics for each of the trace periods, while Table 4 shows the simulation parameters.
Most of the simulation parameters are derived from the original study on cooperative caching by Dahlin et al. [Dahlin94], simplifying performance comparisons. The various access times were obtained from previously published measurements. Although these measurements are now a few years old, and thus likely to be slow compared to state-of-the-art equipment, they were obtained from real systems. Furthermore, we did not believe that the benefit of updating the simulation parameters warranted the increased difficulty in comparing results.
We also assume that there is a single manager handling centralized functions such as cache consistency and block location. This makes it easier to measure the manager load imposed by the different systems, without introducing an algorithm to distribute the load over multiple managers.
The N-chance and GMS simulators used in these simulations were derived from the simulators created by the systems' designers. The N-chance simulator was modified to incorporate additional functionality used in the xfs file system[Anderson95]. In the modified system, a manager preferentially forwards a request to the cooperative cache instead of a server, improving the cooperative cache hit rate and reducing the load on the servers.
The GMS simulator was modified to add a file-based consistency mechanism. The original GMS system did not contain a consistency mechanism, making it difficult to use for cooperative caching. The consistency mechanism we implemented is identical to the one we use in our hint-based algorithm, as described in Section 4.5.
The simulators were validated through both extensive debugging and analysis of their behavior. For all simulators, we analyzed in detail their processing of a small sample of the traces. Also, all simulators produced the expected results for the traces when cooperative caching was disabled. The GMS simulator was further validated by running the traces while varying the length of the epochs. As the epochs were lengthened, the average age of the replaced blocks decreased as expected. Finally, the hint-based simulator was validated both by inspection of its processing of a sample of the traces, and by comparing its results with those of the N-chance simulator.
We use the following two metrics to evaluate the performance of the cooperative caching algorithms: the average block access time, the average time required to read a block from any level of the storage hierarchy; and the manager load, the number of messages the manager sends and receives on behalf of the clients.
For comparison purposes, the performance of two ideal cooperative caching algorithms is included. These algorithms provide an upper bound on cooperative cache performance, and thus provide an absolute yardstick against which other algorithms may be measured. The ideal algorithms differ in the replacement policy they use. The Optimal algorithm replaces blocks in a fashion that maximizes the hit rate on the cooperative cache, but cannot be implemented because it requires predicting the future. The Global LRU algorithm approximates the Optimal algorithm by replacing the oldest block in the system, but cannot be efficiently implemented because of the overhead required to find and replace the oldest block.
The Optimal replacement policy always replaces the block whose next access is farthest in the future. It has been shown that this policy is optimal because it minimizes the number of cache misses [Belady66], and it therefore has the lowest block access time.
The Optimal replacement algorithm for a cooperative cache differs from that in a virtual memory system in that block location must be considered when determining which block to replace. When a client stores a block to the cooperative cache, there may be several candidate blocks that are never accessed again. To reduce the number of block transfers, the Optimal algorithm always replaces a block stored locally at that client, if possible.
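An offline Python sketch of this policy, assuming the simulator knows every block's future reference times; the names and data layout are illustrative:

```python
import bisect

def optimal_victim(candidates, future_refs, now, local_blocks):
    """Pick the replacement victim under the Optimal policy (offline sketch)."""
    never_again, upcoming = [], []
    for b in candidates:
        refs = future_refs.get(b, [])           # sorted future access times for block b
        i = bisect.bisect_right(refs, now)
        if i == len(refs):
            never_again.append(b)               # block is never accessed again
        else:
            upcoming.append((refs[i], b))       # time of the block's next access
    if never_again:                             # prefer a local block to avoid a transfer
        local = [b for b in never_again if b in local_blocks]
        return local[0] if local else never_again[0]
    return max(upcoming)[1]                     # next access farthest in the future
```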
The Global LRU algorithm approximates the Optimal algorithm by collecting information about the LRU block of every client to determine the globally LRU block and then replacing it. In a real implementation, this algorithm would not only be expensive but also inaccurate. The inaccuracy stems from the fact that the LRU block of a client may change while another client is collecting information about the LRU blocks of all clients.
The Global LRU replacement policy is based on the two goals of determining the exact globally LRU block and minimizing the number of copies of a block. When a client needs to forward a block to the cooperative cache, it first checks to see if the block is a duplicate. If so, it is discarded, otherwise it is forwarded to the client storing the globally LRU block and the globally LRU block is discarded.
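A sketch of this idealized policy, assuming instantaneous access to every client's exact LRU state (which is precisely what makes it impractical); the client methods are hypothetical:

```python
def global_lru_forward(clients, block):
    """Forward a block under idealized Global LRU replacement (sketch)."""
    if any(c.has_copy(block.id) for c in clients):
        return                                  # duplicate: discard rather than forward
    target = max(clients, key=lambda c: c.oldest_age())
    target.discard_oldest()                     # evict the globally LRU block
    target.store(block)
```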
As a result of its replacement policy, the Global LRU algorithm tries to maximize the local and remote cache hit rates but does not try to minimize the number of block transfers. The algorithm always discards the globally LRU block, usually requiring a block transfer. As a result, an algorithm that approximates Global LRU and attempts to minimize block transfers can have a lower replacement overhead than the Global LRU algorithm.
This section describes the performance of the cooperative caching algorithms when simulated using the Sprite traces. They are compared in terms of average block access time, manager load, lookup messages, and replacement messages. The effectiveness of the discard cache is also measured, as is the sensitivity of the block access time to variations in the simulation parameters.
Figure 1 shows the average block read access time for all the algorithms, broken down by the time spent in accessing the local cache, remote client caches, server cache, and server disk. Write accesses are not included in the figure because all algorithms use write-through caches.
The read access times for the GMS and hint-based algorithms are very close to those of the ideal algorithms, and they spend similar amounts of time handling hits in the different levels of the storage hierarchy. The performance of the N-chance algorithm is similar to that of the ideal algorithms in all but the second and fourth periods. During these periods, the N-chance algorithm has a higher number of disk accesses and remote client hits, leading to a longer access time. The cause is that these periods contain more block accesses than the others. This increase affects N-chance more than the other algorithms because of its random replacement policy: the higher number of blocks accessed increases the probability that a block replaced at random was being used by the client that cached it.
The time breakdown shows that hits in the server cache are nearly non-existent for all algorithms except the hint-based. This indicates that the hint-based algorithm occasionally makes mistakes in replacing a block from the cooperative cache, but the discard cache in the server corrects some of them.
The block access times for the GMS and hint-based algorithms are very close to that of the ideal algorithms, leaving little room for improvement. The performance of the hint-based algorithm is particularly encouraging, given that hints can be occasionally incorrect.
The block location hints for the cooperative cache are highly accurate. For only 0.01% of the local cache misses (averaged across all periods) is the desired block in the cooperative cache but the hints say it is not. In these cases, the client will erroneously retrieve the block from the server. Conversely, when a hint says a block is in the cooperative cache, it is correct for 99.95% of all local cache misses. Of these correct hints, 97.56% point to the actual location of the block, while the remainder result in requests being forwarded. The high hint accuracy and the small number of forwardings translate into an average of only 2.03 messages to perform a block lookup. In comparison, both N-chance and GMS require 3 messages per lookup.
The load imposed on the manager is one measure of the overhead and scalability of an algorithm. The less work a centralized manager must do on the client's behalf, the more scalable is the system. Figure 2 shows the manager load categorized by the load's source, and expressed as the number of messages sent and received by the manager. This is a valid measure of the manager load because each message sent and received by the manager represents work performed by the manager.
As can be seen, managing the client cache consistency imposes a very small load on the manager. This does not mean that the choice of consistency algorithm does not affect system performance, only that it does not contribute significantly to manager load. File-based consistency is still important for enabling the use of hints for replacement and lookup.
Replacement and lookup traffic account for nearly all of the manager load for the N-chance and GMS algorithms. The clients must contact the manager each time a block is forwarded and each time a lookup is done, whereas the hint-based algorithm allows the clients to perform these functions themselves. The result is that the manager load is much higher for N-chance and GMS. N-chance has a relatively high manager load during the second and fourth periods because it has a low local cache hit rate in those periods, as described in the previous section. This increases the number of blocks that must be accessed from the cooperative cache or the server, and increases the manager load accordingly.
Figure 2 showed that about half of the N-chance and GMS manager load is due to managing block replacement. In this section we analyze the overheads associated with block replacement.
Figure 3 depicts the number of replacement messages each algorithm requires to handle a local cache miss. The N-chance and GMS algorithms have three sources of replacement messages: forwarding the block to another client and notifying the manager; notifying the manager when a block is deleted; and exchanging messages between the clients and the manager to determine when a block should be discarded as opposed to forwarded. Except for the actual forwarding of the block to another client, all messages involve the manager, increasing its load. For best-guess replacement, the only message required is the one to forward the master copy of a block to another client. This dramatically reduces the total number of replacement messages required per local cache miss.
As mentioned before, one of the potential drawbacks of the master copy algorithm is that it may unnecessarily forward master copies. Although Figure 3 shows that best-guess replacement outperforms the other algorithms despite this drawback, we also measured the fraction of forwardings that were unnecessary. An average of only 2.11% were unnecessary across all periods, indicating that they have a negligible effect on performance.
The server memory represents a valuable resource to the system, and at 128 MB it constitutes a large fraction of the memory of the system. The hint-based system we propose uses the server memory as a discard cache to mask some of the mistakes made by the best-guess replacement policy. There are other possible uses for the server memory, however, including as a traditional disk cache and as a portion of the cooperative cache. Unfortunately, the default 16 MB client cache size used in the simulations makes it difficult to measure the effectiveness of the discard cache: the cooperative cache is so large and effective that few accesses go to the server. Thus, to measure the effectiveness of the discard cache, we reduced the size of the client caches and server cache to 4 MB and 16 MB, respectively, and ran simulations of the hint-based system with the server memory put to each of the uses mentioned. Reducing the cache sizes increases the miss rates on the local and cooperative caches, and therefore the load on the server memory.
The results are shown in Table 5 and indicate that when the server memory is used as a traditional disk cache, it has a very low hit rate of 0.35% because most of the blocks it stores are duplicated in the local and cooperative caches. This results in a block access time of 1.01 ms. If the server memory is instead used as part of the cooperative cache, the hit rate increases by nearly a factor of 5, causing the block access time to drop to 0.81 ms. Using the memory as a discard cache, however, further increases the hit rate to 2.16% and drops the block access time to 0.74 ms. By masking some of the replacement mistakes, the discard cache provides a 9% improvement in the block access time over using the memory as part of the cooperative cache.
Server Memory     | Hit Ratio | Block Access Time (ms)
Disk Cache        | 0.35%     | 1.01
Cooperative Cache | 1.65%     | 0.81
Discard Cache     | 2.16%     | 0.74
Table 5: Server Memory Uses. This table shows how various uses of the server memory affect the block access time of the hint-based system. Server memory is used as either a traditional disk cache, as part of the cooperative cache, or as a discard cache. The results are averaged across all periods of the trace.
The analysis presented in the previous sections was based on a single system configuration, in which the number of clients, client cache size, number of servers, and other parameters were fixed. Although the hint-based algorithm performed well under the chosen configuration, its sensitivity to variations in the environment is also of concern. This section presents the sensitivity of the block access time to two environmental variables: the client cache size and the fraction of the clients that actively use the cooperative cache.
First, Figure 4 shows the block access time as the client cache size is varied from 4 MB to 16 MB. The remaining system parameters are the same as those shown in Table 4. A smaller client cache increases the load on the cooperative cache in two ways: first, it increases the local cache miss rate and therefore the number of accesses to the cooperative cache; and second, it reduces the available size of the cooperative cache. As the figure shows, the block access times for most algorithms do not significantly increase with a 4 MB client cache, although they do diverge slightly from optimal. Even with caches this small, the algorithms do a good job of finding and using the available idle memory, producing access times that are close to optimal. The exception is the N-chance algorithm, whose policy of randomly forwarding blocks hurts it when cooperative cache space is scarce.
The sensitivity of the block access time to the fraction of clients that are using the cooperative cache is also of interest. Increasing the fraction of clients that use the cooperative cache increases the demand on the cache, and also decreases the cooperative cache size. This combined effect increases the importance of managing the cooperative cache efficiently.
As Figure 5 shows, only the performance of the N-chance algorithm declines as the fraction of clients using the cooperative cache increases. Again, this is due to the random forwarding of blocks to other clients in N-chance. The remaining algorithms all perform close to optimal.
Cooperative caching for file systems developed from research involving remote memory usage. The idea of remote memory servers in distributed systems was first introduced by Comer and Griffioen in [Comer90]. Felten and Zahorjan proposed the use of idle machines as remote memory servers in [Felten91]. Franklin et al. in [Franklin92] introduced the concept of remote client servers to extend the traditional client-server database architecture. Leff et al. in [Leff91] showed that memory must be dynamically partitioned between local and remote client needs to maximize hit rates.
Our use of hints to perform block lookup is similar to the techniques used to perform page lookup in distributed shared memory systems that support parallel computation. Li and Hudak describe several strategies for managing distributed shared pages[Li89], including a dynamic distributed manager algorithm in which nodes send page requests to the probable owner of the page. If the target node does not have the page, it forwards the request to the node it believes to be the probable owner. Unlike the hint-based algorithm we propose, all nodes keep track of probable owner information for all pages, so that the request eventually reaches the correct owner. Their results show that the probable owner information is quite accurate and the actual number of forwardings is very small. Our hint-based algorithm also differs in that blocks can be forwarded to the cooperative cache, necessitating a distributed replacement policy. The work of Li and Hudak relies on the virtual memory systems of the individual machines to swap pages to disk, rather than forwarding pages to other nodes.
Cooperative caching is also related to multiprocessor caching in shared memory machines[Lenoski90]. However, message costs are greater in distributed file systems than in multiprocessors and have a greater impact on performance. Thus there must be a concerted effort to reduce the number and size of messages required. This is one focus of research on distributed shared memory[Carter91].
The discard cache is similar in purpose to the victim cache proposed by Jouppi[Jouppi90]. A victim cache is a small fully-associative miss cache that is placed between a direct-mapped processor cache and the main memory system. The victim cache is loaded with the victim of a cache miss rather than the missed cache line itself. As a result, cache lines that conflict in the processor cache can both be cached in the victim cache, increasing performance. In essence, the victim cache catches replacement mistakes made by the processor cache because it is direct-mapped. The discard cache, in contrast, catches mistakes made because of incomplete information about block ages.
Cooperative caching is a technique that allows clients to access blocks stored in the memory of other clients. This enables some of the local cache misses to be handled by other clients, offloading the server and improving the performance of the system. However, cooperative caching requires some level of coordination between the clients to maximize the overall system performance. Previous cooperative caching algorithms achieved this coordination by maintaining global information about the system state. This paper shows that allowing clients to make local decisions based on hints performs as well as the previous algorithms, while requiring less overhead. The hint-based algorithm's block access times are as good as those of the previous and ideal algorithms, while reducing manager load by more than a factor of 15, block lookup traffic by nearly one-third, and replacement traffic by more than a factor of 5.
We would like to thank Mike Dahlin for providing the N-chance simulator and the Sprite trace information, and Mike Feeley for clarifications on the GMS system. Wanda Chiu, Matti Hiltunen, Mudita Jain, Dave Lowenthal, Anup Kuzhiyil and Todd Proebsting all provided much-needed comments on early drafts of this paper, as did the anonymous reviewers. Our paper shepherd, Peter Chen, also deserves thanks for his efforts to improve the paper. This work has been funded in part by the Advanced Research Projects Agency of the Department of Defense under contracts DABT63-94-C-0049 and DABT63-95-C-0075, and a grant from Intel Corporation.
[Anderson95] T.E. Anderson, Michael D. Dahlin, Jeanna M. Neefe, David A. Patterson, Drew S. Roselli, and Randolph Y. Wang. Serverless Network File Systems. In Proceedings of the 15th Symposium on Operating System Principles, pages 109--126, December 1995.
[Baker91] Mary G. Baker, John H. Hartman, Michael D. Kupfer, Ken W. Shirriff, and John K. Ousterhout. Measurements of a Distributed File System. In Proceedings of the 13th Symposium on Operating System Principles, pages 198--212, October 1991.
[Belady66] L.A. Belady. A Study of Replacement Algorithms for a Virtual-Storage Computer. IBM Systems Journal, 5(2):78--101, 1966.
[Carter91] J. B. Carter, J. K. Bennett, and W. Zwaenepoel. Implementation and Performance of Munin. In Proceedings of the 13th Symposium on Operating System Principles, pages 152--164, October 1991.
[Comer90] Douglas E. Comer and J. Griffioen. A New Design for Distributed Systems: The Remote Memory Model. In Proceedings of the Summer 1990 Usenix Conference, pages 127--135, June 1990.
[Dahlin94] Michael D. Dahlin, Randolph Y. Wang, Thomas E. Anderson, and David A. Patterson. Cooperative Caching: Using Remote Client Memory to improve File System Performance. In Proceedings of the 1st Symposium on Operating System Design and Implementation, pages 267--280, November 1994.
[Feeley95] Michael J. Feeley, William E. Morgan, Frederic H. Pighin, Anna R. Karlin, and Henry M. Levy. Implementing Global Memory Management in a Workstation Cluster. In Proceedings of the 15th Symposium on Operating System Principles, pages 201--212, December 1995.
[Felten91] Edward W. Felten and J. Zahorjan. Issues in the Implementation of a Remote Memory Paging System. Technical Report 91-03-09, University of Washington, March 1991.
[Franklin92] Michael J. Franklin, Michael J. Carey, and Miron Livny. Global Memory Management in Client-Server DBMS Architectures. In Proceedings of the 18th VLDB Conference, pages 596--609, August 1992.
[Howard88] John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, and Michael J. West. Scale and Performance in a Distributed File System. ACM Transactions on Computer Systems, 6(1):51--81, February 1988.
[Jouppi90] Norman P. Jouppi. Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 364--373, May 1990.
[Leff91] Avraham Leff, Philip S. Yu, and Joel L. Wolf. Policies for Efficient Memory Utilization in a Remote Caching Architecture. In Proceedings of the First International Conference on Parallel and Distributed Information Systems, pages 198--207, December 1991.
[Lenoski90] D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The Directory-based Cache Coherence Protocol for the DASH Multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 148--159, May 1990.
[Li89] Kai Li and Paul Hudak. Memory Coherence in Shared Virtual Memory Systems. ACM Transactions on Computer Systems, 7(4):321--359, November 1989.
[Nelson93] Michael N. Nelson, Brent B. Welch, and John K. Ousterhout. Caching in the Sprite Network File System. ACM Transactions on Computer Systems, 11(2):228--239, February 1993.
[Sandberg85] R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and B. Lyon. Design and Implementation of the Sun Network File System. In Proceedings of the Summer 1985 Usenix Conference, pages 119--130, June 1985.