USENIX 2nd Symposium on
OS Design and Implementation (OSDI '96)
Lightweight Logging for Lazy Release
Consistent Distributed Shared Memory
Manuel Costa,
Paulo Guedes,
Manuel Sequeira,
Nuno Neves,
Miguel Castro
IST - INESC
Abstract
This paper presents a new logging and recovery algorithm
for lazy release consistent distributed shared memory (DSM). The
new algorithm tolerates single-node failures by maintaining a
distributed log of data dependencies in the volatile memory of
processes.
The algorithm adds very little overhead to the memory
consistency protocol: it sends no additional messages during failure-free
periods; it adds only a minimal amount of data to one of the DSM
protocol messages; it introduces no forced rollbacks of nonfaulty
processes; and it performs no communication-induced accesses to
stable storage. Furthermore, the algorithm logs only a very small
amount of data, because it uses the log of memory accesses already
maintained by the memory consistency protocol.
The algorithm was implemented in TreadMarks, a state-of-the-art
DSM system. Experimental results show that the algorithm has
near-zero time overhead and very low space overhead during failure-free
execution, thus refuting the common belief that logging overhead
is necessarily high in recoverable DSM systems.