We will say that a mirror image is inconsistent if out-of-order updates are applied to the mirror, or if the application updates a group of files and a period ensues during which some of the mirrored copies reflect the updates while others are stale. Inconsistency is a well-known problem when file systems are accessed over a network, and mirroring can exacerbate it. For example, suppose that one were to mirror an NFS server using the standard but unreliable UDP transport protocol. The primary and remote file systems can easily become inconsistent, since UDP packets can be reordered on the wire, particularly if a packet is dropped and the NFS protocol is forced to resend it. Even if a reliable transport protocol is used, when the file system is spread over multiple storage servers or applications update groups of files, skew in update behavior between the different mirrored servers may be perceived by applications as an inconsistency.
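The reordering hazard can be illustrated with a minimal sketch. It is not SMFS or NFS code: the `apply_updates` helper, the in-memory file store, and the file name are all hypothetical, and the "reordering" is simulated by swapping two writes, as might happen when a dropped packet is resent after a later one has already arrived.

```python
def apply_updates(updates):
    """Apply (path, offset, data) writes, in the order given, to an in-memory store."""
    files = {}
    for path, offset, data in updates:
        buf = bytearray(files.get(path, b""))
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(b"\0" * (end - len(buf)))  # zero-fill any gap
        buf[offset:end] = data
        files[path] = bytes(buf)
    return files


# Two overlapping writes to the same file, in the order the application issued them.
primary_order = [("a.txt", 0, b"old!"), ("a.txt", 0, b"new!")]

# Hypothetical mirror-side arrival order: the first write was dropped and
# resent, so it arrives after the second one.
mirror_order = [primary_order[1], primary_order[0]]

primary = apply_updates(primary_order)
mirror = apply_updates(mirror_order)

print(primary["a.txt"], mirror["a.txt"])
```

Under these assumptions the primary ends with the newer contents while the mirror ends with the older ones, even though both received exactly the same set of writes.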
To address this issue, SMFS implements a file system that preserves the order of operations in the structure of the file system itself: a distributed log-structured file system (distributed-LFS), in which a particular log is distributed over multiple disks. (A distributed log-structured file system can expose an NFS interface to hosts; however, it stores data in a distributed log rather than in a local UNIX file system (UFS).) Similar to LFS [35,27], it embeds a UNIX tree-structured file system into an append-only log format (Figure 4). It breaks a particular log into multiple segments that each have a finite maximum size and are the units of storage allocation and cleaning.
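As a rough illustration of a log split into bounded segments, the sketch below keeps an append-only sequence of records and opens a new segment whenever the current one would exceed its size limit. The class names, the record layout, and the tiny size limit are assumptions made for the example; they do not reflect the SMFS on-disk format, and the sketch ignores distribution across disks and cleaning.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One bounded unit of the log (hypothetical in-memory stand-in)."""
    records: list = field(default_factory=list)
    size: int = 0


class SegmentedLog:
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes      # assumed per-segment payload limit
        self.segments = [Segment()]
        self.next_seq = 0               # global sequence number = operation order

    def append(self, op, payload):
        """Append a record; return (segment index, sequence number)."""
        seg = self.segments[-1]
        # Open a new segment if this record would overflow a non-empty one.
        # Simplification: an oversized record gets a segment to itself.
        if seg.records and seg.size + len(payload) > self.max_bytes:
            seg = Segment()
            self.segments.append(seg)
        seq = self.next_seq
        self.next_seq += 1
        seg.records.append((seq, op, payload))
        seg.size += len(payload)
        return len(self.segments) - 1, seq

    def replay(self):
        """Yield records in log order, which is also the order of operations."""
        for seg in self.segments:
            yield from seg.records


log = SegmentedLog(max_bytes=8)
log.append("write /a", b"12345")
log.append("write /b", b"6789")   # would overflow the first segment
log.append("write /a", b"xy")

print(len(log.segments), [seq for seq, _, _ in log.replay()])
```

Because data and ordering live in the same structure, a replica that replays the segments in sequence sees the operations in the order they were appended.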
Although log-structured file systems may be unpopular in general settings (due to worries about high cleaning costs if the file system fills up), a log structure turns out to be nearly ideal for file mirroring. First, it is well known that an append-only log structure is optimized for write performance [27,35]. Second, by combining data and order of operations into one structure, the log, identical structures can be managed naturally at remote locations. Finally, log operations can be pipelined, increasing system throughput. Of course, none of this eliminates worries about segment cleaning costs. Our assumption is that because SMFS would be used only for files that need to be mirrored, such as backups and checkpoints, it can be configured with ample capacity, far from the tipping point at which these overheads become problematic.
In Sections 4.1 and 4.2, we describe the storage system's architecture and API.