Securing data against large-scale disasters is important, especially for critical enterprises such as major banks, brokerages, and other service providers. Data loss can be catastrophic for any company: Gartner estimates that 40% of enterprises that experience a disaster (e.g. loss of a site) go out of business within five years [41]. Data loss at a large bank could have far greater consequences, with potentially global implications.
Accordingly, many organizations are looking to dedicated high-speed optical links as a disaster tolerance option: they hope to continuously mirror vital data at remote locations, ensuring safety from geographically localized failures such as those caused by natural disasters and other calamities. However, taking advantage of this new capability over the wide area has proven challenging: existing mirroring solutions are highly latency sensitive [19]. As a result, many critical enterprises operate at risk of catastrophic data loss [22].
The central trade-off pits safety against performance. So-called synchronous mirroring solutions [6,12] block applications until data is safely mirrored at the remote location: the primary site waits for an acknowledgment from the remote site before allowing the application to continue executing. These are very safe, but extremely sensitive to link latency. Semi-synchronous mirroring solutions [12,42] allow the application to continue executing once data has been written to a local disk; updates are transmitted as soon as possible, but data in flight can still be lost if disaster strikes. At the other end of the spectrum are fully asynchronous solutions: not only does the application resume as soon as data is written locally, but updates are also batched and may be transmitted only periodically, for instance every thirty minutes [6,12,19,31]. These solutions perform best, but offer the weakest safety guarantees.
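To make the acknowledgment points concrete, the following minimal sketch (our own illustration; the class and function names are invented, not drawn from any of the cited systems) contrasts when each mode allows the application to continue:

```python
import time

WAN_RTT = 0.050  # assumed 50 ms wide-area round trip, for illustration only

class LocalDisk:
    """Stand-in for a fast local write path."""
    def write(self, data):
        pass  # assume the local write completes quickly

class RemoteLink:
    """Stand-in for the wide-area transmission path."""
    def send(self, data):
        pass  # hand data to the network; returns immediately
    def wait_for_ack(self):
        time.sleep(WAN_RTT)  # model one wide-area round trip

def sync_write(data, disk, link):
    """Synchronous: block until the remote site acknowledges."""
    disk.write(data)
    link.send(data)
    link.wait_for_ack()   # pays a full WAN round trip on every update

def semi_sync_write(data, disk, link):
    """Semi-synchronous: return once the data is on the local disk."""
    disk.write(data)
    link.send(data)       # update is in flight; lost if disaster strikes now

def async_write(data, disk, batch):
    """Asynchronous: batch updates and ship them only periodically."""
    disk.write(data)
    batch.append(data)    # e.g. the batch is flushed every thirty minutes
```

The latency an application observes is thus one wide-area round trip per update in the synchronous case, but essentially just a local disk write in the other two.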
Today, most enterprises primarily use asynchronous or semi-synchronous remote mirroring solutions over the wide-area, despite the significant risks posed by such a stance. Their applications simply cannot tolerate the performance degradation of synchronous solutions [22]. The US Treasury Department and the Finance Sector Technology Consortium have identified the creation of new options as a top priority for the community [30].
In this paper, we explore a new mirroring option called network-sync, which potentially offers stronger data-reliability guarantees than semi-synchronous and asynchronous solutions while retaining their performance. It is designed around two principles. First, it proactively adds redundancy to transmitted data at the network level. Second, it exposes, via feedback notifications, the level of in-network redundancy that has been added for any sent data. Proactive redundancy allows for reliable transmission with latency and jitter independent of the length of the link, a property critical for long-distance mirroring. Feedback makes it possible for a file system (or other application) to respond to clients as soon as enough recovery data has been transmitted to reach the desired safety level. Figure 1 illustrates this idea.
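As a rough illustration of these two principles (a sketch of our own devising, assuming a simple erasure-coding scheme; it is not the actual SMFS interface), the sender below emits data packets plus repair packets and fires a local feedback callback once enough of them have entered the network. Because the callback depends only on what has been sent, not on a remote acknowledgment, the time to reach "safe" is independent of the link's length:

```python
def encode_with_redundancy(update, k=4, r=2):
    """Split an update into k data packets plus r repair packets.
    A real implementation would use an erasure code; placeholder
    repair packets suffice here, since only the counting matters."""
    size = max(1, len(update) // k)
    data_pkts = [update[i * size:(i + 1) * size] for i in range(k - 1)]
    data_pkts.append(update[(k - 1) * size:])  # last packet takes the tail
    repair_pkts = [b"<repair>"] * r
    return data_pkts + repair_pkts

def network_sync_send(update, link, on_safe, k=4, r=2):
    """Transmit an update with proactive redundancy and notify the
    caller, via on_safe, once enough packets are in flight that the
    update can survive the expected in-network loss."""
    packets = encode_with_redundancy(update, k, r)
    safety_threshold = k + r  # assumed policy: all redundancy in flight
    for sent, pkt in enumerate(packets, start=1):
        link.send(pkt)
        if sent == safety_threshold:
            on_safe(update)  # file system may now acknowledge its client
```

A file system using such an interface would register on_safe to unblock the client request associated with the update, rather than waiting for the mirror's acknowledgment.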
Of course, data can still be lost; network-sync is not as safe as a synchronous solution. If the primary site fails just as the wide-area network partitions, data in flight will be lost. Such scenarios are uncommon, however. Network-sync thus offers the developer a valuable new option for trading data reliability against performance.
Although this paper focuses on the Smoke and Mirrors File System (SMFS), we believe that many kinds of applications could benefit from a network-sync option. These include other kinds of storage systems where remote mirroring is performed by a disk array (e.g. [12]), a storage area network (e.g. [19]), or a more traditional file server (e.g. [31]). Network-sync might also be valuable for transactional databases that stream update logs from a primary site to a backup, and for other kinds of fault-tolerant services.
Beyond its use of the network-sync option, SMFS has a second interesting property. Many applications update files in groups, and in such cases, if even one file in a group is out of date, the whole group may be useless (Seneca [19] calls this atomic, in-order asynchronous batched commits; SnapMirror [31] offers a similar capability). SMFS addresses this need in two ways. First, if an application updates multiple files in a short period of time, the updates reach the remote site with minimal temporal skew. Second, SMFS maintains group-mirroring consistency: files in the same file system can be updated as a group in a single operation, and the remote mirror reflects the group atomically, either all of the updates or none of them.
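The all-or-none behavior can be pictured with a toy mirror (our own minimal sketch; SMFS's actual mechanism is presented later in the paper):

```python
class MirrorSite:
    """Toy remote mirror that applies groups of file updates atomically."""
    def __init__(self):
        self.files = {}

    def apply_group(self, group):
        """Install a group of updates all-or-none. `group` maps each file
        path to its new contents; None marks an update that never
        arrived, in which case the entire group is discarded."""
        if any(contents is None for contents in group.values()):
            return False          # partial group: apply none of it
        self.files.update(group)  # complete group: apply all of it
        return True

mirror = MirrorSite()
applied = mirror.apply_group({"/a/log": b"append", "/a/index": b"rebuild"})
assert applied and mirror.files["/a/index"] == b"rebuild"
```

Either both files reflect the update or neither does, so a reader at the mirror never observes an inconsistent intermediate state.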
In summary, our paper makes the following contributions:
- We introduce network-sync, a remote mirroring option that combines proactive in-network redundancy with feedback notifications, offering stronger reliability guarantees than semi-synchronous and asynchronous mirroring while retaining their performance.
- We present the design and implementation of the Smoke and Mirrors File System (SMFS), which mirrors files over the wide area using network-sync and provides group-mirroring consistency, so that groups of file updates appear at the remote mirror atomically.
- We evaluate the SMFS design and implementation, examining the trade-off between data reliability and performance that network-sync enables.
The rest of this paper is structured as follows. We discuss our fault model in Section 2. In Section 3, we describe the network-sync option. We describe the SMFS protocols that interact with the network-sync option in Section 4. In Section 5, we evaluate the design and implementation. Finally, Section 6 describes related work and Section 7 concludes.