8:00 a.m.–9:00 a.m. |
Tuesday |
Continental Breakfast
Columbus Foyer
|
9:00 a.m.–10:00 a.m. |
Tuesday |
Michael Franklin, Thomas M. Siebel Professor of Computer Science, University of California, Berkeley
|
10:00 a.m.–10:30 a.m. |
Tuesday |
Break with Refreshments
Columbus Foyer
|
10:30 a.m.–12:10 p.m. |
Tuesday |
Session Chair: Curt Kolovson, VMware
Mingqiang Li, Chuan Qin, Patrick P. C. Lee, The Chinese University of Hong Kong; Jin Li, Guangzhou University
Cloud-of-clouds storage exploits diversity of cloud storage vendors to provide fault tolerance and avoid vendor lock-ins. Its inherent diversity property also enables us to offer keyless data security via dispersal algorithms. However, the keyless security of existing dispersal algorithms relies on the embedded random information, which breaks data deduplication of the dispersed data. To simultaneously enable keyless security and deduplication, we propose a novel dispersal approach called convergent dispersal, which replaces original random information with deterministic cryptographic hash information that is derived from the original data but cannot be inferred by attackers without knowing the whole data. We develop two convergent dispersal algorithms, namely CRSSS and CAONT-RS. Our evaluation shows that CRSSS and CAONT-RS provide complementary performance advantages for different parameter settings.
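A minimal sketch of the convergent-dispersal idea described above, assuming an AONT-style construction with a toy XOR keystream and naive fragmenting; it is illustrative only and is not the paper's CRSSS or CAONT-RS algorithm.

```python
# Illustrative sketch: the random seed of a keyless dispersal scheme is replaced
# by a deterministic hash of the data, so identical plaintexts produce identical
# shares (deduplicable) while an attacker holding only some shares cannot
# recover the seed. Toy construction, not the paper's algorithms.
import hashlib

def keystream(seed: bytes, length: int) -> bytes:
    # Hypothetical helper: expand the seed into a keystream by hashing a counter.
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def convergent_disperse(data: bytes, n: int) -> list[bytes]:
    # The "random" information is derived deterministically from the data itself.
    seed = hashlib.sha256(data).digest()
    masked = bytes(a ^ b for a, b in zip(data, keystream(seed, len(data))))
    # AONT-style tail: the seed is recoverable only with the entire masked body.
    tail = bytes(a ^ b for a, b in zip(seed, hashlib.sha256(masked).digest()))
    package = masked + tail
    # Toy fragmenting (no erasure coding): split the package into n pieces.
    step = -(-len(package) // n)
    return [package[i * step:(i + 1) * step] for i in range(n)]

def reconstruct(shares: list[bytes]) -> bytes:
    package = b"".join(shares)
    masked, tail = package[:-32], package[-32:]
    seed = bytes(a ^ b for a, b in zip(tail, hashlib.sha256(masked).digest()))
    return bytes(a ^ b for a, b in zip(masked, keystream(seed, len(masked))))

if __name__ == "__main__":
    blob = b"example block of user data" * 4
    shares = convergent_disperse(blob, n=4)
    assert reconstruct(shares) == blob
    # Identical data yields identical shares, so deduplication still applies.
    assert convergent_disperse(blob, n=4) == shares
```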
Xiaosong Ma, Qatar Computing Research Institute
Personal data are important assets that people nowadays entrust to cloud storage services for the convenience of easy, ubiquitous access. To attract and retain customers, cloud storage companies aggressively replicate and geo-replicate data. Such replication may be over-cautious for the majority of data objects and contributes to the relatively high price of cloud storage. Yet cloud storage companies are reluctant to provide customers with any guarantee against permanent data loss.
In this paper, we discuss the viability of cloud storage services offering optional data insurance. We examine the major risks associated with cloud storage data loss and derive a crude model for premium calculation. The estimated premium level (per unit of declared value) in most scenarios is found to be significantly lower than that accepted in mature businesses such as shipping. Therefore, optional insurance can potentially provide cloud storage services with more flexibility and cost-effectiveness in resource management, and customers with both peace of mind and lower cost.
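As a hedged illustration of what a "crude model for premium calculation" could look like, the sketch below prices a policy as the expected annual payout plus a loading factor; every number is a hypothetical placeholder, not the paper's estimate.

```python
# Back-of-envelope premium estimate: expected annual payout plus an overhead
# loading, per declared value. All constants are hypothetical placeholders.
def annual_premium(declared_value: float,
                   p_loss_per_year: float = 1e-6,   # assumed annual loss probability
                   loading: float = 0.5) -> float:  # assumed overhead/profit margin
    expected_payout = p_loss_per_year * declared_value
    return expected_payout * (1.0 + loading)

if __name__ == "__main__":
    value = 1000.0  # declared value in dollars
    print(f"premium per $1000 of declared value: ${annual_premium(value):.4f}")
```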
Helgi Sigurbjarnarson, Petur Orri Ragnarsson, Ymir Vigfusson, Reykjavik University; Mahesh Balakrishnan, Microsoft Research
Modern applications expand to fill the space available to them, exploiting local storage to improve performance by caching, prefetching and precomputing data. In virtualized settings, this behavior compromises storage elasticity owing to a rigid contract between the hypervisor and the guest OS: once space is allocated to a virtual disk and used by an application, it cannot be reclaimed by the hypervisor. In this paper, we propose a new guest filesystem called Harmonium that exploits the ephemeral or derivative nature of application data. Each file in Harmonium optionally has a motif that describes how the file can be reconstructed via computation, network accesses, or operations on other files. Harmonium expands files from their motifs when space is available, and contracts them back to their motifs when it is scarce. Given a target size, the system selects files to expand or contract based on the load on the CPU, network, and storage, as well as expected access patterns. As a result, Harmonium enables elastic cloud storage, allowing the hypervisor to dynamically balance storage across multiple VMs.
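A hypothetical sketch of the motif idea: each file may carry a recipe for regenerating its contents, and a greedy policy contracts the least valuable regenerable files under space pressure. The record layout and policy below are illustrative, not Harmonium's actual interface.

```python
# Illustrative sketch: a file body can be dropped ("contracted") to its motif
# under space pressure and recomputed ("expanded") later. Hypothetical types.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Motif:
    regenerate: Callable[[], bytes]   # e.g., re-download, decompress, or recompute
    expected_cost: float              # estimated CPU/network cost to expand
    expected_reuse: float             # estimated probability of near-term access

@dataclass
class File:
    name: str
    data: Optional[bytes]             # None while contracted to its motif
    motif: Optional[Motif] = None

    def size(self) -> int:
        return len(self.data) if self.data else 0

def contract_until(files: list[File], target_bytes: int) -> int:
    """Greedily drop the bodies of the least valuable regenerable files
    until the total size fits under target_bytes."""
    used = sum(f.size() for f in files)
    candidates = sorted((f for f in files if f.data and f.motif),
                        key=lambda f: f.motif.expected_reuse * f.motif.expected_cost)
    for f in candidates:
        if used <= target_bytes:
            break
        used -= f.size()
        f.data = None                 # keep only the motif
    return used
```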
Jiaxing Zhang, Ying Yan, Liang Jeff Chen, Minjie Wang, Thomas Moscibroda, and Zheng Zhang, Microsoft Research
For many big data analytics workloads, approximate results suffice. This raises the question of whether and how the underlying system architecture can take advantage of such relaxations, thereby lifting constraints inherent in today’s architectures. This position paper explores one of the possible directions. Impression Store is a distributed storage system with the abstraction of big data vectors. It aggregates updates internally and responds to the retrieval of top-K high-value entries. With proper extension, Impression Store supports various aggregations, top-K queries, outlier and major mode detection. While restricted in scope, such queries represent a substantial and important portion of many production workloads. In return, the system has unparalleled scalability; any node in the system can process any query, both reads and updates. The key technique we leverage is compressive sensing, a technique that substantially reduces the amount of active memory state, I/O, and traffic volume needed to achieve such scalability.
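One way to picture the compressive-sensing technique mentioned above is the sketch below: a node keeps only a small random projection of the big data vector, folds updates into it, and recovers the top-K entries greedily via orthogonal matching pursuit. It is an illustration of the general technique, not Impression Store's implementation.

```python
# Minimal compressive-sensing sketch: maintain y = Phi @ x for a sparse vector x,
# apply updates in the compressed domain, recover top-K entries greedily.
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 10_000, 200, 5                 # vector length, measurements, sparsity
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

y = np.zeros(M)                          # compressed state kept per node

def apply_update(index: int, delta: float) -> None:
    # An update to x[index] only touches the M-dimensional measurement.
    global y
    y += delta * Phi[:, index]

def top_k(k: int) -> dict:
    # Orthogonal matching pursuit: pick the column most correlated with the
    # residual, then re-fit all chosen entries by least squares.
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(Phi.T @ residual))))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    return dict(zip(support, coef))

if __name__ == "__main__":
    for idx, val in [(42, 9.0), (7, 5.0), (4096, 3.0), (123, 2.0), (999, 1.5)]:
        apply_update(idx, val)
    print(top_k(K))    # should approximately recover the five entries above
```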
|
12:10 p.m.–2:00 p.m. |
Tuesday |
FCW '14 Luncheon
Grand Ballroom ABC
|
2:00 p.m.–3:30 p.m. |
Tuesday |
Session Chair: Nitin Agrawal, NEC
Hao Luo, Lei Tian and Hong Jiang, University of Nebraska, Lincoln
The persistent storage options in smartphones employ journaling or double-write to enforce atomicity, consistency and durability, which introduces significant overhead to system performance. Our in-depth examination of the issue leads us to believe that much of the overhead would be unnecessary if we rethink the volatility of memory considering the battery-backed characteristics of DRAM in modern-day smartphones. With this rethinking, we propose quasi Non-Volatile Memory (qNVRAM), a new design that makes the DRAM in smartphones quasi non-volatile, to help remove the performance overhead of enforcing persistency. We assess the feasibility and effectiveness of our design by implementing a persistent page cache in SQLite. Our evaluation on a real Android smartphone shows that qNVRAM speeds up the insert, update and delete transactions by up to 16.33x, 15.86x and 15.76x respectively.
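A hedged sketch of the write path such a design implies: a transaction's dirty pages are staged in battery-backed DRAM together with a commit record, and are written back to flash lazily, avoiding journaling or double-write on the critical path. The classes below are hypothetical, not the paper's SQLite changes.

```python
# Illustrative sketch: treat battery-backed DRAM as quasi non-volatile, stage a
# transaction's dirty pages there with a commit record, and flush lazily.
class QuasiNVPageCache:
    def __init__(self):
        self.staged = {}        # page_no -> bytes, lives in battery-backed DRAM
        self.committed = False  # commit record, also in battery-backed DRAM

    def write_page(self, page_no: int, data: bytes) -> None:
        self.staged[page_no] = data          # no journal write, just a memcpy

    def commit(self) -> None:
        self.committed = True                # a single flag flip marks durability

    def lazy_flush(self, database: dict) -> None:
        # Runs in the background, or during recovery after a reboot, since the
        # staged pages survive as long as the battery holds.
        if self.committed:
            for page_no, data in self.staged.items():
                database[page_no] = data
            self.staged.clear()
            self.committed = False
```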
Weiping He and David H.C. Du, University of Minnesota, Twin Cities
Shingled Write Disks (SWDs) increase storage density by writing data in overlapping tracks. Consequently, data cannot be updated freely in place without overwriting any valid data in subsequent tracks. A write operation therefore may incur several extra read and write operations, which creates a write amplification problem. In this paper, we propose several novel static Logical Block Address (LBA) to Physical Block Address (PBA) mapping schemes for in-place update SWDs that significantly reduce the write amplification. The experiments with four traces demonstrate that our scheme can provide comparable performance to that of regular Hard Disk Drives (HDDs) when the SWD space usage is no more than 75%.
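A toy model of the write-amplification problem and of one plausible static mapping, assuming that updating a track requires rewriting every later valid track in its band; the mapping shown simply under-fills each band and is not one of the paper's schemes.

```python
# Toy model: updating track t of a k-track shingled band requires rewriting the
# later tracks in the band that hold valid data. The stand-in mapping below
# only fills part of each band, showing why modest space usage keeps
# amplification low; it is not one of the paper's mapping schemes.
TRACKS_PER_BAND = 4

def rmw_cost(track_in_band: int, valid_tracks_in_band: int) -> int:
    """Number of track writes needed to update one track in place."""
    later_valid = max(0, valid_tracks_in_band - track_in_band - 1)
    return 1 + later_valid           # the track itself plus rewritten neighbours

def static_map(lba: int, usage: float) -> tuple:
    """Map an LBA to (band, track), using only the first round(usage * k)
    tracks of each band so an update overwrites fewer valid tracks."""
    usable = max(1, round(usage * TRACKS_PER_BAND))
    return lba // usable, lba % usable

if __name__ == "__main__":
    band, track = static_map(lba=10, usage=0.75)   # 3 usable tracks per band
    print(band, track, rmw_cost(track, valid_tracks_in_band=3))
```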
Zhichao Li, Amanpreet Mukker, and Erez Zadok, Stony Brook University
Modern storage systems are becoming more complex, combining different storage technologies with different behaviors. Performance alone is not enough to characterize storage systems: energy efficiency, durability, and more are becoming equally important. We posit that one must evaluate storage systems from a monetary cost perspective as well as performance. We believe that cost should consider the workloads used over the storage systems’ expected lifetime. We designed and developed a versatile hybrid storage system under Linux that combines HDD and SSD. The SSD can be used as cache or as primary storage for hot data. Our system includes tunable parameters to enable trading off performance, energy use, and durability. We built a cost model and evaluated our system under a variety of workloads and parameters, to illustrate the importance of cost evaluations of storage systems.
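A hedged sketch of a lifetime monetary cost comparison in the spirit described above, combining purchase price, energy over the deployment period, and write-driven SSD wear; all constants are placeholders rather than measured values from the paper.

```python
# Hedged cost-model sketch: total cost = purchase + energy + replacements driven
# by write-induced wear. Every constant below is a hypothetical placeholder.
def lifetime_cost(purchase_usd: float,
                  avg_power_w: float,
                  years: float,
                  usd_per_kwh: float = 0.12,
                  bytes_written: float = 0.0,
                  endurance_bytes: float = float("inf")) -> float:
    energy = avg_power_w / 1000.0 * 24 * 365 * years * usd_per_kwh
    replacements = (bytes_written / endurance_bytes) * purchase_usd
    return purchase_usd + energy + replacements

if __name__ == "__main__":
    tb = 1e12
    hdd = lifetime_cost(80, avg_power_w=6, years=3, bytes_written=500 * tb)
    ssd = lifetime_cost(300, avg_power_w=2, years=3,
                        bytes_written=500 * tb, endurance_bytes=600 * tb)
    print(f"HDD ~ ${hdd:.0f}, SSD ~ ${ssd:.0f} over 3 years")
```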
|
3:30 p.m.–4:00 p.m. |
Tuesday |
Break with Refreshments
Columbus Foyer
|
4:00 p.m.–5:15 p.m. |
Tuesday |
Session Chair: Margo Seltzer, Harvard School of Engineering and Applied Sciences and Oracle
Simon Peter, Jialin Li, Doug Woos, Irene Zhang, Dan R. K. Ports, Thomas Anderson, Arvind Krishnamurthy, and Mark Zbikowski, University of Washington
We propose a radical re-architecture of the traditional operating system storage stack to move the kernel off the data path. Leveraging virtualized I/O hardware for disk and flash storage, most read and write I/O operations go directly to application code. The kernel dynamically allocates extents, manages the virtual-to-physical binding, and performs name translation. The benefit is a dramatic reduction in the CPU overhead of storage operations while improving application flexibility.
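A hypothetical sketch of the control-plane/data-plane split described above: the kernel allocates extents and sets up the binding on the slow path, after which application I/O bypasses it entirely. Names and interfaces are illustrative only.

```python
# Illustrative split: the kernel (control plane) hands out extents rarely; the
# application (data plane) then reads and writes them directly, no syscalls.
class KernelControlPlane:
    def __init__(self):
        self.next_pba = 0

    def allocate_extent(self, blocks: int) -> range:
        """Slow path: called rarely; returns physical blocks the app may use."""
        extent = range(self.next_pba, self.next_pba + blocks)
        self.next_pba += blocks
        return extent

class UserLevelStore:
    """Fast path: runs entirely in the application against virtualized hardware."""
    def __init__(self, extent: range, device: dict):
        self.extent, self.device = extent, device

    def write(self, offset: int, data: bytes) -> None:
        self.device[self.extent[offset]] = data   # direct to device, no syscall

    def read(self, offset: int) -> bytes:
        return self.device[self.extent[offset]]

if __name__ == "__main__":
    device = {}
    extent = KernelControlPlane().allocate_extent(blocks=16)
    store = UserLevelStore(extent, device)
    store.write(0, b"hello")
    assert store.read(0) == b"hello"
```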
Leonardo Mármol, Florida International University; Swaminathan Sundararaman and Nisha Talagala, FusionIO; Raju Rangaswami, Florida International University; Sushma Devendrappa, Bharath Ramsundar, and Sriram Ganesan, FusionIO
State-of-the-art flash-optimized KV stores frequently rely upon a log structure and/or compaction-based strategy to optimally organize content on flash. However, these strategies lead to excessive I/O, beyond the write amplification generated within the flash itself, with both the application and the flash device constantly rearranging data. In this paper, we explore the other extreme in the design space: minimal data management at the KV store and heavy reliance on the Flash Translation Layer (FTL) capabilities. NVMKV is a scalable and lightweight KV store that leverages advanced capabilities that are becoming available in modern FTLs. We demonstrate that NVMKV (i) performs KV operations at close to native device access speeds for get operations, (ii) outperforms state-of-the-art KV stores by 50%-300%, (iii) significantly improves performance predictability for the YCSB KV benchmark when compared with the popular LevelDB KV store, and (iv) reduces data written to flash by as much as 1.7X and 29X for sequential and random write workloads relative to LevelDB, thereby dramatically increasing device lifetime.
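A hedged sketch of the "minimal KV layer over a capable FTL" design point: keys are hashed into a large sparse logical address space and the (stand-in) FTL handles placement, so the application needs no log or compaction. The specific FTL primitives NVMKV relies on are described in the paper.

```python
# Illustrative sketch: hash keys into a sparse logical address space and let a
# stand-in FTL handle placement. Not NVMKV's actual interface or FTL primitives.
import hashlib
from typing import Optional

SPARSE_SPACE = 2 ** 48          # assumed sparse logical address space (blocks)

class SparseFTL:
    """Stand-in for an FTL with sparse addressing: only written logical block
    addresses consume physical flash."""
    def __init__(self):
        self.blocks = {}

    def write(self, lba: int, value: bytes) -> None:
        self.blocks[lba] = value

    def read(self, lba: int) -> Optional[bytes]:
        return self.blocks.get(lba)

class MinimalKV:
    def __init__(self, ftl: SparseFTL):
        self.ftl = ftl

    def _lba(self, key: bytes) -> int:
        return int.from_bytes(hashlib.sha256(key).digest()[:6], "big") % SPARSE_SPACE

    def put(self, key: bytes, value: bytes) -> None:
        # Store the key alongside the value so a hash collision can be detected.
        self.ftl.write(self._lba(key), key + b"\x00" + value)

    def get(self, key: bytes) -> Optional[bytes]:
        block = self.ftl.read(self._lba(key))
        if block and block.startswith(key + b"\x00"):
            return block[len(key) + 1:]
        return None
```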
Rini T. Kaushik, IBM Research—Almaden
A high-performance storage layer is vital for enabling interactive ad hoc SQL analytics (OLAP style) over Big Data. The paper makes a case for leveraging flash in the Big Data stack to speed up queries. State-of-the-art Big Data layouts and algorithms are optimized for hard disks (i.e., sequential access is emphasized over random access) and result in suboptimal performance on flash, given its drastically different performance characteristics. While existing columnar and row-columnar layouts are able to reduce disk I/O compared to row-based layouts, they still end up reading significant columnar data irrelevant to the query, as they only employ coarse-grained, intra-columnar data skipping, which doesn’t work across all queries. FlashQueryFile’s specialized columnar data layouts, selection, and projection algorithms fully exploit the fast random accesses and high internal I/O parallelism of flash to allow fast and I/O-efficient query processing and fine-grained, intra-columnar data skipping that minimizes the data read per query. FlashQueryFile results in an 11X-100X TPC-H query speedup and a 38%-99.08% reduction in data read compared to a flash-based, HDD-optimized row-columnar data layout and its associated algorithms.
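A small sketch of data skipping, the general mechanism the abstract builds on: lightweight per-block metadata is consulted first, and only blocks that can contain matching values are fetched with random reads, which flash makes cheap. The layout below is illustrative, not FlashQueryFile's on-flash format.

```python
# Illustrative intra-columnar data skipping: check per-block min/max metadata,
# then fetch only the blocks that can satisfy the predicate.
from dataclasses import dataclass

@dataclass
class ColumnBlock:
    min_val: int
    max_val: int
    values: list            # stands in for one block of the column on flash

def scan_with_skipping(blocks: list, lo: int, hi: int):
    """SELECT ... WHERE lo <= col <= hi, skipping blocks whose [min, max]
    range cannot intersect the predicate."""
    hits, blocks_read = [], 0
    for block in blocks:
        if block.max_val < lo or block.min_val > hi:
            continue                               # skipped: no I/O for this block
        blocks_read += 1                           # random read on flash
        hits.extend(v for v in block.values if lo <= v <= hi)
    return hits, blocks_read

if __name__ == "__main__":
    col = [ColumnBlock(0, 99, list(range(100))),
           ColumnBlock(100, 199, list(range(100, 200))),
           ColumnBlock(200, 299, list(range(200, 300)))]
    hits, touched = scan_with_skipping(col, lo=150, hi=160)
    print(hits, "blocks read:", touched)           # only the middle block is read
```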
|
6:00 p.m.–7:00 p.m. |
Tuesday |
Tuesday Happy Hour
Columbus Foyer
|