Mambo: Running Analytics on Enterprise Storage
Gokul Soundararajan and Jingxin Feng, NetApp; Xing Lin, University of Utah
Big data is broadly defined as large datasets, often in unstructured formats, that cannot be processed by traditional database systems. Businesses have turned to big data analytics tools such as Apache Hadoop to store and analyze these datasets. Apache Hadoop is a software framework that enables distributed processing of large and varied datasets across clusters of computers using simple programming models. The Hadoop Distributed File System (HDFS) provides high-throughput access to application data. Hadoop also offers integration points for enhancing specific workloads, storage efficiency, and data management.
Hadoop has been used primarily on incoming, external data; however, there is also a need to use Hadoop on existing, internal data, which is typically stored in network-attached storage (NAS). Doing so typically requires setting up another storage silo to host HDFS and then running the Hadoop analytics on that storage. This, in turn, results in additional data management, more inefficiency, and the additional cost of moving the data. In this talk, we present NFS Connector for Hadoop, an open-source project that allows Hadoop to run natively over NFS, without moving the data or creating a separate data silo. We will describe the underlying architecture, use cases, and integration with Hadoop.
This work was published earlier at FAST '13 in a paper titled "MixApart: Decoupled Analytics for Shared Storage Systems." This talk will describe the journey from the original research to its current form and highlight the customer use cases that led to its productization. We will also use the forum to gather feedback and learn about additional use cases.
author = {Gokul Soundararajan and Jingxin Feng and Xing Lin},
title = {Mambo: Running Analytics on Enterprise Storage},
year = {2015},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = jul
}