Introduction to Apache Hadoop and Its Ecosystem
Ballroom A
Originally inspired by Google's GFS and MapReduce papers, Apache Hadoop is an open source framework offering scalable, distributed, fault-tolerant data storage and processing on standard hardware. This session explains what Hadoop is and where it best fits into the modern data center. You'll learn the basics of how Hadoop provides scalable data storage and processing, survey the key "ecosystem" tools that complement its capabilities, and see several practical ways organizations are using these tools today. You'll also examine the basic architecture of a Hadoop cluster and recent developments that will further improve Hadoop's scalability and performance.
This session is intended for those who are new to Hadoop and want to understand what it is, how organizations are using it, and how it compares to and integrates with other systems. It assumes no prior knowledge of Hadoop, and technical topics such as MapReduce and HDFS replication are explained clearly and concisely, making the session appropriate for anyone attending the conference.
- What Hadoop is and how organizations are using it
- How the HDFS filesystem provides reliability and high throughput (see the replication sketch after this list)
- How MapReduce enables parallel processing on large data sets (see the word-count sketch after this list)
- Explanations of some popular open source tools that integrate with Hadoop
- Typical architecture of a Hadoop cluster
- Considerations for hosting a Hadoop cluster
- Emerging trends in the design and implementation of Hadoop
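To give a concrete taste of the reliability topic above, here is a minimal sketch of how HDFS replication can be adjusted programmatically through Hadoop's Java FileSystem API. HDFS keeps multiple copies of each block (three by default) on different nodes, which is what lets a cluster survive disk and node failures. The path /data/example.txt is a hypothetical placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        // Picks up cluster settings (e.g. core-site.xml) from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Ask HDFS to maintain three copies of this (hypothetical) file;
        // the NameNode re-replicates blocks if a copy is lost
        fs.setReplication(new Path("/data/example.txt"), (short) 3);
    }
}
```

The cluster-wide default can also be set with the dfs.replication property, so per-file calls like this are only needed for exceptions.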
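Similarly, for the parallel-processing topic, the following is a sketch of the classic word-count job written against Hadoop's Java MapReduce API. Mappers run in parallel, one per input split, emitting (word, 1) pairs; the framework shuffles the pairs by key, and reducers sum the counts. Input and output paths come from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in its input split
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts for each word across all mappers
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on map nodes
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar, a job like this is typically submitted with something like `hadoop jar wordcount.jar WordCount /input/dir /output/dir`.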