Architecting Applications on Hadoop
Grand Ballroom C
During the first half of the tutorial, we will provide an introduction to Apache Hadoop and its ecosystem. In the second half, we will show, using an end-to-end clickstream analytics application, how users can:
- Model data in Hadoop and select optimal storage formats
- Move data between Hadoop and external systems such as relational databases and logs
- Access and process data in Hadoop
- Orchestrate and schedule workflows on Hadoop
Throughout the example, best practices and considerations for architecting applications on Hadoop will be covered.
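To give a flavor of the "access and process" step, here is a minimal sketch of clickstream page-view counting written in the style of a Hadoop Streaming job. The log-line layout (timestamp, user ID, page URL) and the function names are illustrative assumptions, not the tutorial's actual code; on a real cluster the mapper and reducer would read from stdin and the framework would perform the shuffle/sort.

```python
# Illustrative sketch: counting page views per URL, MapReduce-style.
# The whitespace-delimited log format (timestamp user_id page_url) is assumed.
import itertools

def mapper(line):
    # Emit (page_url, 1) for each clickstream record.
    fields = line.split()
    if len(fields) >= 3:
        yield fields[2], 1

def reducer(key, values):
    # Sum the counts for one page URL.
    yield key, sum(values)

def run(lines):
    # Simulate Hadoop's shuffle/sort phase locally by sorting mapper output
    # and grouping by key before invoking the reducer.
    mapped = sorted(kv for line in lines for kv in mapper(line))
    for key, group in itertools.groupby(mapped, key=lambda kv: kv[0]):
        yield from reducer(key, (v for _, v in group))

log = [
    "2014-01-01T00:00:01 u1 /home",
    "2014-01-01T00:00:05 u2 /products",
    "2014-01-01T00:00:09 u1 /home",
]
print(dict(run(log)))  # page-view counts per URL
```

The same aggregation could equally be expressed as a Hive or Impala query over the raw logs; the tutorial discusses which processing engines fit which workloads.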
Students should bring laptops with a copy of the Cloudera QuickStart VM (or access to a working alternate VM or Hadoop cluster). The VM can be downloaded from Cloudera's website.
Requirements are:
These are 64-bit VMs. They require a 64-bit host OS and a virtualization product that can support a 64-bit guest OS.
To use a VMware VM, you must use a player compatible with Workstation 8.x or higher: Player 4.x or higher, ESXi 5.x or higher, or Fusion 4.x or higher. Older versions of Workstation can be used to create a new VM using the same virtual disk (VMDK file), but some features in VMware Tools won't be available.
The VM and file size vary according to the CDH version as follows:
| CDH and Cloudera Manager Version | RAM Required by VM | File Size |
| --- | --- | --- |
| CDH 5 and Cloudera Manager 5 | 4 GB | 3 GB |
| CDH 4, Cloudera Impala, Cloudera Search, and Cloudera Manager 4 | 4 GB | 2 GB |