DEV 301 - Developing Hadoop Applications

Learn how to write Hadoop Applications using MapReduce and YARN in Java

About this Course

This course teaches developers how to write Hadoop Applications using MapReduce and YARN in Java. The course covers debugging, managing jobs, improving performance, working with custom data, managing workflows, and using other programming languages for MapReduce.

This on-demand course is designed to be flexible to fit your schedule. Each lesson and quiz takes approximately 30 to 45 minutes to complete. Lab activities take additional time and vary based on your system.

Prerequisites

Required:

  • Completion of the on-demand course ESS 100 - Big Data Essentials
  • Completion of the on-demand course ESS 101 - Apache Hadoop Essentials
  • Completion of the on-demand course ESS 102 - MapR Converged Data Platform Essentials
  • Beginner-to-intermediate fluency with Java or object-oriented programming in an IDE
  • Basic Hadoop knowledge (helpful but not required)
  • A Linux, PC, or Mac computer with the MapR Sandbox downloaded (for the on-demand course)
  • A connection to a Hadoop cluster via SSH and web browser (for the ILT or vILT course)

Certification

This course helps prepare you for the MapR Certified Hadoop Professional: Developer (MCHD) certification exam.

Syllabus

Lesson 1: Introduction to MapReduce

  • Illustrate the MapReduce Model Conceptually
  • Brief History of MapReduce
  • Discuss How MapReduce Works at a High Level
  • Define How Data Flows in MapReduce
  • Hands-on Exercises

Lesson 2: Job Execution Framework – MapReduce v1 & v2

  • Describe the MapReduce v1 Job Execution Framework
  • Compare MapReduce v1 to MapReduce v2 (YARN)
  • Describe How Jobs Execute in YARN
  • Describe How to Manage Jobs in YARN
  • Hands-on Exercises

Lesson 3: Write a MapReduce Program

  • Summary of the Programming Problem
  • Design and Implement the Mapper Class, Reducer Class, and Driver
  • Build and Execute the Code, Then Examine the Output
  • Hands-on Exercises

Lesson 4: Use the MapReduce API

  • API Overview
  • Mapper Input Processing and Reducer Output Processing Data Flow
  • Explore the Mapper, Reducer, and Job Class API
  • Hands-on Exercises

Lesson 5: Managing, Monitoring, and Testing MapReduce Jobs

  • Work with Counters
  • Use the MCS to Monitor Jobs
  • Use the Hadoop CLI to Manage Jobs
  • Display Job History and Logs
  • Write Unit Tests for MapReduce Programs
  • Hands-on Exercises

Lesson 6: Managing Performance

  • Review Components of MapReduce Performance
  • Enhance Performance in MapReduce Jobs
  • Overview of MapR Performance Enhancements
  • Hands-on Exercises

Lesson 7: Working with Data

  • Work with Sequence Files
  • Work with the Distributed Cache
  • Work with HBase
  • Hands-on Exercises

Lesson 8: Launching Jobs

  • Implement Programmatic Job Control in the Driver
  • Use MapReduce Chaining
  • Use Oozie to Manage MapReduce Workflows
  • Hands-on Exercises

Lesson 9: Using Non-Java Programs (Streaming MapReduce)

  • Overview of the MapReduce Streaming Paradigm
  • Configure MapReduce Streaming Parameters
  • Define the Programming Contract for Mappers and Reducers
  • Monitor and Debug MapReduce Streaming Jobs
  • Hands-on Exercises

Curriculum

  • Lesson 1 - Introduction to Developing Hadoop Applications
  • Quiz 1
  • Lesson 2 - Job Execution Framework MapReduce v1 & v2
  • Quiz 2
  • Lesson 3 - Write MapReduce Programs
  • Quiz 3
  • Lesson 4 - Use the MapReduce API
  • Quiz 4
  • Lesson 5 - Manage, Monitor, and Test MapReduce
  • Quiz 5
  • Lesson 6 - Manage Performance
  • Quiz 6
  • Lesson 7 - Working with Data
  • Quiz 7
  • Lesson 8 - Launch Jobs
  • Quiz 8
  • Lesson 9 - Streaming MapReduce
  • Quiz 9
  • Course Materials
  • Slide Guide (Transcript)
  • Lab Guide
  • Join course discussions in the MapR Academy Community
