Transform Data with Apache Pig

About this Course

This course covers how to use Pig to analyze structured data without writing MapReduce code. It starts with a review of data pipeline tools, then covers how to load data into relations, manipulate those relations, and apply UDFs to them in Pig. Taken together with DA 440 – Query and Store Data with Apache Hive, this course teaches you how to use Pig and Hive as part of a single data flow in a Hadoop cluster.

What's Covered

1: Pig in the Hadoop Ecosystem

Course Lessons
  • Hive Use Cases
  • Use Cases of Pig
  • Steps in the Data Pipeline
  • Data Types Used in Pig

Lab Activities
  • Connect to the Hive CLI
  • Connect to the Grunt Shell

2: Extract, Transform, and Load Data

Course Lessons
  • Load Data into Relations
  • Debug Pig Scripts
  • Perform Simple Manipulations
  • Save Relations as Files

Lab Activities
  • Load Data into Pig Relations
  • Examine Pig Relations
  • Basic Data Manipulations
  • Store Data

3: Manipulate Data

Course Lessons
  • Subset Relations
  • Combine Relations
  • Use UDFs on Relations

Lab Activities
  • Load and Filter Relations
  • Transform and Join Relations
  • Explore Data

Get Certified

This course is part of the preparation for the MapR Certified Data Analyst (MCDA) certification exam.

Prerequisites

  • Completion of ESS 100 - 102
  • Linux skills, including familiarity with command-line commands such as ls, cd, cp, and su
  • Beginning to intermediate proficiency with SQL
  • Basic Hadoop knowledge

Curriculum

  • Lesson 1 – Apache Pig in the Hadoop Ecosystem
  • Lesson 2 – ETL with Apache Pig
  • Quiz 2
  • Lesson 3 – Manipulate Data in Apache Pig
  • Quiz 3
  • Course Materials
  • Slide Guide (Transcript)
  • Lab Guide
