DA 450 - Transform Data with Apache Pig

About this Course

DA 450 - Transform Data with Apache Pig is aimed at data analysts, data scientists, and SQL programmers. It covers how to use Pig to analyze structured data without writing MapReduce code. The course begins with a review of data pipeline tools, then covers how to load and manipulate relations in Pig.

Together with DA 440 - Query and Store Data with Apache Hive, this course covers how to use Pig and Hive as part of a single data flow in a Hadoop cluster.

Prerequisites

  • Completion of ESS 100, ESS 101, and ESS 450
  • Basic Hadoop knowledge
  • Terminal program installed; familiarity with command-line navigation

Certification

The courses in this curriculum prepare you for the MapR Certified Data Analyst (MCDA) certification exam.

Syllabus

Lesson 1  Pig in the Hadoop Ecosystem

  • Use Cases of Pig
  • Lab 1.1: Connect to the Grunt Shell
  • Steps in the Data Pipeline
  • Data Types Used in Pig

Lesson 2  Extract, Transform, and Load Data

  • Load Data into Relations
  • Lab 2.1: Load Data into Pig Relations
  • Debug Pig Scripts
  • Lab 2.2: Examine Pig Relations
  • Perform Simple Manipulations
  • Lab 2.3: Basic Data Manipulations
  • Save Relations as Files
  • Lab 2.4: Store Data

Lesson 3  Manipulate Data

  • Subset Relations
  • Lab 3.1: Load and Filter Relations
  • Combine Relations
  • Lab 3.2: Transform and Join Relations
  • Use UDFs on Relations
  • Lab 3.3: Explore Data

Curriculum

  • ESS 450 – Apache Pig Essentials
  • Lesson 2 - Extract, Transform, and Load Data with Apache Pig
  • Quiz 2
  • Lesson 3 - Manipulate Data with Apache Pig
  • Quiz 3
