DEV 361 - Build and Monitor Apache Spark Applications (Spark v1.6)

About this Course

DEV 361 is the second in the Apache Spark series for Spark v1.6. You will learn to create and modify pair RDDs, perform aggregations, and control the layout of pair RDDs across nodes with data partitioning.
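
As a rough illustration of the kind of pair RDD work covered here, the sketch below (Scala, Spark 1.6) creates a pair RDD, aggregates values by key, and controls layout across nodes with a hash partitioner. The input path, data, and partition count are placeholders for illustration, not material from the course labs.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.HashPartitioner

    val sc = new SparkContext(new SparkConf().setAppName("PairRDDSketch"))

    // Build a pair RDD of (word, 1) and aggregate the counts per key.
    val counts = sc.textFile("/tmp/input.txt")            // placeholder path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Control the layout across nodes by hash-partitioning on the key.
    val partitioned = counts.partitionBy(new HashPartitioner(4)).persist()
    println(partitioned.partitions.length)                // prints 4

    sc.stop()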

This course also discusses Spark SQL and DataFrames, the programming abstraction of Spark SQL. You will learn the different ways to load data into DataFrames, perform operations on DataFrames using DataFrame functions, actions, and language-integrated queries, and create and use user-defined functions with DataFrames.
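
In the same spirit, a minimal Spark 1.6 DataFrame sketch in Scala might look like the following; the JSON path, column names, and user-defined function are assumptions made for illustration, not taken from the course.

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions.udf

    val sqlContext = new SQLContext(sc)    // sc is an existing SparkContext
    import sqlContext.implicits._

    // Load data into a DataFrame (placeholder path and schema).
    val people = sqlContext.read.json("/tmp/people.json")

    // DataFrame functions, actions, and language-integrated queries.
    people.filter($"age" > 21).groupBy("city").count().show()

    // Define and apply a user-defined function on a column.
    val toUpper = udf((s: String) => if (s == null) null else s.toUpperCase)
    people.select($"name", toUpper($"name").as("upperName")).show()

    // Repartition the DataFrame across a different number of partitions.
    val repartitioned = people.repartition(8)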

Lastly, this course describes the components of the Spark execution model and shows how to use the Spark Web UI to monitor Spark applications. The concepts are taught using scenarios in Scala that also form the basis of hands-on labs. Lab solutions are provided in Scala and Python.
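
For monitoring, the driver's Web UI is served on port 4040 by default while an application runs, and the configuration it reports on the Environment tab can also be read from the SparkContext. A minimal sketch in Scala, assuming a running SparkContext named sc:

    // Print the Spark properties shown on the Web UI's Environment tab.
    sc.getConf.getAll.sorted.foreach { case (key, value) =>
      println(s"$key = $value")
    }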

What's Covered

Course Lessons and Lab Activities

Lesson 4: Work with Pair RDDs

  Lesson topics:
  • Describe and Create Pair RDDs
  • Apply Transformations and Actions to Pair RDDs
  • Control Partitioning Across Nodes

  Lab activities:
  • Load and Explore Data
  • Create and Explore Pair RDDs
  • Explore Partitioning

Lesson 5: Work with Spark DataFrames

  Lesson topics:
  • Create Apache Spark DataFrames
  • Explore Data in DataFrames
  • Create User-Defined Functions
  • Repartition DataFrames

  Lab activities:
  • Create DataFrames Using Reflection
  • Explore Data in DataFrames
  • Create and Use User-Defined Functions
  • Build a Standalone Application

Lesson 6: Monitor a Spark Application

  Lesson topics:
  • Describe the Components of the Spark Execution Model
  • Use the Spark Web UI to Monitor Spark Applications
  • Debug and Tune Spark Applications

  Lab activities:
  • Use the Spark UI
  • Find Spark System Properties
Get Certified

This course is part of the preparation for the MapR Certified Spark Developer (MCSD) certification exam.

Prerequisites

  • Completion of ESS 100 - 102, and ESS 360
  • Basic Hadoop knowledge and intermediate Linux knowledge
  • Experience using a text editor such as vi
  • A terminal program installed; familiarity with command-line utilities such as mv, cp, ssh, grep, cd, and useradd
  • Knowledge of functional programming with Scala or Python, and experience with SQL

Curriculum

  • Lesson 4 - Work with Pair RDDs
  • Quiz 4
  • Lesson 5 - Work with DataFrames
  • Quiz 5
  • Lesson 6 - Monitor Apache Spark Applications
  • Quiz 6
  • Course Materials
  • Slide Guide (Transcript)
  • Lab Guide
