DEV 361 - Build and Monitor Apache Spark Applications

About this Course

DEV 361 is the second course in the Apache Spark series. You will learn to create and modify pair RDDs, perform aggregations, and control the layout of pair RDDs across nodes with data partitioning.
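A minimal sketch of a pair RDD and a per-key aggregation follows. It assumes Spark 2.x or later with the SparkSession entry point (the course labs may use an older API), and the object name PairRddAggregation and the bid records are hypothetical, invented for illustration rather than taken from the lab data.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: create a pair RDD and aggregate values per key.
// The object name and sample records are hypothetical.
object PairRddAggregation {

  // Returns item -> (number of bids, total bid amount).
  def run(spark: SparkSession): Map[String, (Int, Double)] = {
    val sc = spark.sparkContext

    // Hypothetical CSV lines in the form "auctionId,item,bid".
    val lines = sc.parallelize(Seq(
      "a1,xbox,10.0",
      "a1,xbox,12.5",
      "a2,palm,5.0",
      "a3,xbox,20.0"
    ))

    // A pair RDD is an RDD of (key, value) tuples; here the key is the item.
    val bidsByItem = lines
      .map(_.split(","))
      .map(fields => (fields(1), fields(2).toDouble))

    // aggregateByKey folds values within each partition (seqOp),
    // then merges the per-partition results (combOp).
    val countAndSum = bidsByItem.aggregateByKey((0, 0.0))(
      (acc, bid) => (acc._1 + 1, acc._2 + bid),
      (a, b) => (a._1 + b._1, a._2 + b._2)
    )

    // collect() is an action: it runs the job and returns results to the driver.
    countAndSum.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("PairRddAggregation").getOrCreate()
    try run(spark).foreach { case (item, (n, total)) => println(s"$item: $n bids, total $total") }
    finally spark.stop()
  }
}
```

aggregateByKey is used here because it computes two results (count and sum) in one pass; reduceByKey would be enough for a single per-key sum.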

This course also discusses Spark SQL and DataFrames, the programming abstraction of Spark SQL. You will learn the different ways to load data into DataFrames; perform operations on DataFrames using DataFrame functions, actions, and language-integrated queries; and create and use user-defined functions with DataFrames.
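The following sketch shows the DataFrame ideas named above: building a DataFrame from a case class by reflection, querying it with DataFrame functions, and applying a user-defined function both through the DataFrame API and through a SQL query. It assumes Spark 2.x or later; the Bid case class, its fields, the object name, and the 10.0 threshold are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, udf}

// Hypothetical record type. toDF() infers the DataFrame schema from the
// case class fields by reflection.
case class Bid(auctionId: String, item: String, bid: Double)

object DataFrameSketch {

  // Load a small in-memory dataset into a DataFrame.
  def load(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      Bid("a1", "xbox", 10.0),
      Bid("a1", "xbox", 12.5),
      Bid("a2", "palm", 5.0)
    ).toDF()
  }

  // DataFrame functions (language-integrated query): average bid per item.
  def avgBidByItem(bids: DataFrame): Map[String, Double] =
    bids.groupBy("item")
      .agg(avg("bid").as("avgBid"))
      .collect()
      .map(row => row.getString(0) -> row.getDouble(1))
      .toMap

  // A UDF applied through the DataFrame API.
  def countHighBids(bids: DataFrame): Long = {
    val bucket = udf((b: Double) => if (b >= 10.0) "high" else "low")
    bids.withColumn("bucket", bucket(col("bid")))
      .filter(col("bucket") === "high")
      .count()
  }

  // The same logic registered as a UDF and called from SQL on a temporary view.
  def countHighBidsSql(spark: SparkSession, bids: DataFrame): Long = {
    spark.udf.register("bucket", (b: Double) => if (b >= 10.0) "high" else "low")
    bids.createOrReplaceTempView("bids")
    spark.sql("SELECT COUNT(*) FROM bids WHERE bucket(bid) = 'high'").first().getLong(0)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("DataFrameSketch").getOrCreate()
    try {
      val bids = load(spark)
      bids.printSchema() // schema inferred from the Bid case class
      println(avgBidByItem(bids))
      println(s"high bids: ${countHighBids(bids)} (SQL: ${countHighBidsSql(spark, bids)})")
    } finally spark.stop()
  }
}
```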

This course also describes the components of the Spark execution model and shows how to use the Spark Web UI to monitor Spark applications. The concepts are taught through Scala scenarios that also form the basis of the hands-on labs. Lab solutions are provided in Scala and Python.
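As a rough illustration of how the execution model surfaces in the Spark Web UI, the sketch below labels a job and runs an action that needs a shuffle. In Spark, an action triggers a job, the job is split into stages at shuffle boundaries, and each stage runs one task per partition, so this job should appear on the Jobs tab with two stages. The object name and job description are hypothetical, and the code assumes a Spark 2.x or later release that provides SparkContext.uiWebUrl.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: produce a labeled job that is easy to find in the Spark Web UI.
object WebUiSketch {

  def run(spark: SparkSession): Long = {
    val sc = spark.sparkContext

    // The driver serves the web UI (by default on port 4040, or the next free
    // port); uiWebUrl is None when the UI is disabled.
    sc.uiWebUrl.foreach(url => println(s"Spark Web UI: $url"))

    // This description is shown for the next job on the Jobs tab.
    sc.setJobDescription("count keys after reduceByKey")

    // 4 partitions, so the first stage runs 4 tasks.
    val pairs = sc.parallelize(1 to 1000, numSlices = 4).map(i => (i % 10, 1))

    // reduceByKey needs a shuffle, so the job triggered by count() is split
    // into two stages at the shuffle boundary.
    pairs.reduceByKey(_ + _).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("WebUiSketch").getOrCreate()
    try {
      println(s"distinct keys: ${run(spark)}")
      // Keep the driver alive so the live UI can be inspected; it stops with the application.
      Thread.sleep(60000)
    } finally spark.stop()
  }
}
```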

Prerequisites

  • Completion of ESS 100, ESS 101, and ESS 360
  • Basic Hadoop knowledge and intermediate Linux knowledge
  • Experience using a text editor such as vi
  • Terminal program installed; familiarity with command-line tools such as mv, cp, ssh, grep, cd, and useradd
  • Knowledge of functional programming with Scala or Python, and experience with SQL

Certification

This course is part of the preparation for the MapR Certified Spark Developer (MCSD) certification exam.

Syllabus

Lesson 4 - Work with Pair RDD

  • Describe and Create Pair RDD
  • Lab 4.1: Load and Explore Data
  • Apply Transformations and Actions to Pair RDD
  • Lab 4.2: Create and Explore Pair RDD
  • Control Partitioning Across Nodes
  • Lab 4.3: Explore Partitioning
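A minimal sketch of controlling partitioning on a pair RDD, assuming Spark 2.x or later: partitionBy with a HashPartitioner places all records with the same key in the same partition, and that partitioner survives mapValues but not map. The object name, the partition count, and the sample keys are hypothetical.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

object PartitioningSketch {

  // Returns (partition count, partitioner kept by mapValues?, partitioner kept by map?).
  def run(spark: SparkSession): (Int, Boolean, Boolean) = {
    val sc = spark.sparkContext
    val pairs = sc.parallelize((1 to 100).map(i => (i % 5, i)))

    // partitionBy shuffles the pair RDD so that all records with the same key
    // land in the same partition. persist() caches the result so later
    // operations reuse it rather than recomputing it from its lineage.
    val partitioned = pairs.partitionBy(new HashPartitioner(4)).persist()

    // mapValues cannot change keys, so the result keeps the partitioner.
    val viaMapValues = partitioned.mapValues(_ * 2)

    // map could change keys, so Spark drops the partitioner.
    val viaMap = partitioned.map { case (k, v) => (k, v * 2) }

    (partitioned.getNumPartitions, viaMapValues.partitioner.isDefined, viaMap.partitioner.isDefined)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("PartitioningSketch").getOrCreate()
    try println(run(spark))
    finally spark.stop()
  }
}
```

Keeping the partitioner matters because later key-based operations (such as joins or reduceByKey) on an RDD that is already partitioned by key can avoid another shuffle.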

Lesson 5 - Work with Spark DataFrames

  • Create Apache Spark DataFrames
  • Lab 5.1: Create DataFrames Using Reflection
  • Explore Data in DataFrames
  • Lab 5.2: Explore Data in DataFrames
  • Create User-Defined Functions
  • Lab 5.3: Create and Use User-Defined Functions
  • Repartition DataFrames
  • Lab 5.4: Build a Standalone Application
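A minimal sketch combining DataFrame repartitioning with the shape of a standalone application: a main method that builds its own SparkSession and leaves the master URL to spark-submit. The object name RepartitionApp and the partition counts are hypothetical choices for illustration, assuming Spark 2.x or later.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a standalone application. The object name is hypothetical.
object RepartitionApp {

  // Returns the partition counts after repartition(8) and then coalesce(2).
  def partitionCounts(spark: SparkSession): (Int, Int) = {
    import spark.implicits._
    val df = (1 to 1000).toDF("n")

    // repartition(n) does a full shuffle into exactly n partitions.
    val wide = df.repartition(8)

    // coalesce(n) merges existing partitions without a full shuffle, so it is
    // the cheaper way to reduce the partition count.
    val narrow = wide.coalesce(2)

    (wide.rdd.getNumPartitions, narrow.rdd.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    // No master is set here; spark-submit supplies it via --master.
    val spark = SparkSession.builder().appName("RepartitionApp").getOrCreate()
    try {
      val (wide, narrow) = partitionCounts(spark)
      println(s"after repartition: $wide, after coalesce: $narrow")
    } finally spark.stop()
  }
}
```

Packaged into a JAR, it could be launched with a command such as spark-submit --class RepartitionApp --master local[*] repartition-app.jar (the JAR name is hypothetical). Use coalesce when only reducing the number of partitions, and repartition when the count must grow or the data needs rebalancing.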

Lesson 6 - Monitor a Spark Application

  • Describe the Components of the Spark Execution Model
  • Use Spark Web UI to Monitor Spark Applications
  • Debug and Tune Spark Applications
  • Lab 6.1: Use the Spark UI
  • Lab 6.2: Find Spark System Properties
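Spark system properties are listed on the Environment tab of the Spark Web UI. The sketch below reads the same explicitly set properties from code and prints an RDD lineage with toDebugString, a common first step when working out which operations end up in which stage. It assumes Spark 2.x or later, and the object name is hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ConfigAndLineageSketch {

  // Properties explicitly set on the SparkConf (by code, spark-submit flags,
  // or spark-defaults.conf). Built-in defaults that were never set are not listed.
  def sparkProperties(spark: SparkSession): Map[String, String] =
    spark.sparkContext.getConf.getAll.toMap

  // toDebugString returns the RDD lineage; a change in indentation marks a
  // shuffle dependency, i.e. a stage boundary.
  def lineage(spark: SparkSession): String = {
    val sc = spark.sparkContext
    sc.parallelize(1 to 10, 2)
      .map(i => (i % 2, i))
      .reduceByKey(_ + _)
      .toDebugString
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ConfigAndLineageSketch").getOrCreate()
    try {
      sparkProperties(spark).toSeq.sorted.foreach { case (k, v) => println(s"$k=$v") }
      println(lineage(spark))
    } finally spark.stop()
  }
}
```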

Curriculum

  • Lesson 4 - Work with Pair RDD
  • Quiz 4
  • Lesson 5 - Work with DataFrames
  • Quiz 5
  • Lesson 6 - Monitor Apache Spark Applications
  • Quiz 6
