DEV 361 - Build and Monitor Apache Spark Applications
This on-demand course is designed to be flexible to fit your schedule. Each lesson and quiz takes approximately 30 to 45 minutes to complete.
- Option 1: Complete the course in one session, approximately 90 to 120 minutes
- Option 2: Complete the course over 3 days, at 30 to 45 minutes per day
Lab activities take additional time and vary based on your system.
DEV 361 is the second in the Apache Spark series. You will learn to create and modify pair RDDs, perform aggregations, and control the layout of pair RDDs across nodes with data partitioning.
This course also discusses Spark SQL and DataFrames, the programming abstraction of Spark SQL. You will learn the different ways to load data into DataFrames; perform operations on DataFrames using DataFrame functions, actions, and language-integrated queries; and create and use user-defined functions with DataFrames.
This course also describes the components of the Spark execution model using the Spark Web UI to monitor Spark applications. The concepts are taught using scenarios in Scala that also form the basis of hands-on labs. Lab solutions are provided in Scala and Python.
Lesson 4 - Work with Pair RDD
- Describe pair RDDs
- Explain why to use pair RDDs
- Create pair RDDs
- Apply transformations and actions to pair RDDs
- Control partitioning across nodes
- Change partitions
- Determine the partitioner
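As a rough illustration of the ideas behind this lesson, the following plain-Python sketch mimics two pair RDD concepts: merging values by key, and assigning key-value pairs to partitions by hashing the key. It does not use the Spark API. The function names `reduce_by_key` and `hash_partition`, the sample data, and the use of `crc32` as a stable hash are all assumptions made for this example.

```python
from collections import defaultdict
from zlib import crc32


def reduce_by_key(pairs, func):
    """Merge the values that share a key, in the spirit of a pair RDD aggregation."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


def hash_partition(pairs, num_partitions):
    """Assign each (key, value) pair to a partition by hashing its key."""
    partitions = defaultdict(list)
    for key, value in pairs:
        # crc32 is only a stand-in for a stable hash function (an assumption
        # of this sketch); it keeps the result repeatable between runs.
        index = crc32(str(key).encode("utf-8")) % num_partitions
        partitions[index].append((key, value))
    return dict(partitions)


# Hypothetical sample data: (category, amount) pairs.
sales = [("books", 12), ("games", 30), ("books", 8), ("music", 5), ("games", 10)]

totals = reduce_by_key(sales, lambda a, b: a + b)
partitions = hash_partition(sales, num_partitions=2)
```

Because the partition index depends only on the key, every pair with the same key lands in the same partition. That property is what lets a key-based aggregation combine values without moving them between partitions again.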
Lesson 5 - Work with Spark DataFrames
- Create Apache Spark DataFrames
- Work with data in DataFrames
- Create user-defined functions
- Repartition DataFrames
Lesson 6 - Monitor a Spark Application
- Describe the components of the Spark execution model
- Use the Spark Web UI to monitor a Spark application
- Debug and tune Spark applications
Prerequisites for Success in the Course
Review the following prerequisites carefully and decide if you are ready to succeed in this programming-oriented course. The instructor will move forward with lab exercises on the assumption that you have mastered the skills listed below.
- Basic to intermediate Linux knowledge, including the ability to use a text editor such as vi, and familiarity with basic commands such as mv, cp, ssh, grep, cd, and useradd
- Knowledge of application development principles
- A Linux, Windows, or macOS computer with the MapR Sandbox installed (on-demand course)
- Connection to a Hadoop cluster via SSH and web browser (for the ILT and vILT course)
- Knowledge of functional programming
- Knowledge of Scala or Python
- Beginner fluency with SQL
- HDE 100 - Hadoop Essentials Certification
This course is part of the preparation for the MapR Certified Spark Developer (MCSD) certification exam.