DEV 360 - Introduction to Apache Spark (Spark v2.1)


About this Course

This introductory course, targeted to developers, enables you to build simple Spark applications for Apache Spark version 2.1. It introduces the benefits of Spark for developing big data processing applications. It then shows how to load and inspect data using the Spark interactive shell and how to build a standalone application.

This is the first course in the Apache Spark v2.1 Series.

What's Covered

Course Lessons and Lab Activities

1: Introduction to Apache Spark

Lesson topics:
  • Describe Features of Apache Spark
  • Define Spark Components
  • Explain Spark Data Pipeline Use Cases

Lab activities: none

2: Create Datasets

Lesson topics:
  • Define Data Sources, Structures, and Schemas
  • Create Datasets and DataFrames
  • Convert DataFrames into Datasets

Lab activities:
  • Load Data and Create Datasets Using Reflection
  • Bonus Lab: Word Count Using Datasets (Optional)

3: Apply Operations on Datasets

Lesson topics:
  • Apply Operations on Datasets
  • Cache Datasets
  • Create User-Defined Functions (UDFs)
  • Repartition Datasets

Lab activities:
  • Explore SFPD Data
  • Create and Use UDFs
  • Analyze Data Using UDFs and Queries

Get Certified

This course is part of the preparation for the MapR Certified Spark Developer (MCSD) certification exam.

Prerequisites

  • Completion of the MapR Academy on-demand courses ESS 100 through ESS 102
  • Basic Hadoop knowledge and intermediate Linux knowledge
  • Experience using a text editor such as vi
  • Terminal program installed; familiarity with command-line tools such as mv, cp, ssh, grep, cd, and useradd
  • Knowledge of functional programming in Scala and experience with SQL

Curriculum

  • Lesson 1 - Introduction to Apache Spark
  • Quiz 1
  • Lesson 2 - Create Datasets
  • Quiz 2
  • Lesson 3 - Apply Operations on Datasets
  • Quiz 3
  • Course Materials
  • Slide Guide (Transcript)
  • Lab Guide
