DEV 360 - Introduction to Apache Spark

Not currently available

About this Course

This introductory course teaches developers how to build simple Spark applications. It covers the benefits of Apache Spark for developing big data processing applications, loading and inspecting data in the Spark interactive shell, and building a standalone application.

Prerequisites

  • Completion of ESS 100, ESS 101, and ESS 360
  • Basic Hadoop knowledge and intermediate Linux knowledge
  • Experience using a text editor such as vi
  • A terminal program installed, and familiarity with command-line tools such as mv, cp, ssh, grep, cd, and useradd
  • Knowledge of functional programming with Scala or Python, and experience with SQL

Certification

This course is part of the preparation for the MapR Certified Spark Developer (MCSD) certification exam.

Syllabus

Lesson 1: Introduction to Apache Spark

  • Describe the Features of Apache Spark
  • Define Apache Spark Components

Lesson 2: Load and Inspect Data in Spark

  • Describe the Different Data Sources and Formats in Spark
  • Create and Use Resilient Distributed Datasets (RDDs)
  • Lab 2.1: Load and Inspect Auction Data
  • Apply Operations to RDDs
  • Cache Intermediate RDDs
  • Create and Use DataFrames
  • Lab 2.2: Load and Inspect Data in DataFrames

Lesson 3: Build a Simple Spark Application

  • Define the Lifecycle of a Spark Program
  • Define the Function of SparkContext
  • Lab 3.1: Build a Simple Spark Application
  • Define Different Ways to Run a Spark Application
  • Run Your Spark Application
  • Lab 3.2: Package the Spark Application
  • Lab 3.3: Launch the Application

Curriculum

  • ESS 360 – Apache Spark Essentials
  • Quiz 1
  • Lesson 2: Load and Inspect Data
  • Quiz 2
  • Lesson 3: Build a Simple Apache Spark Application
  • Quiz 3