DEV 360 - Apache Spark Essentials

This introductory course enables developers to build simple Spark applications.

About this course

This on-demand course is designed to be flexible to fit your schedule. Each lesson and quiz takes approximately 30 to 45 minutes to complete.

  • Option 1: Complete the course in one session of approximately 90 to 120 minutes
  • Option 2: Complete the course over three days, at 30 to 45 minutes per day

Lab activities take additional time, which varies with your system.

The course introduces the benefits of Apache Spark for developing big data processing applications, shows how to load and inspect data using the Spark interactive shell, and walks through building a standalone application.

Syllabus

Lesson 1 – Introduction to Apache Spark

  • Describe the features of Apache Spark
    • Advantages of Spark
    • How Spark fits in with the Big Data application stack
    • How Spark fits in with Hadoop
  • Define Apache Spark components

Lesson 2 – Load and Inspect Data in Spark

  • Describe different ways of getting data into Spark
  • Create and use Resilient Distributed Datasets (RDDs)
  • Apply transformations to RDDs
  • Use actions on RDDs
    • Lab - Load and Inspect Data in RDD
  • Cache intermediate RDDs
  • Use Spark DataFrames for simple queries
    • Lab - Load and Inspect Data in DataFrames

Lesson 3 – Build a Simple Spark Application

  • Define the lifecycle of a Spark program
  • Define the function of SparkContext
    • Lab - Create the application
  • Define different ways to run a Spark application
  • Run your Spark application
    • Lab - Launch the application
Prerequisites for Success in the Course

Review the following prerequisites carefully and decide whether you are ready to succeed in this programming-oriented course. The instructor will move forward with lab exercises on the assumption that you have mastered the skills listed below.

Required

  • Basic to intermediate Linux knowledge, including the ability to use a text editor such as vi and familiarity with basic commands such as mv, cp, ssh, grep, cd, and useradd
  • Knowledge of application development principles
  • A Linux, Windows, or macOS computer with the MapR Sandbox installed (for the on-demand course)
  • Connection to a Hadoop cluster via SSH and a web browser (for the ILT and vILT courses)

Recommended

  • Knowledge of functional programming
  • Knowledge of Scala or Python
  • Beginner fluency with SQL
  • HDE 100 - Hadoop Essentials Certification

This course is part of the preparation for the MapR Certified Spark Developer (MCSD) certification exam.

Curriculum

  • Get Started
  • Lesson 1 - Introduction to Apache Spark
  • Lesson 1 Quiz
  • Lesson 2 - Load & Inspect Data
  • Lesson 2 Quiz
  • Lesson 3 - Build a Simple Apache Spark Application
  • Lesson 3 Quiz
  • Course Materials
  • Spark Developer Certification Study Guide
  • Slide Guide (Transcript)
  • Lab Guide
  • Lab Files and Data
  • Lab Environment Connection Guide
  • Join course discussions in the MapR Academy Community
