DA 440 - Query and Store Data with Apache Hive

Not currently available

About this Course

DA 440 is an introductory-level course for data analysts, data scientists, and SQL programmers. It covers how to use Hive to query structured data without writing MapReduce code. You will learn how Apache Hive fits in the Hadoop ecosystem, how to create and load tables in Hive, and how to query data using the Hive Query Language.

Together with DA 450 - Transform Data with Apache Pig, this course covers how to use Pig and Hive as part of a single data flow in a Hadoop cluster. The course begins with a review of SQL-on-Hadoop tools, then covers how to create, load, query, and manipulate tables in Hive.

Prerequisites

  • Completion of ESS 100, ESS 101, and ESS 440
  • Basic Hadoop knowledge
  • Terminal program installed; familiarity with command-line navigation

Certification

The courses in this curriculum prepare you for the MapR Certified Data Analyst (MCDA) certification exam.

Syllabus

Lesson 1  Hive in the Hadoop Ecosystem

  • Hive Use Cases
  • Lab 1.1: Connect to the Hive CLI
  • Steps in the Data Pipeline
  • Hive in the Hadoop Ecosystem
  • Data Types Used With Hive
  • Lab 1.4: Cast Data

Lesson 2  Create and Load Data

  • Create Databases and Internal Tables
  • Lab 2.1a: Create a Database
  • Lab 2.1b: Create a Simple Table
  • Create External Tables and Partitioned Tables
  • Lab 2.2: Create Partitioned and External Tables
  • Load Data Into Tables and Databases
  • Lab 2.3: Load Data Into Tables
  • Alter and Drop Tables
  • Lab 2.4: Examine Databases and Tables

Lesson 3  Query and Manipulate Data

  • Query, Sort, and Filter Data
  • Lab 3.1: Query Data With SELECT
  • Manipulate Data With User-defined Functions
  • Lab 3.2: Query Data With UDFs
  • Combine and Store Tables
  • Lab 3.3: Combine and Store Data

Curriculum

  • ESS 440 - Apache Hive in the Hadoop Ecosystem
  • Lesson 2 - Create and Load Data in Apache Hive
  • Quiz 2
  • Lesson 3 - Query Data in Apache Hive
  • Quiz 3
