Apache Pig & Hive

Live Online (VILT) & Classroom Corporate Training Course

Apache Pig is known for its simple syntax and its ability to reduce development time, and is therefore widely used by organizations that analyse Big Data. Hive, another tool in the Hadoop ecosystem, is much sought after because it is scalable and provides tools for easy data analysis and extraction.

How can we help you?

  • CloudLabs
  • Projects
  • Assignments
  • 24x7 Support
  • Lifetime Access

Overview

This training introduces you to the world of Hadoop and MapReduce. Through a series of practical, hands-on exercises, you will learn about HDFS, write complex MapReduce transformations, and write scripts using the advanced features of Pig. You will also come to understand the Hive environment, the Hive query language, and how to perform data analysis with Hive.


Objectives

By the end of the Apache Pig & Hive training course, participants will have learned:

  • How Big Data can change the way businesses operate
  • The Hadoop ecosystem and its architecture
  • To analyse large data sets using Pig Latin scripts and parallel processing with MapReduce
  • About Hive and its use in Big Data
  • The benefits of HiveQL
  • To use Hive on complex data sets and derive insights that help the business

Prerequisites

  • Understanding of Linux commands and SQL queries
  • Basic knowledge of core Java

Course Outline

  • Hadoop overview
  • Surveying the Hadoop components
  • Defining the Hadoop architecture

  • Achieving reliable and secure storage
  • Monitoring storage metrics
  • Controlling HDFS from the Command Line

  • Contrasting Pig with MapReduce
  • Identifying Pig use cases
  • Pinpointing key Pig configurations

  • Pig Latin: Relational Operators
  • File Loaders
  • Group Operator
  • COGROUP Operator
  • Joins and COGROUP
  • Union, Diagnostic Operators
  • Pig UDF

Transforming data with Relational Operators

  • Creating new relations with joins
  • Reducing data size by sampling
  • Extending Pig with user-defined functions

Filtering data with Pig

  • Consolidating data sets with unions
  • Partitioning data sets with splits
  • Injecting parameters into Pig scripts

Transforming data with Relational Operators

  • Hive Background
  • Hive Use Case
  • About Hive
  • Hive vs Pig
  • Hive Architecture and Components
  • Metastore in Hive
  • Limitations of Hive
  • Comparison with Traditional Database
  • Hive Data Types and Data Models
  • Partitions and Buckets
  • Hive Tables (Managed Tables and External Tables)
  • Importing Data
  • Querying Data
  • Managing Outputs

Transforming data with Relational Operators

  • Hive Script
  • Hive UDF and Hive Demo on Healthcare Data Set
  • HiveQL: Joining Tables
  • Dynamic Partitioning
  • Custom MapReduce Scripts
  • Thrift Server
  • User Defined Functions

Testimonials