Intro to Big Data and Hadoop

Live Online (VILT) & Classroom Corporate Training Course

Hadoop and Big Data training is in great demand because these skills make it easier to make sense of huge volumes of data and to use frameworks that turn that data into actionable insights.

How can we help you?

  • CloudLabs
  • Projects
  • Assignments
  • 24x7 Support
  • Lifetime Access

Overview

This training course will help participants to gain the skills they need to store, manage, process, and analyze massive amounts of structured and unstructured data to extract meaningful insights.


Objectives

At the end of the Intro to Big Data & Hadoop training course, participants will:

  • Understand what Big Data is and gain in-depth knowledge of Big Data Analytics concepts and tools.
  • Learn to process large data sets with Big Data tools to extract information from disparate sources.
  • Learn about MapReduce, Hadoop Distributed File System (HDFS), YARN, and how to write MapReduce code.
  • Learn best practices and considerations for Hadoop development as well as debugging techniques.
  • Learn how to use Hadoop frameworks such as Apache Pig™, Apache Hive™, Sqoop, and Flume, among other projects.
  • Perform real-world analytics by learning advanced Hadoop API topics through e-courseware.

Prerequisites

Before undertaking a Big Data and Hadoop course, participants are recommended to have basic knowledge of a programming language such as Python, Scala, or Java, and a good understanding of SQL and RDBMS.


Course Outline

  • Understanding Big Data
  • Types of Big Data
  • Difference between Traditional Data and Big Data
  • Introduction to Hadoop
  • Distributed Data Storage in Hadoop: HDFS and HBase
  • Hadoop Data Processing and Analysis Services: MapReduce, Spark, Hive, Pig, and Storm
  • Data Integration Tools in Hadoop
  • Resource Management and Cluster Management Services

  • Need for Hadoop in Big Data
  • Understanding Hadoop and Its Architecture
  • The MapReduce Framework
  • What is YARN?
  • Understanding Big Data Components
  • Monitoring, Management and Orchestration Components of the Hadoop Ecosystem
  • Different Distributions of Hadoop
  • Installing Hadoop 3

  • Hortonworks sandbox installation & configuration
  • Hadoop Configuration files
  • Working with Hadoop services using Ambari
  • Hadoop Daemons
  • Browsing Hadoop UI consoles
  • Basic Hadoop Shell commands
  • Eclipse & WinSCP installation & configuration on the VM

  • Running a MapReduce application in MR2
  • MapReduce Framework on YARN
  • Fault tolerance in YARN
  • Map, Reduce & Shuffle phases
  • Understanding Mapper, Reducer & Driver classes
  • Writing a MapReduce WordCount program
  • Executing & monitoring a MapReduce job

  • Spark SQL and DataFrames
  • DataFrames and the SQL API
  • DataFrame schema
  • Datasets and encoders
  • Loading and saving data
  • Aggregations
  • Joins

  • A short introduction to streaming
  • Spark Streaming
  • Discretized Streams
  • Stateful and stateless transformations
  • Checkpointing
  • Operating with other streaming platforms (such as Apache Kafka)
  • Structured Streaming

  • Background of Pig
  • Pig architecture
  • Pig Latin basics
  • Pig execution modes
  • Pig processing – loading and transforming data
  • Pig built-in functions
  • Filtering, grouping, sorting data
  • Relational join operators
  • Pig Scripting
  • Pig UDFs

  • Background of Hive
  • Hive architecture
  • Hive Query Language
  • Moving the Hive metastore from Derby to a MySQL database
  • Managed & external tables
  • Data processing – loading data into tables
  • Using Hive built-in functions
  • Partitioning data using Hive
  • Bucketing data
  • Hive Scripting
  • Using Hive UDFs

  • HBase overview
  • Data model
  • HBase architecture
  • HBase shell
  • ZooKeeper & its role in the HBase environment
  • HBase Shell environment
  • Creating tables
  • Creating column families
  • CLI commands – get, put, delete & scan
  • Scan Filter operations

  • Importing data from RDBMS to HDFS
  • Exporting data from HDFS to RDBMS
  • Importing & exporting data between RDBMS & Hive tables

  • Overview of Oozie
  • Oozie Workflow Architecture
  • Creating workflows with Oozie
  • Introduction to Flume
  • Flume Architecture
  • Flume Demo

  • Introduction
  • Tableau
  • Chart types
  • Data visualization tools
