Big Data Hadoop Spark Developer (BDHS)

Live Online (VILT) & Classroom Corporate Training Course

The Big Data Hadoop training course teaches you the concepts of the Hadoop framework and its deployment in a cluster environment, and prepares you for Cloudera's Big Data certification.

How can we help you?

  • CloudLabs
  • Projects
  • Assignments
  • 24x7 Support
  • Lifetime Access

Overview

With this Big Data Hadoop course, you will learn the big data framework using Hadoop and Spark, including HDFS, YARN, and MapReduce. The course also covers using Pig, Hive, and Impala to process and analyse large datasets stored in HDFS, and using Sqoop and Flume for data ingestion.
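
As a small taste of what that looks like in practice, here is a minimal Scala sketch of querying a Hive-managed table through Spark SQL (the table name "sales" and its columns are assumptions invented for the example, not part of the course material):

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: query a Hive-managed table from Spark SQL.
    // The table name "sales" and its columns are hypothetical.
    val spark = SparkSession.builder()
      .appName("HiveQuerySketch")
      .enableHiveSupport()   // lets Spark read tables from the Hive metastore
      .getOrCreate()

    spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region").show()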


Objectives

By the end of the BDHS training, participants will understand:

  • The different components of the Hadoop ecosystem, such as Hadoop 2.7, YARN, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
  • Hadoop Distributed File System (HDFS) and YARN architecture
  • MapReduce and its characteristics, along with advanced MapReduce concepts
  • Different file formats, Avro schemas, using Avro with Hive and Sqoop, and schema evolution
  • Flume, its architecture, sources, sinks, channels, and configuration
  • HBase, its architecture and data storage, and the differences between HBase and an RDBMS
  • The common use cases of Spark and various iterative algorithms

Prerequisites

There are no prerequisites for this course. However, it’s beneficial to have some knowledge of Core Java and SQL.


Course Outline

  • Apache Hadoop Overview
  • Data Processing
  • Introduction to the Hands-On Exercises

  • Apache Hadoop Cluster Components
  • HDFS Architecture
  • Using HDFS
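
To give a flavour of the HDFS hands-on work, here is a minimal Scala sketch using the Hadoop FileSystem API; both paths are hypothetical:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Connects using the cluster's core-site.xml / hdfs-site.xml on the classpath.
    val conf = new Configuration()
    val fs = FileSystem.get(conf)

    // List a directory and copy a local file into HDFS (paths are hypothetical).
    fs.listStatus(new Path("/user/training")).foreach(status => println(status.getPath))
    fs.copyFromLocalFile(new Path("/tmp/local.txt"), new Path("/user/training/local.txt"))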

  • YARN Architecture
  • Working With YARN

  • What is Apache Spark?
  • Starting the Spark Shell
  • Using the Spark Shell
  • Getting Started with Datasets and DataFrames
  • DataFrame Operations
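
For example, once the Spark shell is running (it predefines a SparkSession called spark), a DataFrame can be created and inspected in a few lines; the file path and column names below are assumptions:

    // In spark-shell, a SparkSession named `spark` (and a SparkContext `sc`) is predefined.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/people.csv")     // hypothetical path

    df.printSchema()
    df.select("name", "age").show(5)      // "name" and "age" are assumed columns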

  • Creating DataFrames from Data Sources
  • Saving DataFrames to Data Sources
  • DataFrame Schemas
  • Eager and Lazy Execution
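
A minimal sketch of this module's ideas, assuming a hypothetical JSON dataset in HDFS:

    import org.apache.spark.sql.types._

    // Supplying a schema avoids an extra pass over the data to infer one.
    val schema = StructType(Seq(
      StructField("name", StringType),
      StructField("age", LongType)))

    val people = spark.read.schema(schema).json("hdfs:///data/people.json")  // hypothetical path

    // Transformations are lazy; nothing executes until an action such as count().
    val adults = people.filter(people("age") >= 18)
    println(adults.count())

    // Saving to another format is a one-liner.
    adults.write.mode("overwrite").parquet("hdfs:///data/adults")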

  • Querying DataFrames Using Column Expressions
  • Grouping and Aggregation Queries
  • Joining DataFrames
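
As an illustration, here is a hedged sketch of grouping, aggregation, and a join over toy data; the table and column names are invented for the example:

    import spark.implicits._
    import org.apache.spark.sql.functions._

    // Toy DataFrames standing in for real tables (names and values are assumptions).
    val orders = Seq((1, 101, 20.0), (2, 101, 35.0), (3, 102, 15.0))
      .toDF("order_id", "cust_id", "amount")
    val customers = Seq((101, "Ada"), (102, "Grace")).toDF("cust_id", "name")

    val report = orders
      .groupBy($"cust_id")
      .agg(sum($"amount").as("total"))   // aggregation with a column expression
      .join(customers, "cust_id")        // join on the shared key column
      .select($"name", $"total")
      .orderBy($"total".desc)

    report.show()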

  • RDD Overview
  • RDD Data Sources
  • Creating and Saving RDDs
  • RDD Operations
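
A brief spark-shell sketch of creating and operating on RDDs (the file paths are hypothetical):

    // sc is the SparkContext provided by spark-shell.
    val nums = sc.parallelize(1 to 10)                         // RDD from a collection
    val evenSquares = nums.map(n => n * n).filter(_ % 2 == 0)
    println(evenSquares.collect().mkString(", "))

    // RDDs also load from and save to files (paths are hypothetical).
    val lines = sc.textFile("hdfs:///data/sample.txt")
    lines.filter(_.nonEmpty).saveAsTextFile("hdfs:///data/sample-nonempty")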

  • Writing and Passing Transformation Functions
  • Transformation Execution
  • Converting Between RDDs and DataFrames
  • Key-Value Pair RDDs
  • Map-Reduce
  • Other Pair RDD Operations
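
The canonical example tying these topics together is word count, sketched below; the input path is an assumption:

    import spark.implicits._

    // Word count: the classic map-reduce pattern as a key-value pair RDD.
    val counts = sc.textFile("hdfs:///data/sample.txt")   // hypothetical path
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))     // map phase: emit (key, value) pairs
      .reduceByKey(_ + _)         // reduce phase: sum the values per key

    counts.take(10).foreach(println)

    // A pair RDD converts naturally to a DataFrame and back.
    val countsDF = counts.toDF("word", "count")
    val backToRDD = countsDF.rdd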

  • Datasets and DataFrames
  • Creating Datasets
  • Loading and Saving Datasets
  • Dataset Operations
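
A minimal Dataset sketch, assuming spark-shell and a hypothetical JSON file:

    import spark.implicits._

    // A case class gives a Dataset a compile-time type.
    case class Person(name: String, age: Long)

    val people = Seq(Person("Ada", 36), Person("Grace", 45)).toDS()
    val adults = people.filter(_.age >= 18)   // a typed lambda, not a column expression
    adults.show()

    // An untyped DataFrame converts to a Dataset with as[T] (path is hypothetical).
    val fromJson = spark.read.json("hdfs:///data/people.json").as[Person]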

  • Writing a Spark Application
  • Building and Running an Application
  • Application Deployment Mode
  • The Spark Application Web UI
  • Configuring Application Properties
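
For orientation, this is roughly the shape of a minimal standalone Spark application; the object name LineCount is invented for the example:

    import org.apache.spark.sql.SparkSession

    // A minimal standalone application, packaged as a JAR and launched via spark-submit.
    object LineCount {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("LineCount")
          .getOrCreate()   // master and deploy mode are supplied by spark-submit

        val n = spark.read.textFile(args(0)).count()
        println(s"Lines in ${args(0)}: $n")
        spark.stop()
      }
    }

Once packaged, such an application would typically be launched with spark-submit, choosing the cluster manager and deployment mode via the --master and --deploy-mode flags.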

  • Review: Apache Spark on a Cluster
  • RDD Partitions
  • Example: Partitioning in Queries
  • Stages and Tasks
  • Job Execution Planning
  • Example: Catalyst Execution Plan
  • Example: RDD Execution Plan
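
A short shell sketch of the kind of inspection this module covers:

    // Inspect partitioning and execution plans from the shell.
    val rdd = sc.parallelize(1 to 1000, numSlices = 8)
    println(rdd.getNumPartitions)              // 8
    println(rdd.map(_ * 2).toDebugString)      // lineage of the RDD execution plan

    import spark.implicits._
    val df = spark.range(1000).groupBy($"id" % 10).count()
    df.explain()   // prints the Catalyst physical plan; a shuffle marks a stage boundary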

  • Apache Spark Streaming Overview
  • Creating Streaming DataFrames
  • Transforming DataFrames
  • Executing Streaming Queries
  • Receiving Kafka Messages
  • Sending Kafka Messages
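
To illustrate, here is a hedged sketch of a streaming word count that consumes from Kafka and prints to the console; the broker address broker:9092 and the topic name events are assumptions:

    import spark.implicits._
    import org.apache.spark.sql.functions._

    // Streaming word count over a Kafka topic.
    // Requires the spark-sql-kafka-0-10 package on the classpath;
    // the broker address and topic name are hypothetical.
    val lines = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(value AS STRING) AS line")

    val counts = lines
      .select(explode(split($"line", "\\s+")).as("word"))
      .groupBy($"word")
      .count()

    val query = counts.writeStream
      .outputMode("complete")   // required for a streaming aggregation to the console
      .format("console")
      .start()

    query.awaitTermination()

Sending messages back to Kafka is symmetric: a writeStream with .format("kafka"), a string "value" column, and a checkpoint location.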

Testimonials