Apache Spark and Scala

Live Online (VILT) & Classroom Corporate Training Course

Apache Spark is a big data processing framework whose popularity stems from being fast, easy to use, and well suited to sophisticated data analysis. Its built-in modules for streaming, machine learning, SQL, and graph processing make it useful across diverse industries.

How can we help you?

  • CloudLabs
  • Projects
  • Assignments
  • 24x7 Support
  • Lifetime Access

Overview

The Apache Spark and Scala course is designed to help you become proficient in Apache Spark development. You will learn about the motivation for Apache Spark, Spark Core, Spark internals, RDDs, Spark SQL, Spark Streaming, MLlib, and GraphX, which together form the key constituents of the course.


Objectives

At the end of the Apache Spark & Scala training course, participants will:

  • Master the concepts of the Apache Spark framework
  • Understand Spark internals and RDDs, and use Spark’s API and Scala functions to create and transform RDDs
  • Master RDD combiners, Spark SQL, SparkContext, Spark Streaming, MLlib, and GraphX

Prerequisites

  • Hadoop Basics

Course Outline

  • Overview of Hadoop
  • Architecture of HDFS & YARN
  • Overview of Spark version 2.2.0
  • Spark Architecture
  • Spark Components
  • Comparison of Spark & Hadoop
  • Installation of Spark v2.2.0 on 64-bit Linux

  • Exploring the Spark shell
  • Creating a SparkContext
  • Operations on Resilient Distributed Datasets (RDDs)
  • Transformations & Actions
  • Loading Data and Saving Data

  • Introduction to SQL Operations
  • SQLContext
  • DataFrame
  • Working with Hive
  • Loading Partitioned Tables
  • Processing CSV, JSON, and Parquet files

  • Introduction to Scala
  • Features of Scala
  • Scala vs Java Comparison
  • Data types
  • Data Structure
  • Arrays
  • Literals
  • Logical Operators
  • Mutable & Immutable variables
  • Type inference

Transforming data with Relational Operators

  • OOP vs Functions
  • Anonymous functions
  • Recursive functions
  • Call-by-name
  • Currying
  • Conditional statements

  • List
  • Map
  • Sets
  • Options
  • Tuples
  • Mutable collection
  • Immutable collection
  • Iterating
  • Filtering and counting
  • Group By
  • Flat Map
  • Word count
  • File Access

  • Classes, Objects & Properties
  • Inheritance

  • Maven build tool implementation
  • Build Libraries
  • Create JAR files
  • spark-submit

  • Overview of Spark Streaming
  • Architecture of Spark Streaming
  • File streaming
  • Twitter Streaming

  • Overview of Kafka Streaming
  • Architecture of Kafka Streaming
  • Kafka Installation
  • Topic
  • Producer
  • Consumer
  • File streaming
  • Twitter Streaming

  • Overview of Machine Learning Algorithms
  • Linear Regression
  • Logistic Regression

  • GraphX overview
  • Vertices
  • Edges
  • Triplets
  • PageRank
  • Pregel

  • On-heap and off-heap memory tuning
  • Kryo Serialization
  • Broadcast Variable
  • Accumulator Variable
  • DAG Scheduler
  • Data Locality
  • Checkpointing
  • Speculative Execution
  • Garbage Collection

  • Master – Driver Node capacity
  • Slave – Worker Node capacity
  • Executor capacity
  • Executor core capacity
  • Project scenario and execution
  • Out-of-memory error handling
  • Master logs, Worker logs, Driver logs
  • Monitoring Web UI
  • Heap memory dump

Testimonials