Hadoop Administration

Live Online (VILT) & Classroom Corporate Training Course

Hadoop Administration introduces you to the fundamental concepts of Apache Hadoop and the Hadoop cluster.

How can we help you?

  • CloudLabs
  • Projects
  • Assignments
  • 24x7 Support
  • Lifetime Access

Overview

Hadoop Administration training introduces you to the fundamental concepts of Apache Hadoop and the Hadoop cluster. Through hands-on exercises and practice sessions, you will learn to configure, deploy, and maintain a Hadoop cluster, and to navigate the Hadoop ecosystem with confidence.


Objectives

At the end of the Hadoop Administration training course, participants will:

  • Implement and manage the ongoing administration of a Hadoop cluster
  • Build powerful applications to analyse Big Data, and manage and monitor the Hadoop cluster
  • Tune the performance of Hadoop clusters and Hadoop MapReduce routines

Prerequisites

Basic knowledge of Linux


Course Outline

  • Hadoop cluster architecture
  • Data loading into HDFS
  • Roles and Responsibilities of a Hadoop Cluster Administrator

  • Hadoop server roles and their usage
  • Rack awareness
  • HDFS write and read paths
  • Replication Pipeline
  • Data Processing
  • Hadoop Installation and Initial Configuration
  • Deploying Hadoop in pseudo-distributed mode
  • Deploying a multi-node Hadoop cluster
  • Installing Hadoop Clients

  • Selecting the appropriate hardware
  • Designing a scalable cluster
  • Building the cluster
    • Installing the Hadoop daemons
    • Optimizing the network architecture
  • Managing and scheduling jobs
  • Types of schedulers in Hadoop
  • Configuring the schedulers and running MapReduce jobs
  • Cluster monitoring and troubleshooting

  • How to manage hardware failures
  • Securing Hadoop clusters
  • Configuring Hadoop backup
  • Using DistCp to copy data
  • Cluster maintenance
  • Configuring HDFS Federation
  • Basics of Hadoop Platform Security
  • Securing the Platform
  • Configuring Kerberos

  • Isolating single points of failure
  • Maintaining High Availability
  • Triggering manual failover
  • Automating failover with ZooKeeper
  • Extending HDFS resources
  • Managing the namespace volumes
  • Critiquing the YARN architecture
  • Identifying the new daemons

  • Starting and stopping Hadoop daemons
  • Monitoring HDFS status
  • Adding and removing data nodes
  • Managing MapReduce jobs
  • Tracking progress with monitoring tools
  • Commissioning and decommissioning compute nodes

  • Oozie
  • HCatalog/Hive Administration
  • HBase Architecture
  • HBase setup
  • HBase and Hive Integration
  • HBase performance optimization

Testimonials