Hadoop Development 0 to 100

Course Overview:

The Hadoop Development, Administration and BI Program is a one-stop course that introduces you to the domain of Hadoop development and gives you the technical know-how to work in it.

Course Objectives:

By the end of this course, you will be able to earn a Hadoop professional credential, and you will be capable of handling terabyte-scale data and analyzing it successfully using MapReduce.

  • Learn the basics of Big Data and Hadoop
  • Play with Hadoop and the Hadoop ecosystem
  • Become a top-notch Hadoop Developer

Pre-requisites:

  • Professionals with basic knowledge of software development, programming languages, and databases will find this course helpful. Basic knowledge should be enough to succeed in this course.
  • Not for: Students who are absolute beginners at software development as a discipline will find it difficult to follow the course.

Target Audience:

  • Developers

Course Duration:

  • 35 hours – 5 days

Course Content:

Phase 1: Hadoop Fundamentals with multinode setup (Days 1, 2)

Laying the foundation

Big Data

  • What is Big Data
  • Dimensions of Big Data
  • Big Data in Advertising
  • Big Data in Banking
  • Big Data in Telecom
  • Big Data in eCommerce
  • Big Data in Healthcare
  • Big Data in Defense
  • Processing options of Big Data
  • Hadoop as an option

Hadoop

  • What is Hadoop
  • How Hadoop Works
  • HDFS
  • MapReduce Deep Dive
  • How Hadoop has an edge

Hadoop Ecosystem

  • Sqoop
  • Oozie
  • Pig
  • Hive
  • Flume

Hadoop Hands On

  • Running HDFS commands
  • Running your MapReduce program
  • Running Sqoop Import and Sqoop Export
  • Creating Hive tables directly from Sqoop
  • Creating Hive tables
  • Querying Hive tables
  • Running an Oozie workflow

Phase 2: Hadoop Development (Day 3)

Become a Pro developer

Apache Spark

  • What is Spark?
  • Using Spark Shell
  • RDD Fundamentals
  • Functional Programming
  • Program

RDD in Depth

  • RDDs
  • Creating RDDs from files
  • Creating RDDs from other RDDs
  • RDD operations
  • Actions
  • Transformations
  • Pair RDDs
  • Joins using RDD

Spark platforms

  • Spark local mode
  • Spark standalone mode
  • Spark on YARN
  • Spark on Mesos

Spark Hands On

  • Python Spark Shell
  • Scala Spark Shell
  • Basic operations on RDDs
  • Pair RDD Hands On

Spark SQL & DataFrames

  • Spark SQL and the SQL Context
  • Creating DataFrames
  • DataFrame Queries and Transformations
  • Saving DataFrames
  • DataFrames and RDDs

Spark DataFrames Hands On

  • DataFrames on a JSON file
  • DataFrames on Hive tables
  • DataFrames on JSON
  • Query operations on DataFrames

Spark Streaming

  • What is Spark Streaming
  • How it works
  • DStreams
  • Developing Spark Streaming Applications

Phase 3: Hadoop BI (Day 4)

Analyze data using Pig and Hive

Hive

  • Introduction
  • Basic Data Analysis with Hive
  • Hive Data Management
  • Text Processing with Hive
  • Transformations in Hive
  • Optimizing Hive
  • Hive Hands On
  • Extending Hive

Impala

  • Introduction
  • Basic Data Analysis with Impala
  • Text Processing with Impala
  • Optimizing Impala
  • Impala Hands On

Pig

  • Introduction
  • How Pig works
  • Pig Hands On

Phase 4: NoSQL and Cluster Walkthrough (Day 5)

Cluster Administration

NoSQL Databases

  • Why NoSQL
  • What are NoSQL Databases
  • Types of NoSQL Databases
  • Introduction to Cassandra, MongoDB, and HBase
  • HBase Hands On

Cloudera Manager Setup

  • Why Cluster Manager software
  • Cloudera Manager
  • Using Cloudera Manager to set up a cluster
  • Cluster walkthrough

Final Test

Course Customization Options

To request customized training for this course, please contact us.

 
