Course Overview:
The Hadoop Development, Administration and BI Program is a one-stop course that introduces you to Hadoop development and gives you the technical know-how to practice it.
Course Objectives:
By the end of this course you will be able to earn a Hadoop professional credential, handle terabyte-scale data, and analyze it using MapReduce.
- Learn the basics of Big Data and Hadoop
- Play with Hadoop and the Hadoop Ecosystem
- Become a top-notch Hadoop Developer
Pre-requisites:
- Professionals with basic knowledge of software development, programming languages, and databases will find this course helpful. Basic knowledge is enough to succeed in this course.
- Not for: absolute beginners in software development, who will find the course difficult to follow.
Target Audience:
- Developers
Course Duration:
- 35 hours (5 days)
Course Content:
Phase 1: Hadoop Fundamentals with Multinode Setup (Days 1 and 2)
Laying the foundation
Big Data
- What is Big Data
- Dimensions of Big Data
- Big Data in Advertising
- Big Data in Banking
- Big Data in Telecom
- Big Data in eCommerce
- Big Data in Healthcare
- Big Data in Defense
- Options for processing Big Data
- Hadoop as an option
Hadoop
- What is Hadoop
- How Hadoop Works
- HDFS
- MapReduce Deep Dive
- How Hadoop has an edge
Hadoop Ecosystem
- Sqoop
- Oozie
- Pig
- Hive
- Flume
Hadoop Hands On
- Running HDFS commands
- Running your MapReduce program
- Running Sqoop Import and Sqoop Export
- Creating Hive tables directly from Sqoop
- Creating Hive tables
- Querying Hive tables
- Running an Oozie workflow
Phase 2: Hadoop Development (Day 3)
Become a Pro developer
Apache Spark
- What is Spark?
- Using Spark Shell
- RDD Fundamentals
- Functional Programming
- Program
RDD in Depth
- RDDs
- Creating RDDs from files
- Creating RDDs from other RDDs
- RDD operations
- Actions
- Transformations
- Pair RDDs
- Joins using RDD
Spark platforms
- Spark local mode
- Spark standalone mode
- Spark on YARN
- Spark on Mesos
Spark Hands On
- Python Spark Shell
- Scala Spark Shell
- Basic operations on RDDs
- Pair RDD Hands On
Spark SQL & DataFrames
- Spark SQL and the SQLContext
- Creating DataFrames
- DataFrame Queries and Transformations
- Saving DataFrames
- DataFrames and RDDs
Spark DataFrames Hands On
- DataFrames on a JSON file
- DataFrames on Hive tables
- Query operations on DataFrames
Spark Streaming
- What is Spark Streaming
- How it works
- DStreams
- Developing Spark Streaming Applications
Phase 3: Hadoop BI (Day 4)
Analyze data using Pig and Hive
Hive
- Introduction
- Basic Data Analysis with Hive
- Hive Data Management
- Text Processing with Hive
- Transformations in Hive
- Optimizing Hive
- Hive Hands On
- Extending Hive
Impala
- Introduction
- Basic Data Analysis with Impala
- Text Processing with Impala
- Optimizing Impala
- Impala Hands On
Pig
- Introduction
- How Pig works
- Pig Hands On
Phase 4: NoSQL and Cluster Walkthrough (Day 5)
Cluster Administration
NoSQL Databases
- Why NoSQL
- What are NoSQL Databases
- Types of NoSQL Databases
- Introduction to Cassandra, MongoDB, and HBase
- HBase Hands On
Cloudera Manager Setup
- Why use cluster manager software
- Cloudera Manager
- Using Cloudera Manager to set up a cluster
- Cluster walkthrough
Final Test
Course Customization Options
To request customized training for this course, please contact us.