Machine Learning with Spark

Course Overview:

Machine learning is a type of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed. Machine learning algorithms comb through data and identify patterns that are often too complex for people to discern on their own.

These patterns can then be used to inform decisions and actions. Apache Spark is a powerful platform for running machine learning workloads. This course will show you how to perform various machine learning tasks using Apache Spark's built-in MLlib component.

Course Objectives:

  • Overview of Apache Spark
  • Clustering
  • Regression
  • Classification
  • Recommendation

Pre-requisites:

  • This is an intermediate course. Participants should have basic knowledge of the following subjects: Python and Apache Spark.

Target Audience:

  • Big Data Analysts
  • Data Scientists
  • Data Analysts

Course Duration:

  • 14 hours – 2 days

Course Content:

Module 1: Apache Spark Basics

  • Recap of Apache Spark Basics
  • Install Apache Spark on a Local Computer
  • Read CSV Data
  • Manipulating DataFrames
  • ML Libraries

Module 2: Preprocessing 

  • Normalizer
  • Standardizer
  • Tokenizer
  • TF-IDF

Module 3: Clustering 

  • What is Clustering
  • Clustering Algorithms
  • KMeans Clustering
  • Hierarchical Clustering

Module 4: Classification 

  • What is Classification
  • Naive Bayes Classifier
  • Decision Tree Classifier
  • Multilayer Perceptron

Module 5: Regression 

  • What is Regression
  • Regression Algorithms
  • Linear Regression
  • Decision Tree Regression
  • Gradient Boosted Tree Regression

Module 6: ML Pipeline

  • What is a Pipeline
  • Creating a Pipeline for Movie Review Classification

Module 7: Recommendation (Optional) 

  • Recommendation Systems
  • Collaborative Filtering
  • Summary and Closing Remarks

Course Customization Options

To request customized training for this course, please contact us.