Course Overview:
Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Machine learning algorithms comb through data and identify patterns that are often too complex for a person to discern unaided.
These patterns can then be used for decision making and action. Apache Spark is a powerful platform for running machine learning workloads. This course will show you how to perform various machine learning tasks using Apache Spark's built-in MLlib component.
Course Objectives:
- Overview of Apache Spark
- Clustering
- Regression
- Classification
- Recommendation
Pre-requisites:
- This is an intermediate course. Participants should have basic knowledge of the following subjects: Python and Apache Spark.
Target Audience:
- Big Data Analysts
- Data Scientists
- Data Analysts
Course Duration:
- 14 hours – 2 days
Course Content:
Module 1: Apache Spark Basics
- Recap of Apache Spark Basics
- Install Apache Spark on Local Computer
- Read CSV Data
- Manipulating DataFrames
- ML Libraries
Module 2: Preprocessing
- Normalizer
- Standardizer
- Tokenizer
- TF-IDF
Module 3: Clustering
- What is Clustering
- Clustering Algorithms
- KMeans Clustering
- Hierarchical Clustering
Module 4: Classification
- What is Classification
- Naive Bayes Classifier
- Decision Tree Classifier
- Multilayer Perceptron
Module 5: Regression
- What is Regression
- Regression Algorithms
- Linear Regression
- Decision Tree Regression
- Gradient Boosted Tree Regression
Module 6: ML Pipeline
- What is a Pipeline
- Creating a Pipeline for Movie Review Classification
Module 7: Recommendation (Optional)
- Recommendation Systems
- Collaborative Filtering
- Summary and Closing Remarks