Scala for Spark

Inquire now

Course Overview:

Hadoop Fundamentals is a one-stop course that introduces you to the domain of spark development as well as gives you technical knowhow of the same. At the end of this course you will be able to earn a credential of Spark professional and you will be capable of dealing with Terabyte scale of data and analyze it successfully using spark and its ecosystem. Scala is a condensed version of Java for large scale functional and object-oriented programming. Apache Spark Streaming is an extended component of the Spark API for processing big data sets as real-time streams. Together, Spark Streaming and Scala enable the streaming of big data.

Course Objectives:

  • Create Spark applications with the Scala programming language.
  • Use Spark Streaming to process continuous streams of data.
  • Process streams of real-time data with Spark Streaming.

Pre-requisites:

  • Programming and scripting experience

Target Audience:

  • Software Engineers

Course Duration:

  • 21 hours – 3 days

Course Content:

Introduction

Scala Programming in Depth Review

  • Syntax and structure
  • Flow control and function 

Spark Internals

  • Resilient Distributed Datasets (RDD)
  • Spark script to graph to cluster 

Overview of Spark Streaming

  • Streaming architecture
  • Intervals in streaming
  • Fault tolerance

Preparing the Development Environment

  • Installing and configuring Apache Spark
  • Installing and configuring the Scala IDE
  • Installing and configuring JDK

Spark Streaming Beginner to Advanced

  • Working with key/value RDD’s
  • Filtering RDD’s
  • Improving Spark scripts with regular expressions
  • Sharing data on a cluster
  • Working with network data sets
  • Implementing BFS algorithms
  • Creating Spark driver scripts
  • Tracking in real time with scripts
  • Writing continuous applications
  • Streaming linear regression
  • Using Spark Machine Learning Library

Spark and Clusters

  • Bundling dependencies and Spark scripts using the SBT tool
  • Using EMR for illustrating clusters
  • Optimizing by partitioning RDD’s
  • Using Spark logs 

Integration in Spark Streaming

  • Integrating Apache Kafka and working with Kafka topics
  • Integrating Apache Fume and working with pull-based/push-based Flume configurations
  • Writing a custom receiver class
  • Integrating Cassandra and exposing data as real-time services

In Production

  • Packaging an application and running it with Spark-Submit
  • Troubleshooting, tuning, and debugging Spark Jobs and clusters

 

Course Customization Options

To request a customized training for this course, please contact us to arrange.

Inquire now

Best selling courses

Duration 3 days – 21 hrs   Overview    This Portfolio Management Training Course is designed to provide banking professionals with a comprehensive understanding of how to effectively manage investment...

Duration 2 days – 14 hrs   Overview   This comprehensive Planning and Forecasting Training Course is designed to empower professionals with the tools and techniques necessary to accurately predict...

Duration 2 days – 14 hrs   Overview   This hands-on course provides an introduction to Splunk, a powerful platform for searching, monitoring, and analyzing machine-generated data. The training focuses...

Duration 3 days – 21 hrs   Overview.   This course is designed for fresh graduates aspiring to build a career in Data Science. It introduces the fundamentals of data...

Among the most popular and widely implemented NoSQL databases is MongoDB. Its scalability, robustness, and flexibility have made it extremely popular among the Fortune 500 and Global 500 companies who use it to implement a variety of activities including social communications, analytics, content management, archiving, and other activities.

PROGRAMMING / CODING

ASP.NET

SP.NET is a framework for developing dynamic web applications. It supports languages like VB.Net, C#, Jscript.Net, etc. The programming logic and content can be developed separately in Microsoft Asp.Net.

CYBER SECURITY

Physical Security

Duration 3 days – 21 hrs   Overview   This course provides a comprehensive introduction to physical security principles, policies, technologies, and practices. It covers methods to assess physical risks,...

Duration 5 days – 35 hrs   Overview   This intensive 5-day course is designed for professionals seeking advanced-level skills in Microsoft SQL Server’s BI stack: SSRS (SQL Server Reporting...

We use cookies on our website to personalize your experience by storing your preferences and recognizing repeat visits. By clicking “Accept”, you agree to the use of all cookies. You can also select “Cookie Settings” to adjust your preferences and provide more specific consent. Cookie Policy