Course Overview:
Kafka is a high-throughput, scalable messaging system. Developed at LinkedIn, it can be used in place of traditional messaging systems such as JMS and Apache ActiveMQ. Because Kafka integrates with systems such as Spark, Storm and Hadoop, it is well suited to the messaging side of big data problems. Features of a Kafka cluster such as scalability, fault tolerance, high throughput and durability make it a leading choice among messaging systems today. With Kafka now widely used, job opportunities for people with Kafka skills have increased steeply. This course covers integrating Kafka with other big data systems and setting up real-time data pipelines for streaming applications.
Course Objectives:
- Understand Apache Kafka Ecosystem, Architecture, Core Concepts and Operations
- Master Concepts such as Topics, Partitions, Brokers, Producers, Consumers
- Start a personal Kafka development environment
- Learn major CLIs: kafka-topics, kafka-console-producer, kafka-console-consumer, kafka-consumer-groups, kafka-configs
- Create your Producers and Consumers in Java to interact with Kafka
- Program a real-world Twitter producer and Elasticsearch consumer
- Extended APIs Overview (Kafka Connect, Kafka Streams), Case Studies and Big Data Architecture
- Practice and Understand Log Compaction
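
As a taste of the log compaction objective above, here is a minimal sketch in plain Java of what compaction does to a keyed log: it keeps only the newest record for each key and drops keys whose newest record is a delete marker (a null value). It does not use the Kafka client library. The class `LogCompactionSketch`, the `Rec` type and the `compact` method are hypothetical names chosen for illustration, and the sketch removes delete markers immediately, whereas a real broker retains them for a configurable period before cleaning them up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LogCompactionSketch {

    /** A keyed record; a null value stands in for a delete marker (tombstone). */
    static final class Rec {
        final String key;
        final String value;

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Keeps only the newest record per key, preserving the original order. */
    static List<Rec> compact(List<Rec> log) {
        // Pass 1: remember the position of the last record seen for each key.
        Map<String, Integer> lastIndex = new HashMap<>();
        for (int i = 0; i < log.size(); i++) {
            lastIndex.put(log.get(i).key, i);
        }
        // Pass 2: keep a record only if it is the newest one for its key
        // and is not a delete marker.
        List<Rec> compacted = new ArrayList<>();
        for (int i = 0; i < log.size(); i++) {
            Rec r = log.get(i);
            if (lastIndex.get(r.key) == i && r.value != null) {
                compacted.add(r);
            }
        }
        return compacted;
    }

    public static void main(String[] args) {
        List<Rec> log = List.of(
            new Rec("user-1", "alice@old.example"),
            new Rec("user-2", "bob@example"),
            new Rec("user-1", "alice@new.example"),
            new Rec("user-2", null)); // delete marker for user-2
        for (Rec r : compact(log)) {
            System.out.println(r.key + " = " + r.value);
        }
    }
}
```

In this example only the second `user-1` record survives: the first one is superseded, and `user-2` is removed because its newest record is a delete marker.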
Pre-requisites:
To learn Kafka easily, step-by-step, you have come to the right place! No prior Kafka knowledge is required.
- A recent Windows / Mac / Linux machine with at least 16 GB of RAM and 100 GB of disk space
- Some understanding of Java Programming
- Familiarity with the Linux command line is helpful
- Desire to learn something awesome and new!
Target Audience:
- Some understanding of Java Programming
- Familiarity with the Linux command line is helpful
Course Duration:
- 21 hours – 3 days
Course Content:
- Introduction to messaging systems and their use cases, problems and limitations (P2P, Pub/Sub)
- Understand publish-subscribe messaging and how it fits in the big data ecosystem
- JMS, RabbitMQ, ActiveMQ
- Kafka for Big Data & Data Ingestion, Role in ETL
- Why do we need Kafka? Components of Kafka
- Kafka use cases
- Setting Up a Kafka Cluster
- Setting Up Zookeeper
- Single node – single-broker cluster, Windows or Linux (single VM)
- Single node – multiple-broker cluster, Windows or Linux (single VM)
- Multiple nodes – multiple-broker cluster, EC2 instances
- The Kafka broker property list
- Kafka Design, Leader, Follower, ISR, Offsets
- Kafka design fundamentals – Broker, Producer, Consumer, Topic, Partition
- Replication in Kafka
- Writing producers and consumers: the C#/Java producer/consumer API
- Avro producer/consumer in C#/Java
- Creating a Java producer with custom partitioning
- The Kafka producer property list: acks, buffer.memory, compression.type, retries, batch.size, linger.ms, client.id, max.in.flight.requests.per.connection, timeout.ms and metadata.fetch.timeout.ms
- Kafka Streams (Java/Scala)
- Kafka integration with Spark Streaming in Java (standalone VM)
- Kafka administration tools
- Kafka cluster mirroring and Monitoring using open source tools
- Frequent Problems and Solutions
- Kafka Manager Usage
- Kafka configuration files details and troubleshooting
- Partition management and configuration: num.partitions, log.retention.ms, log.retention.bytes, log.segment.bytes, log.segment.ms, message.max.bytes
- Disaster Recovery
- Low level clients
- Lost Message detection and recovery
- Using Kafka Connect to import/export data, and the role of KSQL
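
To give a feel for the producer property list and custom partitioning topics in the outline above, the following sketch uses only the Java standard library. It collects several of the listed producer properties into a `java.util.Properties` object, which is the form the Kafka Java producer accepts its configuration in, and shows the general idea behind key-based partitioning: hash the key and take it modulo the partition count. The class and method names, the broker address `localhost:9092`, the client id and all values are assumptions picked for illustration, not recommended settings. The hash here is a simple stand-in; the Kafka Java client's own key partitioning uses a murmur2 hash of the serialized key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Properties;

public class ProducerConfigSketch {

    /** Builds illustrative producer settings using property names from the outline. */
    static Properties producerProps() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092"); // assumed local broker
        p.setProperty("client.id", "course-demo-producer");   // hypothetical client name
        p.setProperty("acks", "all");              // wait for all in-sync replicas
        p.setProperty("retries", "3");             // retry transient send failures
        p.setProperty("batch.size", "16384");      // bytes per partition batch
        p.setProperty("linger.ms", "5");           // wait briefly to fill batches
        p.setProperty("compression.type", "snappy");
        p.setProperty("buffer.memory", "33554432"); // bytes available for buffering
        p.setProperty("max.in.flight.requests.per.connection", "1");
        return p;
    }

    /**
     * Maps a key to a partition number in [0, numPartitions).
     * Illustrative stand-in hash, not Kafka's murmur2-based partitioner.
     */
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        Properties props = producerProps();
        System.out.println("acks = " + props.getProperty("acks"));
        System.out.println("partition for order-42: " + partitionFor("order-42", 3));
    }
}
```

Because the partition is derived only from the key, records with the same key always land in the same partition, which is what preserves per-key ordering. In a real producer, custom logic of this kind would be supplied through the client's partitioner plug-in point rather than called directly.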