Duration: 7 Hours (1 Intensive Day)
Focus: Hands-on Deployment, Data Ingestion (Kafka Connect), and Operational Management
Module 1: Kafka Ecosystem Foundations (1.5 Hours)
- Ecosystem Overview (High-level review of the Apache Kafka ecosystem: brokers, topics, producers, consumers, ZooKeeper (or KRaft), Kafka Streams, and Kafka Connect).
- Installation and Execution (Practical steps for installing and running Apache Kafka locally: a broker with either ZooKeeper or KRaft).
- Connect Architecture (Understanding the components of a Kafka Connect cluster: workers, tasks, and configuration storage).
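As a rough illustration of the Connect architecture item above, the sketch below renders a minimal distributed-mode worker properties file. The property keys are standard Kafka Connect worker settings; the helper name `worker_properties`, the topic-naming scheme, the JSON converters, and the replication factor of 1 (suitable only for a single local broker) are assumptions made for this example.

```python
def worker_properties(bootstrap_servers, group_id):
    """Render a minimal distributed-mode Connect worker config as text.

    Hypothetical helper for illustration; values are assumptions for a
    single-broker local setup, not production recommendations.
    """
    settings = {
        # Brokers the worker uses for data and for its internal topics.
        "bootstrap.servers": bootstrap_servers,
        # Workers sharing a group.id form one Connect cluster.
        "group.id": group_id,
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        # Internal topics: connector configs, source offsets, and status.
        "config.storage.topic": f"{group_id}-configs",
        "offset.storage.topic": f"{group_id}-offsets",
        "status.storage.topic": f"{group_id}-status",
        # Replication factor 1 only works for a one-broker lab cluster.
        "config.storage.replication.factor": "1",
        "offset.storage.replication.factor": "1",
        "status.storage.replication.factor": "1",
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    print(worker_properties("localhost:9092", "connect-cluster"))
```

The three storage topics are where a distributed cluster keeps its shared state, which is why standalone mode (which stores offsets in a local file) does not need them.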
Module 2: Data Ingestion and Transformation (3 Hours)
- Deployment (Deploying Kafka Connect in standalone or distributed mode and starting the worker process).
- Connector Configuration (Configuring source and sink connectors using JSON properties).
- Database Ingestion (Hands-on lab: ingesting database data into Apache Kafka using a source connector, e.g., Debezium or JDBC).
- Log Ingestion (Lab: ingesting data from a web server log using a file-based or equivalent source connector).
- Transformation (Applying Single Message Transforms (SMTs) to messages during ingestion, e.g., renaming fields or filtering records).
- Real-time Data (Conceptual overview of ingesting real-time data from the web via specialized connectors).
- Validation and Management (Validating connector output in a Kafka topic and managing connectors with the REST API: status, pause, resume, restart).
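The configuration, log-ingestion, transformation, and REST management items above can be sketched together as follows. This is a minimal example that only builds HTTP requests and does not send them. It assumes a Connect worker on the default REST port 8083, the FileStreamSource connector that ships with Apache Kafka, and the built-in `HoistField` SMT; the connector name, log path, topic, and helper function names are hypothetical. The REST API exposes connector lifecycle actions as `status`, `pause`, `resume`, and `restart`.

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # assumption: default Connect REST port


def file_source_payload(name, path, topic):
    """Build the JSON body for POST /connectors (file-based source + one SMT)."""
    return {
        "name": name,
        "config": {
            "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
            "tasks.max": "1",
            "file": path,
            "topic": topic,
            # SMT: wrap each raw log line in a field named "line".
            "transforms": "wrap",
            "transforms.wrap.type": "org.apache.kafka.connect.transforms.HoistField$Value",
            "transforms.wrap.field": "line",
        },
    }


def create_request(payload, base_url=CONNECT_URL):
    """Prepare (but do not send) the request that registers a connector."""
    return urllib.request.Request(
        f"{base_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def lifecycle_request(name, action, base_url=CONNECT_URL):
    """Prepare a status/pause/resume/restart request for one connector."""
    methods = {"status": "GET", "pause": "PUT", "resume": "PUT", "restart": "POST"}
    if action not in methods:
        raise ValueError(f"unsupported action: {action}")
    return urllib.request.Request(
        f"{base_url}/connectors/{name}/{action}", method=methods[action]
    )


if __name__ == "__main__":
    payload = file_source_payload("weblog-source", "/var/log/nginx/access.log", "weblogs")
    req = create_request(payload)
    print(req.get_method(), req.full_url)
    # To actually send it against a running worker: urllib.request.urlopen(req)
```

Once the connector is running, its output could be checked by reading the target topic with the console consumer that ships with Kafka.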
Module 3: Advanced Topics and Operations (2.5 Hours)
- Customization Overview (Introduction to writing your own connector and using pre-built connectors, e.g., defining dynamic input/output streams).
- Stream Processing (High-level overview of processing and analyzing data with Kafka Streams for lightweight transformations).
- Case Study (Review of a data pipeline: reading and transforming data from Twitter or a similar real-world stream).
- Production Management (Monitoring and managing Kafka Connect in production: metrics, log analysis, worker health).
- Troubleshooting (Common issues with connectors, offset management, and network problems).
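For the production management and troubleshooting items above, a common first check is the connector status returned by `GET /connectors/{name}/status`. The sketch below scans such a response for failed tasks and builds the path of the per-task restart endpoint. The sample status document is made up for illustration (its field layout follows the Connect REST API), and the helper names are hypothetical.

```python
def failed_tasks(status):
    """Return the ids of tasks whose state is FAILED in a status response."""
    return [
        task["id"]
        for task in status.get("tasks", [])
        if task.get("state") == "FAILED"
    ]


def task_restart_path(name, task_id):
    """Path of the REST endpoint (POST) that restarts a single task."""
    return f"/connectors/{name}/tasks/{task_id}/restart"


# Illustrative sample only; not captured from a real cluster.
SAMPLE_STATUS = {
    "name": "weblog-source",
    "connector": {"state": "RUNNING", "worker_id": "10.0.0.1:8083"},
    "tasks": [
        {"id": 0, "state": "RUNNING", "worker_id": "10.0.0.1:8083"},
        {
            "id": 1,
            "state": "FAILED",
            "worker_id": "10.0.0.2:8083",
            "trace": "org.apache.kafka.connect.errors.ConnectException: ...",
        },
    ],
}

if __name__ == "__main__":
    for task_id in failed_tasks(SAMPLE_STATUS):
        print("restart via POST", task_restart_path(SAMPLE_STATUS["name"], task_id))
```

Note that a connector can report RUNNING while one of its tasks has failed, so checks that look only at the connector state may miss the problem; the `trace` field of a failed task usually holds the stack trace to start from.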