Understanding Apache Kafka
Foundation concepts every Java developer needs
What is Apache Kafka?
An introduction to the world's most popular distributed streaming platform
Apache Kafka is an open-source distributed event streaming platform capable of handling trillions of events a day. Originally developed by LinkedIn and later donated to the Apache Software Foundation, Kafka has become the de facto standard for building real-time data pipelines and streaming applications.
Core Components
Topics
Named categories (feeds) to which records are published. Each topic is split into partitions so that reads and writes can proceed in parallel.
Producers
Publish records to topics and choose a partition for each record: records with a key are assigned by hashing the key, and records without a key are spread across the partitions.
Consumers
Subscribe to topics and process records. Consumers are organized into consumer groups, and within a group each partition is read by only one consumer.
Brokers
Servers that store data and serve client requests. Form the Kafka cluster.
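To make the producer and consumer descriptions above more concrete, here is a minimal sketch in plain Java. It assumes nothing beyond the standard library and is not Kafka's implementation: the class name `CoreComponentsSketch` and both methods are hypothetical. `Arrays.hashCode` stands in for the murmur2 hash that Kafka's default partitioner applies to the serialized key, and the round-robin assignment is a simplification of what Kafka's real partition assignors do.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the Kafka client library.
final class CoreComponentsSketch {

    // Producer side: map a record key to a partition index in [0, numPartitions).
    // Assumption: Arrays.hashCode stands in for the murmur2 hash Kafka applies to key bytes.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    // Consumer side: spread partitions over the members of one consumer group,
    // round-robin, so each partition has exactly one owner within the group.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("a consumer group needs at least one member");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        int partitions = 3; // same partition count as the "orders" topic in this lesson
        for (String key : List.of("customer-1", "customer-2", "customer-1")) {
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
        System.out.println(assign(List.of("consumer-a", "consumer-b"), partitions));
    }
}
```

The point to take away is that the same key always maps to the same partition, which is what preserves per-key ordering, and that adding consumers to a group divides the partitions among them rather than duplicating the work.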
Code Example
# Create a Kafka topic
kafka-topics.sh --create --topic orders \
--bootstrap-server localhost:9092 \
--partitions 3 --replication-factor 1
# List topics
kafka-topics.sh --list --bootstrap-server localhost:9092
# Produce messages
kafka-console-producer.sh --topic orders \
--bootstrap-server localhost:9092
# Consume messages
kafka-console-consumer.sh --topic orders \
--bootstrap-server localhost:9092 --from-beginning
Key Concepts
High Throughput
Handle millions of messages per second with latencies as low as a few milliseconds
Durable Storage
Messages persist on disk with configurable retention policies
Scalable
Scale horizontally by adding more brokers to the cluster
Fault Tolerant
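Automatic failover when partitions are replicated across brokers; the replication factor is configurable (a replication factor of 1, as in the single-broker example above, provides no redundancy)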
Key Insight
Kafka is often described as a "distributed commit log" or a "distributed streaming platform." Think of it as a durable, scalable, and highly available log that behaves like a message queue, except that records remain available after they are read and are kept for as long as your retention settings allow.
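The "commit log" idea can be sketched in a few lines of plain Java. This is an in-memory stand-in for a single partition, written only for illustration: the class `CommitLogSketch` and its methods are hypothetical, and real Kafka persists the log to disk on brokers and applies retention policies that this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a single partition's log; not Kafka code.
final class CommitLogSketch {
    private final List<String> records = new ArrayList<>();

    // Appends a record to the end of the log and returns its offset (its position).
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // Returns a copy of every record from the given offset onward.
    // Reading never removes anything from the log.
    List<String> readFrom(long offset) {
        return new ArrayList<>(records.subList((int) offset, records.size()));
    }

    public static void main(String[] args) {
        CommitLogSketch log = new CommitLogSketch();
        log.append("order-created");
        log.append("order-paid");
        long shipped = log.append("order-shipped");
        System.out.println(log.readFrom(0));       // a new reader replays the whole log
        System.out.println(log.readFrom(shipped)); // another reader starts at a later offset
    }
}
```

Reading from offset 0 here plays the same role as the `--from-beginning` flag in the console consumer example: each reader tracks its own position, and the records stay in place for the next reader.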