Kafka Monitoring & Observability
A monitoring setup for Kafka clusters that uses Prometheus to collect metrics, Grafana to visualize them, and alerting rules to flag critical conditions.
Key Metrics to Monitor
Critical metrics every Kafka operator should track
Messages In/Sec
Rate of messages produced across all topics
Bytes In/Sec
Rate of bytes received from producers
Bytes Out/Sec
Rate of bytes sent to consumers
Consumer Lag
Difference between a partition's log end offset and the consumer group's committed offset for that partition
Under Replicated Partitions
Partitions whose in-sync replica (ISR) set is smaller than their assigned replica set
Offline Partitions
Partitions without a leader
Active Controller
Number of active controllers in the cluster (should always be exactly 1)
Request Latency
Time to process producer/consumer requests
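Consumer lag is computed per partition and then aggregated per consumer group. The notation here is introduced only for illustration: LEO_p is the log end offset of partition p, committed_{g,p} is the offset most recently committed by consumer group g for that partition, and P_g is the set of partitions the group consumes.

$$
\mathrm{lag}_{g,p} = \mathrm{LEO}_p - \mathrm{committed}_{g,p},
\qquad
\mathrm{lag}_g = \sum_{p \in P_g} \mathrm{lag}_{g,p}
$$

For example, if a group reads two partitions with log end offsets 1500 and 800 and committed offsets 1200 and 800, the per-partition lags are 300 and 0, for a total group lag of 300.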
Prometheus Configuration
Configure Prometheus to scrape Kafka metrics
```yaml
# prometheus.yml
# Scrapes the Prometheus JMX Exporter endpoints exposed on each broker and ZooKeeper node
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'kafka'
    static_configs:
      - targets: ['kafka1:7071', 'kafka2:7071', 'kafka3:7071']

  - job_name: 'zookeeper'
    static_configs:
      - targets: ['zk1:7071', 'zk2:7071', 'zk3:7071']
```
Broker Metrics
Messages, bytes, request rates
Topic Metrics
Per-topic message and byte rates
System Metrics
CPU, memory, disk I/O
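Host-level metrics in the System Metrics category (CPU, memory, disk I/O) are not covered by the Kafka JMX job above. One common option, assumed here, is to run Prometheus node_exporter on each broker host and add a scrape job for it. The job name `kafka-hosts` is hypothetical; port 9100 is node_exporter's default listening port.

```yaml
# Additional entry for the existing scrape_configs list in prometheus.yml.
# Assumes node_exporter runs on each broker host on its default port (9100).
- job_name: 'kafka-hosts'   # hypothetical job name
  static_configs:
    - targets: ['kafka1:9100', 'kafka2:9100', 'kafka3:9100']
```

node_exporter then exposes series such as `node_cpu_seconds_total` and `node_filesystem_avail_bytes`, which can back the disk-usage trend tracking listed in the checklist.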
Monitoring Checklist
Deploy JMX exporter on all brokers
Configure Prometheus scraping
Import Grafana dashboards
Set up critical alerts
Create runbooks for each alert
Monitor consumer lag actively
Track disk usage trends
Measure end-to-end latency
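The "Set up critical alerts" item can be sketched as a Prometheus alerting-rules file covering the three cluster-health metrics from Key Metrics. This sketch assumes the JMX exporter uses a rule set like the Kafka example shipped with the jmx_exporter project, which exposes broker gauges under lowercase names such as `kafka_server_replicamanager_underreplicatedpartitions`; if your exporter rules differ, adjust the metric names. The file name `kafka-alerts.yml`, the `for` durations, and the severity labels are illustrative assumptions, not recommendations from this guide.

```yaml
# kafka-alerts.yml (hypothetical file name)
# Metric names assume jmx_exporter's example Kafka rules; adjust to your exporter config.
groups:
  - name: kafka-critical
    rules:
      # Any partition without a leader is unavailable for reads and writes.
      - alert: KafkaOfflinePartitions
        expr: sum(kafka_controller_kafkacontroller_offlinepartitionscount) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $value }} partition(s) have no leader"

      # ISR smaller than the assigned replica set for a sustained period.
      - alert: KafkaUnderReplicatedPartitions
        expr: sum(kafka_server_replicamanager_underreplicatedpartitions) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $value }} partition(s) are under-replicated"

      # Each broker reports 1 only while it is the controller,
      # so the cluster-wide sum should be exactly 1.
      - alert: KafkaActiveControllerCount
        expr: sum(kafka_controller_kafkacontroller_activecontrollercount) != 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Cluster reports {{ $value }} active controllers"
```

Load the file by listing it under `rule_files` in prometheus.yml (for example `rule_files: ['kafka-alerts.yml']`). If the metrics disappear entirely because scrapes fail, `sum(...)` returns no series and these rules stay silent, so pairing them with an alert on `up == 0` for the `kafka` job covers that case.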