
Kafka Monitoring & Observability

Monitoring setup for Kafka clusters using Prometheus for metrics collection, Grafana for dashboards, and alerting rules for critical conditions.

Key Metrics to Monitor

Critical metrics every Kafka operator should track

Messages In/Sec

Rate of messages produced across all topics

Bytes In/Sec

Bytes per second written to the brokers by producers

Bytes Out/Sec

Bytes per second fetched from the brokers by consumers
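The three rate metrics above can be precomputed as Prometheus recording rules. This is a minimal sketch, assuming the JMX exporter is configured so that the broker-wide MessagesInPerSec, BytesInPerSec, and BytesOutPerSec meters appear as counters named kafka_server_brokertopicmetrics_messagesin_total, kafka_server_brokertopicmetrics_bytesin_total, and kafka_server_brokertopicmetrics_bytesout_total, with a topic label only on the per-topic series. The real names depend on your exporter rules, and the file and recording rule names are hypothetical.

```yaml
# kafka-throughput-rules.yml (hypothetical file name)
# ASSUMPTION: metric names come from JMX exporter rules that map the Count
# attribute of each *PerSec meter to a counter; adjust to your exporter config.
groups:
  - name: kafka-throughput
    rules:
      # {topic=""} selects only series WITHOUT a topic label (the broker-wide
      # meters), so per-topic series are not double counted.
      - record: cluster:kafka_messages_in:rate5m
        expr: sum(rate(kafka_server_brokertopicmetrics_messagesin_total{topic=""}[5m]))
      - record: cluster:kafka_bytes_in:rate5m
        expr: sum(rate(kafka_server_brokertopicmetrics_bytesin_total{topic=""}[5m]))
      - record: cluster:kafka_bytes_out:rate5m
        expr: sum(rate(kafka_server_brokertopicmetrics_bytesout_total{topic=""}[5m]))
```

Applying rate() to the meter's cumulative count, rather than exporting Kafka's own OneMinuteRate gauge, lets Prometheus choose the averaging window.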

Consumer Lag

Difference between a partition's log end offset and the consumer group's committed offset, i.e. how many messages the group still has to process
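Consumer lag is not one of the broker metrics scraped through the JMX exporter; it is usually computed by a separate exporter that reads consumer group offsets from the cluster. A sketch of a lag alert, assuming the community kafka_exporter is deployed and exposes a per-partition kafka_consumergroup_lag gauge with consumergroup and topic labels. The 10000-message threshold, 10-minute window, and file name are placeholders.

```yaml
# kafka-lag-rules.yml (hypothetical file name)
# ASSUMPTION: kafka_exporter's kafka_consumergroup_lag metric, one series per
# consumer group, topic, and partition.
groups:
  - name: kafka-consumer-lag
    rules:
      - alert: KafkaConsumerGroupLagHigh
        # Total lag per group and topic, summed over all partitions.
        expr: sum by (consumergroup, topic) (kafka_consumergroup_lag) > 10000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Consumer group {{ $labels.consumergroup }} is lagging on {{ $labels.topic }}"
          description: "Lag is {{ $value }} messages (placeholder threshold: 10000)."
```

A fixed threshold suits some topics better than others; for high-throughput topics, alerting on lag that keeps growing over a window is often a more useful signal.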

Under Replicated Partitions

Partitions whose in-sync replica set (ISR) is smaller than the full set of assigned replicas
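A sketch of an alert on this condition, assuming the exporter publishes the ReplicaManager UnderReplicatedPartitions gauge as kafka_server_replicamanager_underreplicatedpartitions (the name depends on your exporter rules). The 5-minute window tolerates brief ISR shrinks, for example during a rolling restart, and is a placeholder.

```yaml
# kafka-replication-rules.yml (hypothetical file name)
# ASSUMPTION: metric name produced by JMX exporter rules for
# kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions.
groups:
  - name: kafka-replication
    rules:
      - alert: KafkaUnderReplicatedPartitions
        # Each broker reports the count for the partitions it leads,
        # so the alert fires per broker (instance label).
        expr: kafka_server_replicamanager_underreplicatedpartitions > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} leads {{ $value }} under-replicated partitions"
```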

Offline Partitions

Partitions without an active leader; they cannot be read from or written to
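Offline partitions mean unavailable data, so the sketch below fires immediately (no for clause). It assumes the controller's OfflinePartitionsCount gauge is exported as kafka_controller_kafkacontroller_offlinepartitionscount; the name depends on your exporter rules.

```yaml
# kafka-availability-rules.yml (hypothetical file name)
# ASSUMPTION: metric name produced by JMX exporter rules for
# kafka.controller:type=KafkaController,name=OfflinePartitionsCount.
groups:
  - name: kafka-availability
    rules:
      - alert: KafkaOfflinePartitions
        # Only the active controller reports a non-zero value,
        # so summing across brokers gives the cluster-wide count.
        expr: sum(kafka_controller_kafkacontroller_offlinepartitionscount) > 0
        labels:
          severity: critical
        annotations:
          summary: "{{ $value }} partitions have no leader"
```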

Active Controller

Number of active controllers in the cluster; this should always be exactly 1
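Each broker reports 1 for ActiveControllerCount if it is the current controller and 0 otherwise, so the cluster-wide sum should equal 1. A sketch, assuming the gauge is exported as kafka_controller_kafkacontroller_activecontrollercount (name depends on your exporter rules).

```yaml
# kafka-controller-rules.yml (hypothetical file name)
# ASSUMPTION: metric name produced by JMX exporter rules for
# kafka.controller:type=KafkaController,name=ActiveControllerCount.
groups:
  - name: kafka-controller
    rules:
      - alert: KafkaActiveControllerCountNotOne
        # "or vector(0)" turns a missing metric (no broker scraped) into 0,
        # so the alert also fires when no series are present at all.
        expr: (sum(kafka_controller_kafkacontroller_activecontrollercount) or vector(0)) != 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Cluster reports {{ $value }} active controllers (expected 1)"
```

Both 0 (no controller) and values above 1 (a possible split-brain) are caught by the same != 1 comparison.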

Request Latency

Time brokers take to handle produce and fetch requests, from receipt to response
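Kafka publishes request latency as percentiles of the RequestMetrics TotalTimeMs histogram (for example, request=Produce or request=FetchConsumer). A sketch of a per-broker alert, assuming the exporter maps the 99thPercentile attribute to a gauge named kafka_network_requestmetrics_totaltimems with request and quantile labels. The name, labels, and the 100 ms threshold are all assumptions to adapt.

```yaml
# kafka-latency-rules.yml (hypothetical file name)
# ASSUMPTION: metric and label names produced by JMX exporter rules for
# kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce.
groups:
  - name: kafka-latency
    rules:
      - alert: KafkaProduceLatencyHigh
        # 99th-percentile total time for Produce requests, in milliseconds.
        # Evaluated per broker: percentiles cannot be averaged across brokers.
        expr: kafka_network_requestmetrics_totaltimems{request="Produce", quantile="0.99"} > 100
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p99 Produce latency on {{ $labels.instance }} is {{ $value }} ms"
```

For producers using acks=all, Produce total time includes waiting for followers to replicate, so a suitable threshold depends on the replication setup.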

Prometheus Configuration

Configure Prometheus to scrape Kafka metrics

# prometheus.yml
# Scrapes the Prometheus JMX exporter agent running on each Kafka broker
# and ZooKeeper node (listening on port 7071 in this setup).
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'kafka'
    static_configs:
      - targets: ['kafka1:7071', 'kafka2:7071', 'kafka3:7071']

  - job_name: 'zookeeper'
    static_configs:
      - targets: ['zk1:7071', 'zk2:7071', 'zk3:7071']
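The targets above assume each node runs the Prometheus JMX exporter as a Java agent on port 7071. The agent reads a rules file that maps MBeans to Prometheus metric names. Below is a minimal broker-side sketch, not the exporter's shipped Kafka example; the file name and paths are hypothetical, and ZooKeeper exposes different MBeans, so it would need its own rules.

```yaml
# kafka-jmx.yml (hypothetical name), passed to the agent on broker startup, e.g.
#   KAFKA_OPTS="-javaagent:/opt/jmx_prometheus_javaagent.jar=7071:/opt/kafka-jmx.yml"
# Rules are tried in order and the first match wins; unmatched beans are not exported.
lowercaseOutputName: true
rules:
  # Per-topic meters, e.g. BytesInPerSec -> kafka_server_brokertopicmetrics_bytesin_total{topic="..."}
  - pattern: 'kafka.(\w+)<type=(\w+), name=(\w+)PerSec, topic=([^,>]+)><>Count'
    name: kafka_$1_$2_$3_total
    type: COUNTER
    labels:
      topic: "$4"
  # Broker-wide meters (no topic property), exported without a topic label
  - pattern: 'kafka.(\w+)<type=(\w+), name=(\w+)PerSec><>Count'
    name: kafka_$1_$2_$3_total
    type: COUNTER
  # Request latency percentiles, e.g. 99thPercentile -> quantile="0.99"
  - pattern: 'kafka.network<type=RequestMetrics, name=TotalTimeMs, request=(Produce|FetchConsumer)><>(\d+)thPercentile'
    name: kafka_network_requestmetrics_totaltimems
    type: GAUGE
    labels:
      request: "$1"
      quantile: "0.$2"
  # Simple gauges, e.g. ReplicaManager UnderReplicatedPartitions and
  # KafkaController ActiveControllerCount / OfflinePartitionsCount
  - pattern: 'kafka.(\w+)<type=(\w+), name=(\w+)><>Value'
    name: kafka_$1_$2_$3
    type: GAUGE
```

The \w+ groups stop at commas, so beans with extra properties (such as partition-level log metrics) fall through instead of producing malformed metric names.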

Broker Metrics

Messages, bytes, request rates

Topic Metrics

Per-topic statistics
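Per-topic rates can be recorded the same way as the cluster totals. A sketch, assuming per-topic meters are exported as kafka_server_brokertopicmetrics_messagesin_total and kafka_server_brokertopicmetrics_bytesin_total carrying a topic label (names depend on your exporter rules; the recording rule names are hypothetical).

```yaml
# kafka-topic-rules.yml (hypothetical file name)
# ASSUMPTION: per-topic series carry a non-empty topic label.
groups:
  - name: kafka-topic-throughput
    rules:
      # Sum across brokers: each broker only counts traffic it handles.
      - record: topic:kafka_messages_in:rate5m
        expr: sum by (topic) (rate(kafka_server_brokertopicmetrics_messagesin_total{topic!=""}[5m]))
      - record: topic:kafka_bytes_in:rate5m
        expr: sum by (topic) (rate(kafka_server_brokertopicmetrics_bytesin_total{topic!=""}[5m]))
```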

System Metrics

CPU, memory, disk I/O
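Host-level metrics such as CPU, memory, and disk are typically collected separately from the broker's JMX metrics. One common approach, assumed here, is to run the Prometheus node_exporter on each broker host on its default port, 9100, and add a scrape job for it. The job name is hypothetical.

```yaml
# Additional entry for the scrape_configs section of prometheus.yml.
# ASSUMPTION: node_exporter runs on every broker host on port 9100.
scrape_configs:
  - job_name: 'kafka-nodes'
    static_configs:
      - targets: ['kafka1:9100', 'kafka2:9100', 'kafka3:9100']
```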

Monitoring Checklist

Deploy the Prometheus JMX exporter on all brokers and ZooKeeper nodes
Configure Prometheus scraping
Import Grafana dashboards
Set up critical alerts
Create runbooks for each alert
Monitor consumer lag actively
Track disk usage trends
Measure end-to-end latency
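The "Track disk usage trends" and "Create runbooks for each alert" items can be combined in a single rule. A sketch, assuming node_exporter's node_filesystem_avail_bytes metric is being scraped from the broker hosts. The mount point /var/lib/kafka and the runbook URL are hypothetical placeholders.

```yaml
# kafka-disk-rules.yml (hypothetical file name)
# ASSUMPTION: node_exporter filesystem metrics; Kafka logs on /var/lib/kafka.
groups:
  - name: kafka-disk
    rules:
      - alert: KafkaDiskWillFillIn24h
        # Linear extrapolation of the last 6h of free space, 86400s (24h) ahead.
        expr: predict_linear(node_filesystem_avail_bytes{mountpoint="/var/lib/kafka"}[6h], 86400) < 0
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Kafka log disk on {{ $labels.instance }} is predicted to fill within 24h"
          # Placeholder link: point each alert at its own runbook.
          runbook_url: "https://wiki.example.com/runbooks/kafka-disk-full"
```

Rule files like this one are loaded through the rule_files list in prometheus.yml, and firing alerts are routed to Alertmanager through its alerting section.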