Kafka monitoring

Deprecation notice

This extension documentation is now deprecated and will no longer be updated. We recommend using the new Kafka extension for improved functionality and support.

Apache Kafka is an open-source, distributed publish-subscribe message bus designed to be fast, scalable, and durable. Dynatrace automatically recognizes Kafka processes and immediately starts gathering Kafka metrics at the process and cluster levels.

For information on general Kafka message queue monitoring, see Custom messaging services.

Prerequisites

  • Dynatrace SaaS/Managed version 1.155+
  • Apache Kafka or Confluent-supported Kafka 0.9.0.1+
  • If you have more than one Kafka cluster, separate the clusters into individual process groups via an environment variable in Dynatrace settings

Activation

  1. Go to Settings.
  2. Select Monitoring > Monitored technologies.
  3. Find Kafka and turn on the Global monitoring switch.
    After you turn Kafka monitoring on, Dynatrace automatically activates Kafka monitoring on all hosts and monitors all Kafka components.

Events

Name | Condition | Dynatrace event
Under-replicated partitions | Partition followers are out of sync with the leader | Performance (PERFORMANCE_EVENT)
Offline partitions | One or more partitions have no active leader | Performance (PERFORMANCE_EVENT)
Cluster controller mismatch | Brokers detect more than one active controller | Error (ERROR_EVENT)
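The event conditions above can be illustrated with a minimal Python sketch. It assumes hypothetical per-broker snapshots (the `BrokerSnapshot` class and its fields are illustrative, not a Dynatrace or Kafka API) and, for the controller check, follows the rule given for the Active cluster controllers metric: any cluster-wide total other than 1 is flagged. Dynatrace's actual detection logic and thresholds may differ.

```python
from dataclasses import dataclass


@dataclass
class BrokerSnapshot:
    """Hypothetical per-broker reading; field names are illustrative only."""
    broker_id: int
    under_replicated_partitions: int
    offline_partitions: int
    active_controller_count: int


def evaluate_events(brokers: list[BrokerSnapshot]) -> list[tuple[str, str]]:
    """Return (event name, event type) pairs for the conditions that currently hold."""
    events = []
    # Under-replicated partitions: followers out of sync with the leader on any broker.
    if any(b.under_replicated_partitions > 0 for b in brokers):
        events.append(("Under-replicated partitions", "PERFORMANCE_EVENT"))
    # Offline partitions: at least one partition has no active leader.
    if any(b.offline_partitions > 0 for b in brokers):
        events.append(("Offline partitions", "PERFORMANCE_EVENT"))
    # Controller mismatch: the cluster-wide sum of active controllers must be exactly 1.
    if sum(b.active_controller_count for b in brokers) != 1:
        events.append(("Cluster controller mismatch", "ERROR_EVENT"))
    return events


cluster = [
    BrokerSnapshot(broker_id=1, under_replicated_partitions=0, offline_partitions=0, active_controller_count=1),
    BrokerSnapshot(broker_id=2, under_replicated_partitions=3, offline_partitions=0, active_controller_count=1),
    BrokerSnapshot(broker_id=3, under_replicated_partitions=0, offline_partitions=0, active_controller_count=0),
]
print(evaluate_events(cluster))
# [('Under-replicated partitions', 'PERFORMANCE_EVENT'), ('Cluster controller mismatch', 'ERROR_EVENT')]
```

In this example two brokers each report an active controller, so the total is 2 and the mismatch event is raised alongside the under-replicated partitions event.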

To customize problem detection thresholds for Kafka:

  1. Go to Settings.
  2. Open Anomaly detection > Extension events and find Kafka in the list.

Metrics

Cluster metrics

Metric | Description
Partitions | All partition replicas available on this broker. Leader replicas are included in the count. The count should be evenly distributed across the brokers in the cluster.
Under replicated partitions | The number of under-replicated partitions in the cluster. Under-replicated partitions can indicate that replication is falling behind, that consumers may not be receiving the latest data, and that latency is growing.
Offline partitions | The number of partitions that have no active leader and therefore can't be written to.
Active cluster controllers | The number of active controllers in the cluster. There should be exactly one controller per cluster, so an alert is raised if the sum across all brokers in the cluster is anything other than 1.
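Because the partition replica count should be even across the cluster, one way to spot an uneven distribution is to compare each broker's count with the cluster mean. The sketch below is an assumption for illustration only: `uneven_brokers` is a hypothetical helper, and its 10% `tolerance` is an arbitrary example value, not a Dynatrace threshold.

```python
def uneven_brokers(replicas_per_broker: dict[int, int], tolerance: float = 0.10) -> list[int]:
    """Return IDs of brokers whose replica count deviates from the cluster mean by more than `tolerance`.

    `tolerance` is a hypothetical fraction of the mean (0.10 = 10%), not a Dynatrace default.
    """
    if not replicas_per_broker:
        return []
    avg = sum(replicas_per_broker.values()) / len(replicas_per_broker)
    if avg == 0:
        return []
    return sorted(
        broker
        for broker, count in replicas_per_broker.items()
        if abs(count - avg) / avg > tolerance
    )


# Broker 3 holds noticeably more replicas than the others (mean is 110).
print(uneven_brokers({1: 100, 2: 100, 3: 130}))  # [3]
```

Comparing against the mean keeps the check independent of cluster size; brokers 1 and 2 deviate by about 9% and stay under the example tolerance, while broker 3 deviates by about 18%.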

Broker metrics

Metric | Description
Mean time | The mean time taken to flush the partition log to disk. A flush occurs when either the time since the last flush exceeds the configured flush interval or the unflushed data exceeds the configured maximum size.
95th percentile | The 95th percentile of log flush time. Even a slight change in log flush time can significantly affect Kafka performance.
Incoming byte rate | The rate, in bytes per second, at which the broker receives data from clients (consumers, producers, and connectors).
Outgoing byte rate | The rate, in bytes per second, at which the broker sends data to clients (consumers, producers, and connectors).
Partitions | All partition replicas available on this broker. Leader replicas are included in the count. The count should be evenly distributed across the brokers in the cluster.
Under replicated partitions | The number of under-replicated partitions on this broker.
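The mean and the 95th percentile of log flush time can tell different stories, which is why both are useful. The sketch below computes both from a list of flush durations; the nearest-rank percentile method and the `flush_time_stats` helper are assumptions for illustration, and Kafka or Dynatrace may aggregate these values differently.

```python
def flush_time_stats(flush_times_ms: list[float]) -> tuple[float, float]:
    """Return (mean, 95th percentile) of log flush durations in milliseconds.

    The 95th percentile uses the nearest-rank method, which is an assumption;
    Kafka and Dynatrace may aggregate percentiles differently.
    """
    if not flush_times_ms:
        raise ValueError("at least one flush sample is required")
    ordered = sorted(flush_times_ms)
    n = len(ordered)
    mean_ms = sum(ordered) / n
    rank = (95 * n + 99) // 100  # ceil(0.95 * n) with integer arithmetic, 1-based
    return mean_ms, ordered[rank - 1]


samples = [4, 4, 5, 5, 5, 6, 6, 7, 30, 40]
print(flush_time_stats(samples))  # (11.2, 40)
```

Here most flushes take 4 to 7 ms, yet the two slow flushes pull the mean to 11.2 ms and the 95th percentile to 40 ms, which is the kind of tail latency the percentile metric is meant to expose.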

Request metrics

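Metric | Description
Requests per second | The number of requests the broker handles per second.
Total time per request | The total time the broker takes to serve a request, from receipt until the response is sent.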
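Per-second request rates and per-request times are typically derived from counters sampled at two points in time. The sketch below assumes hypothetical cumulative counters (the `RequestCounters` fields are illustrative and are not the names Kafka or Dynatrace use) and divides the change in each counter by the elapsed time or by the number of requests.

```python
from dataclasses import dataclass


@dataclass
class RequestCounters:
    """Hypothetical cumulative counters read from a broker at one point in time."""
    timestamp_s: float    # sample time in seconds
    request_count: int    # total requests handled since broker start
    total_time_ms: float  # total time spent serving those requests


def request_stats(prev: RequestCounters, curr: RequestCounters) -> tuple[float, float]:
    """Return (requests per second, mean total time per request in ms) between two samples."""
    elapsed_s = curr.timestamp_s - prev.timestamp_s
    if elapsed_s <= 0:
        raise ValueError("samples must be in increasing time order")
    requests = curr.request_count - prev.request_count
    rate = requests / elapsed_s
    # Avoid dividing by zero when no requests arrived in the interval.
    mean_time_ms = (curr.total_time_ms - prev.total_time_ms) / requests if requests else 0.0
    return rate, mean_time_ms


before = RequestCounters(timestamp_s=0.0, request_count=1_000, total_time_ms=5_000.0)
after = RequestCounters(timestamp_s=60.0, request_count=4_000, total_time_ms=20_000.0)
print(request_stats(before, after))  # (50.0, 5.0)
```

Working from deltas rather than raw totals means a long-running broker's history does not dilute the current rate.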

Kafka producer, consumer, and connect metrics

Metric | Description
Requests | The number of requests processed per second by the client.
Request size | The average request size over a one-minute window.
Incoming/outgoing byte rate | The rate at which the client processes incoming and outgoing bytes.
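The one-minute request size and the client byte rates can be sketched with a sliding window over timestamped samples. This is an assumption about how such values could be derived, not a description of how Kafka clients or Dynatrace compute them; `SlidingWindow` is a hypothetical helper.

```python
from collections import deque


class SlidingWindow:
    """Keeps (timestamp, value) samples from the last `window_s` seconds.

    Assumes samples arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        self._samples: deque[tuple[float, float]] = deque()

    def add(self, timestamp_s: float, value: float) -> None:
        self._samples.append((timestamp_s, value))
        self._evict(timestamp_s)

    def _evict(self, now_s: float) -> None:
        # Drop samples outside the window (now_s - window_s, now_s].
        while self._samples and self._samples[0][0] <= now_s - self.window_s:
            self._samples.popleft()

    def average(self, now_s: float) -> float:
        """Mean value in the window, e.g. average request size in bytes."""
        self._evict(now_s)
        if not self._samples:
            return 0.0
        return sum(v for _, v in self._samples) / len(self._samples)

    def rate_per_second(self, now_s: float) -> float:
        """Sum of values in the window divided by its length, e.g. bytes per second."""
        self._evict(now_s)
        return sum(v for _, v in self._samples) / self.window_s


sizes = SlidingWindow(window_s=60.0)
sizes.add(0.0, 1_000)   # request of 1,000 bytes at t=0 s
sizes.add(30.0, 2_000)  # request of 2,000 bytes at t=30 s
sizes.add(70.0, 3_000)  # the t=0 sample now falls outside the window
print(sizes.average(70.0))          # 2500.0
print(sizes.rate_per_second(70.0))  # about 83.33 bytes per second
```

The same window serves both metrics: `average` gives the mean request size, while `rate_per_second` spreads the bytes seen in the window over its full length.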