Aiven for Apache Kafka® metrics available via Prometheus#

The following list only contains the most common metrics available via Prometheus for an Aiven for Apache Kafka® service.

You can retrieve the complete list of available metrics for your specific service by requesting the Prometheus endpoint, substituting:

the Aiven project certificate (ca.pem)
the Prometheus credentials (<PROMETHEUS_USER>:<PROMETHEUS_PASSWORD>)
the Aiven for Apache Kafka hostname (<KAFKA_HOSTNAME>)
the Prometheus port (<PROMETHEUS_PORT>)

curl --cacert ca.pem \
    --user '<PROMETHEUS_USER>:<PROMETHEUS_PASSWORD>' \
    'https://<KAFKA_HOSTNAME>:<PROMETHEUS_PORT>/metrics'

Tip

You can check how to use Prometheus with Aiven in the dedicated document.

CPU utilization#

cpu_usage_guest: CPU time spent running a virtual CPU for guest operating systems.
cpu_usage_guest_nice: The amount of time the CPU runs a virtual CPU for a guest operating system, which is low-priority and can be interrupted by other processes. This metric is measured in hundredths of a second.
cpu_usage_idle: Time the CPU spends doing nothing.
cpu_usage_iowait: Time waiting for I/O to complete.
cpu_usage_irq: Time servicing interrupts.
cpu_usage_nice: Time running user-niced processes.
cpu_usage_softirq: Time servicing softirqs.
cpu_usage_steal: Time spent in other operating systems when running in a virtualized environment.
cpu_usage_system: Time spent running system processes.
cpu_usage_user: Time spent running user processes.
system_load1: System load average for the last minute.
system_load15: System load average for the last 15 minutes.
system_load5: System load average for the last 5 minutes.
system_n_cpus: Number of CPU cores available.
system_n_users: Number of users logged in.
system_uptime: Time for which the system has been up and running.

Disk space utilization#

disk_free: Amount of free disk space.
disk_inodes_free: Number of free inodes.
disk_inodes_total: Total number of inodes.
disk_inodes_used: Number of used inodes.
disk_total: Total disk space.
disk_used: Amount of used disk space.
disk_used_percent: Percentage of disk space used.

Disk input and output#

Metrics such as diskio_io_time, diskio_iops_in_progress, etc., offer valuable insights into disk I/O operations. These metrics encompass read/write operations, the duration of these operations, bytes read/written, and more.

diskio_io_time
diskio_iops_in_progress
diskio_merged_reads
diskio_merged_writes
diskio_read_bytes
diskio_read_time
diskio_reads
diskio_weighted_io_time
diskio_write_bytes
diskio_write_time
diskio_writes

Garbage collector `MXBean`#

Metrics associated with the java_lang_GarbageCollector provide insights into the JVM’s garbage collection process. These metrics encompass details such as the collection count, duration of collections, and more.

java_lang_GarbageCollector_G1_Young_Generation_CollectionCount: returns the total number of collections that have occurred
java_lang_GarbageCollector_G1_Young_Generation_CollectionTime: returns the approximate accumulated collection elapsed time in milliseconds
java_lang_GarbageCollector_G1_Young_Generation_duration

Memory usage#

Metrics starting with java_lang_Memory provide insights into the JVM’s memory usage, such as committed memory, initial memory, max memory, used memory, etc.

java_lang_Memory_committed: returns the amount of memory in bytes that is committed for the Java virtual machine to use
java_lang_Memory_init: returns the amount of memory in bytes that the Java virtual machine initially requests from the operating system for memory management
java_lang_Memory_max: returns the maximum amount of memory in bytes that can be used for memory management
java_lang_Memory_used: returns the amount of used memory in bytes
java_lang_Memory_ObjectPendingFinalizationCount

Apache Kafka Connect#

The Apache Kafka Connect metrics list is available in the dedicated page.

Apache Kafka broker#

The descriptions for the below metrics are available in the Monitoring section of the Apache Kafka documentation.

Note

The metrics with a _Count suffix are cumulative counters for the given metric, e.g. kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_Count.

Note that a metric like kafka_server_BrokerTopicMetrics_MessagesInPerSec_Count is a cumulative count of incoming messages despite the PerSec suffix in the metric name.

To see the rate of change of these _Count metrics, you can apply a function such as the rate() function in PromQL.

Apache Kafka controller#

Note

These metrics with kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_XthPercentile (where X can be 50th, 75th, 95th, etc.) represent the time taken for leader elections to complete at various percentiles. It helps in understanding the distribution of leader election times.

Metrics below with kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_ (FifteenMinuteRate, FiveMinuteRate, etc.) represent the rate of leader elections over different time intervals.

Metrics below with kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_ (Max/Mean/Min/StdDev) provide statistical measures about the leader election times.

Metrics below with kafka_controller_KafkaController_Metrics provide insights into the state of the Kafka controller, like the number of active brokers, offline partitions, replicas to delete, etc.

kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_50thPercentile
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_75thPercentile
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_95thPercentile
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_98thPercentile
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_999thPercentile
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_99thPercentile
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_Count: The total number of leader elections.
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_FifteenMinuteRate
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_FiveMinuteRate
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_Max
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_Mean
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_MeanRate
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_Min
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_OneMinuteRate
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs_StdDev
kafka_controller_ControllerStats_UncleanLeaderElectionsPerSec_Count: Number of times an unclean leader election occurs. Unclean leader elections can lead to data loss.
kafka_controller_KafkaController_ActiveBrokerCount_Value
kafka_controller_KafkaController_ActiveControllerCount_Value
kafka_controller_KafkaController_FencedBrokerCount_Value
kafka_controller_KafkaController_OfflinePartitionsCount_Value
kafka_controller_KafkaController_PreferredReplicaImbalanceCount_Value
kafka_controller_KafkaController_ReplicasIneligibleToDeleteCount_Value
kafka_controller_KafkaController_ReplicasToDeleteCount_Value
kafka_controller_KafkaController_TopicsIneligibleToDeleteCount_Value
kafka_controller_KafkaController_TopicsToDeleteCount_Value

`Jolokia` collector collect time#

kafka_jolokia_collector_collect_time: Represents the time taken by the Jolokia collector to collect metrics. Jolokia is a JMX-HTTP bridge, giving an alternative to native JMX access.

Apache Kafka log#

Note

Metrics like kafka_log_LogCleaner_cleaner_recopy_percent_Value and kafka_log_LogCleanerManager_time_since_last_run_ms_Value provide insights into the log cleaner’s operation, which helps in compacting the Kafka logs.

Log Flush Rate Metrics give insights into the log flush operations. Flushing ensures that data is written from memory to disk. Metrics like kafka_log_LogFlushStats_LogFlushRateAndTimeMs_XthPercentile provide the time taken to flush logs at various percentiles.

kafka_log_LogCleaner_cleaner_recopy_percent_Value
kafka_log_LogCleanerManager_time_since_last_run_ms_Value
kafka_log_LogCleaner_max_clean_time_secs_Value
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_50thPercentile
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_75thPercentile
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_95thPercentile
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_98thPercentile
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_999thPercentile
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_99thPercentile
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_Count
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_FifteenMinuteRate
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_FiveMinuteRate
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_Max
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_Mean
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_MeanRate
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_Min
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_OneMinuteRate
kafka_log_LogFlushStats_LogFlushRateAndTimeMs_StdDev
kafka_log_Log_LogEndOffset_Value
kafka_log_Log_LogStartOffset_Value
kafka_log_Log_Size_Value

Apache Kafka network#

Note

Metrics below like kafka_network_RequestMetrics_RequestsPerSec_Count and kafka_network_RequestMetrics_TotalTimeMs_Mean provide insights into the network requests made to the Kafka brokers.

kafka_network_RequestChannel_RequestQueueSize_Value
kafka_network_RequestChannel_ResponseQueueSize_Value
kafka_network_RequestMetrics_RequestsPerSec_Count
kafka_network_RequestMetrics_TotalTimeMs_95thPercentile
kafka_network_RequestMetrics_TotalTimeMs_Count
kafka_network_RequestMetrics_TotalTimeMs_Mean
kafka_network_SocketServer_NetworkProcessorAvgIdlePercent_Value

Apache Kafka server#

Note

The metrics below like BrokerTopicMetrics provide insights into various operations related to topics, like bytes in/out, failed fetch/produce requests, etc.

Metrics ReplicaManager like kafka_server_ReplicaManager_LeaderCount_Value provide insights into the state of replicas in the Kafka cluster.

If you do not specify the topic tag, it displays the combined rate for all topics as well as the rate for each individual topic. To view rates for specific topics, use the topic tag. To exclude the combined rate for all topics and only list metrics for individual topics, filter with topic!=""

kafka_server_BrokerTopicMetrics_BytesInPerSec_Count: Byte in (from the clients) rate per topic. Omitting ‘topic=(…)’ will yield the all-topic rate.
kafka_server_BrokerTopicMetrics_BytesOutPerSec_Count: Byte out (to the clients) rate per topic. Omitting ‘topic=(…)’ will yield the all-topic rate.
kafka_server_BrokerTopicMetrics_BytesRejectedPerSec_Count: Rejected byte rate per topic due to the record batch size being greater than max.message.bytes configuration. Omitting ‘topic=(…)’ will yield the all-topic rate.
kafka_server_BrokerTopicMetrics_FailedFetchRequestsPerSec_Count: Failed Fetch request (from clients or followers) rate per topic. Omitting ‘topic=(…)’ will yield the all-topic rate.
kafka_server_BrokerTopicMetrics_FailedProduceRequestsPerSec_Count: Failed Produce request rate per topic. Omitting ‘topic=(…)’ will yield the all-topic rate.
kafka_server_BrokerTopicMetrics_FetchMessageConversionsPerSec_Count: Message format conversion rate, for Produce or Fetch requests, per topic. Omitting ‘topic=(…)’ will yield the all-topic rate.
kafka_server_BrokerTopicMetrics_MessagesInPerSec_Count: Incoming message rate per topic. Omitting ‘topic=(…)’ will yield the all-topic rate.
kafka_server_BrokerTopicMetrics_ProduceMessageConversionsPerSec_Count: Message format conversion rate, for Produce or Fetch requests, per topic. Omitting ‘topic=(…)’ will yield the all-topic rate.
kafka_server_BrokerTopicMetrics_ReassignmentBytesInPerSec_Count: Incoming byte rate of reassignment traffic
kafka_server_BrokerTopicMetrics_ReassignmentBytesOutPerSec_Count: Outgoing byte rate of reassignment traffic
kafka_server_BrokerTopicMetrics_ReplicationBytesInPerSec_Count: Byte in (from the other brokers) rate per topic. Omitting ‘topic=(…)’ will yield the all-topic rate.
kafka_server_BrokerTopicMetrics_ReplicationBytesOutPerSec_Count: Byte out (to the other brokers) rate per topic. Omitting ‘topic=(…)’ will yield the all-topic rate.
kafka_server_BrokerTopicMetrics_TotalFetchRequestsPerSec_Count: Fetch request (from clients or followers) rate per topic. Omitting ‘topic=(…)’ will yield the all-topic rate.
kafka_server_BrokerTopicMetrics_TotalProduceRequestsPerSec_Count: Produce request rate per topic. Omitting ‘topic=(…)’ will yield the all-topic rate.
kafka_server_DelayedOperationPurgatory_NumDelayedOperations_Value
kafka_server_DelayedOperationPurgatory_PurgatorySize_Value
kafka_server_KafkaRequestHandlerPool_RequestHandlerAvgIdlePercent_OneMinuteRate
kafka_server_KafkaServer_BrokerState_Value
kafka_server_ReplicaManager_IsrExpandsPerSec_Count
kafka_server_ReplicaManager_IsrShrinksPerSec_Count
kafka_server_ReplicaManager_LeaderCount_Value
kafka_server_ReplicaManager_PartitionCount_Value
kafka_server_ReplicaManager_UnderMinIsrPartitionCount_Value
kafka_server_ReplicaManager_UnderReplicatedPartitions_Value
kafka_server_group_coordinator_metrics_group_completed_rebalance_count
kafka_server_group_coordinator_metrics_group_completed_rebalance_rate
kafka_server_group_coordinator_metrics_offset_commit_count
kafka_server_group_coordinator_metrics_offset_commit_rate
kafka_server_group_coordinator_metrics_offset_deletion_count
kafka_server_group_coordinator_metrics_offset_deletion_rate
kafka_server_group_coordinator_metrics_offset_expiration_count
kafka_server_group_coordinator_metrics_offset_expiration_rate

Kernel#

Note

Metrics below, like kernel_boot_time, kernel_context_switches, etc., provide insights into the underlying system’s kernel operations.

kernel_boot_time
kernel_context_switches
kernel_entropy_avail
kernel_interrupts
kernel_processes_forked

Generic memory#

Note

Metrics like mem_active, mem_available, etc., provide insights into the system’s memory usage.

mem_active
mem_available
mem_available_percent
mem_buffered
mem_cached
mem_commit_limit
mem_committed_as
mem_dirty
mem_free
mem_high_free
mem_high_total
mem_huge_pages_free
mem_huge_page_size
mem_huge_pages_total
mem_inactive
mem_low_free
mem_low_total
mem_mapped
mem_page_tables
mem_shared
mem_slab
mem_swap_cached
mem_swap_free
mem_swap_total
mem_total
mem_used
mem_used_percent
mem_vmalloc_chunk
mem_vmalloc_total
mem_vmalloc_used
mem_wired
mem_write_back
mem_write_back_tmp

Network#

Note

Metrics like net_bytes_recv, net_packets_sent, etc., provide insights into the system’s network operations.

net_bytes_recv
net_bytes_sent
net_drop_in
net_drop_out
net_err_in
net_err_out
net_icmp_inaddrmaskreps
net_icmp_inaddrmasks
net_icmp_incsumerrors
net_icmp_indestunreachs
net_icmp_inechoreps
net_icmp_inechos
net_icmp_inerrors
net_icmp_inmsgs
net_icmp_inparmprobs
net_icmp_inredirects
net_icmp_insrcquenchs
net_icmp_intimeexcds
net_icmp_intimestampreps
net_icmp_intimestamps
net_icmpmsg_intype3
net_icmpmsg_intype8
net_icmpmsg_outtype0
net_icmpmsg_outtype3
net_icmp_outaddrmaskreps
net_icmp_outaddrmasks
net_icmp_outdestunreachs
net_icmp_outechoreps
net_icmp_outechos
net_icmp_outerrors
net_icmp_outmsgs
net_icmp_outparmprobs
net_icmp_outredirects
net_icmp_outsrcquenchs
net_icmp_outtimeexcds
net_icmp_outtimestampreps
net_icmp_outtimestamps
net_ip_defaultttl
net_ip_forwarding
net_ip_forwdatagrams
net_ip_fragcreates
net_ip_fragfails
net_ip_fragoks
net_ip_inaddrerrors
net_ip_indelivers
net_ip_indiscards
net_ip_inhdrerrors
net_ip_inreceives
net_ip_inunknownprotos
net_ip_outdiscards
net_ip_outnoroutes
net_ip_outrequests
net_ip_reasmfails
net_ip_reasmoks
net_ip_reasmreqds
net_ip_reasmtimeout
net_packets_recv
net_packets_sent
netstat_tcp_close
netstat_tcp_close_wait
netstat_tcp_closing
netstat_tcp_established
netstat_tcp_fin_wait1
netstat_tcp_fin_wait2
netstat_tcp_last_ack
netstat_tcp_listen
netstat_tcp_none
netstat_tcp_syn_recv
netstat_tcp_syn_sent
netstat_tcp_time_wait
netstat_udp_socket
net_tcp_activeopens
net_tcp_attemptfails
net_tcp_currestab
net_tcp_estabresets
net_tcp_incsumerrors
net_tcp_inerrs
net_tcp_insegs
net_tcp_maxconn
net_tcp_outrsts
net_tcp_outsegs
net_tcp_passiveopens
net_tcp_retranssegs
net_tcp_rtoalgorithm
net_tcp_rtomax
net_tcp_rtomin
net_udp_ignoredmulti
net_udp_incsumerrors
net_udp_indatagrams
net_udp_inerrors
net_udplite_ignoredmulti
net_udplite_incsumerrors
net_udplite_indatagrams
net_udplite_inerrors
net_udplite_noports
net_udplite_outdatagrams
net_udplite_rcvbuferrors
net_udplite_sndbuferrors
net_udp_noports
net_udp_outdatagrams
net_udp_rcvbuferrors
net_udp_sndbuferrors

Processes#

Note

Metrics like processes_running, processes_zombies, etc., provide insights into the system’s process management.

processes_blocked
processes_dead
processes_idle
processes_paging
processes_running
processes_sleeping
processes_stopped
processes_total
processes_total_threads
processes_unknown
processes_zombies

Swap usage#

Note

Metrics like swap_free, swap_used, etc., provide insights into the system’s swap memory usage.

swap_free
swap_in
swap_out
swap_total
swap_used
swap_used_percent

Aiven for Apache Kafka® metrics available via Prometheus#

CPU utilization#

Disk space utilization#

Disk input and output#

Garbage collector MXBean#

Memory usage#

Apache Kafka Connect#

Apache Kafka broker#

Apache Kafka controller#

Jolokia collector collect time#

Apache Kafka log#

Apache Kafka network#

Apache Kafka server#

Kernel#

Generic memory#

Network#

Processes#

Swap usage#

Garbage collector `MXBean`#

`Jolokia` collector collect time#