Description
While testing the Apache Kafka 4.1.0 RCs, I found an interesting issue which was the cause of several regression tests failing.
With Apache Kafka 4.1.0, the Eligible Leader Replicas (ELR) feature is enabled by default, and it does not allow changing `min.insync.replicas` at the broker level. See here: https://github.com/apache/kafka/blob/4.1/metadata/src/main/java/org/apache/kafka/controller/ConfigurationControlManager.java#L383
It means that if the user tries to change the `min.insync.replicas` configuration parameter within `spec.kafka.config`, the KafkaRoller enters an endless loop of restarting brokers, logging something like this:
2025-07-23 12:41:35 ERROR KafkaRoller:742 - Reconciliation #46(watch) Kafka(myproject/my-cluster): Error updating broker configuration for pod my-cluster-broker-0/0
org.apache.kafka.common.errors.InvalidConfigurationException: Broker-level min.insync.replicas cannot be altered while ELR is enabled.
2025-07-23 12:41:35 INFO KafkaRoller:468 - Reconciliation #46(watch) Kafka(myproject/my-cluster): Rolling Pod my-cluster-broker-0/0 due to []
2025-07-23 12:41:35 INFO PodOperator:54 - Reconciliation #46(watch) Kafka(myproject/my-cluster): Rolling pod my-cluster-broker-0
`min.insync.replicas` is dynamically configurable, so the KafkaRoller first tries to update it dynamically but, because of the failure, it forces a rolling restart of the broker pod to apply the change. In any case, the change is never actually applied because ELR is enabled, so the KafkaRoller detects the same difference again on every subsequent reconciliation and keeps restarting brokers in an endless loop.
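To make the loop easier to follow, here is a minimal toy model of the behaviour described above. It is not Strimzi's KafkaRoller code: `FakeBroker`, `alterConfig`, `restart` and `simulate` are hypothetical names invented for this sketch, and a plain `IllegalStateException` stands in for Kafka's `InvalidConfigurationException`. It assumes, as described above, that a restart does not change the effective value while ELR is enabled.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the restart loop; NOT the real KafkaRoller logic. */
class ElrRollLoopSketch {

    /** Hypothetical stand-in for a broker that rejects the change while ELR is enabled. */
    static class FakeBroker {
        final boolean elrEnabled;
        final Map<String, String> config = new HashMap<>();
        int restarts = 0;

        FakeBroker(boolean elrEnabled) {
            this.elrEnabled = elrEnabled;
            config.put("min.insync.replicas", "1");
        }

        /** Dynamic update: fails for min.insync.replicas when ELR is on. */
        void alterConfig(String key, String value) {
            if (elrEnabled && key.equals("min.insync.replicas")) {
                throw new IllegalStateException(
                    "Broker-level min.insync.replicas cannot be altered while ELR is enabled.");
            }
            config.put(key, value);
        }

        /** A restart is only counted here; it does not change the effective value. */
        void restart() {
            restarts++;
        }
    }

    /** Runs the given number of reconciliations and returns how many restarts happened. */
    static int simulate(boolean elrEnabled, int reconciliations) {
        FakeBroker broker = new FakeBroker(elrEnabled);
        String key = "min.insync.replicas";
        String desired = "2";
        for (int i = 0; i < reconciliations; i++) {
            if (desired.equals(broker.config.get(key))) {
                continue; // desired and actual match, nothing to do
            }
            try {
                broker.alterConfig(key, desired); // try the dynamic update first
            } catch (IllegalStateException e) {
                broker.restart(); // the failure is treated as a reason to roll the pod
            }
        }
        return broker.restarts;
    }

    public static void main(String[] args) {
        System.out.println("ELR off: restarts = " + simulate(false, 5)); // prints 0
        System.out.println("ELR on:  restarts = " + simulate(true, 5));  // prints 5
    }
}
```

With ELR off, the first dynamic update succeeds and no restart is needed. With ELR on, every reconciliation sees the same difference and triggers another restart.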
The same applies to Apache Kafka 4.0.0 if the user explicitly enables ELR via the `kafka-features.sh` tool. ELR is disabled by default in 4.0.0, so the issue doesn't occur unless it is turned on, for example:
bin/kafka-features.sh --bootstrap-server localhost:9092 upgrade --feature eligible.leader.replicas.version=1
Instead of treating this specific error as a `ForceableProblem` and restarting the broker, the KafkaRoller should report it as a warning condition in the `Kafka` custom resource status and avoid the rolling restart.
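One possible shape for this handling is sketched below, purely as an illustration and not as the actual KafkaRoller change. Every type in it is hypothetical: `Action`, `StatusWarnings` and `onConfigUpdateFailure` are invented for the example, and the nested `InvalidConfigurationException` is a local stand-in for the Kafka client class so that the snippet compiles on its own. Matching on the message text is an assumption made for brevity; a real fix might prefer a more robust check.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the proposed handling; all names here are hypothetical. */
class ElrWarningSketch {

    /** What the roller would do after a failed dynamic config update. */
    enum Action { ROLL_POD, WARN_ONLY }

    /** Local stand-in for org.apache.kafka.common.errors.InvalidConfigurationException. */
    static class InvalidConfigurationException extends RuntimeException {
        InvalidConfigurationException(String message) {
            super(message);
        }
    }

    /** Collects warnings that would be surfaced in the Kafka custom resource status. */
    static final class StatusWarnings {
        final List<String> conditions = new ArrayList<>();
    }

    /** Fragment of the error message shown in the log above. */
    static final String ELR_FRAGMENT =
        "min.insync.replicas cannot be altered while ELR is enabled";

    /** Decides between rolling the pod and only recording a warning. */
    static Action onConfigUpdateFailure(RuntimeException error, StatusWarnings status) {
        String message = error.getMessage();
        if (error instanceof InvalidConfigurationException
                && message != null
                && message.contains(ELR_FRAGMENT)) {
            // Rolling cannot apply this change, so record a warning instead.
            status.conditions.add("Warning: " + message);
            return Action.WARN_ONLY;
        }
        // Any other failure keeps the existing behaviour in this sketch.
        return Action.ROLL_POD;
    }

    public static void main(String[] args) {
        StatusWarnings status = new StatusWarnings();
        RuntimeException elrError = new InvalidConfigurationException(
            "Broker-level min.insync.replicas cannot be altered while ELR is enabled.");
        System.out.println(onConfigUpdateFailure(elrError, status)); // prints WARN_ONLY
        System.out.println(status.conditions.size());                // prints 1
    }
}
```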