[Bug]: Endless brokers pods restarts when changing min.insync.replicas with Eligible Leader Replicas (ELR) feature enabled #11685

@ppatierno

While testing the Apache Kafka 4.1.0 RCs, I found an issue that caused several regression tests to fail.
With Apache Kafka 4.1.0, the Eligible Leader Replicas (ELR) feature is enabled by default, and while it is enabled the controller rejects changes to the broker-level min.insync.replicas. See here: https://github.com/apache/kafka/blob/4.1/metadata/src/main/java/org/apache/kafka/controller/ConfigurationControlManager.java#L383

This means that if the user tries to change the min.insync.replicas configuration parameter within spec.kafka.config, the KafkaRoller enters an endless loop of broker restarts, logging something like this:

2025-07-23 12:41:35 ERROR KafkaRoller:742 - Reconciliation #46(watch) Kafka(myproject/my-cluster): Error updating broker configuration for pod my-cluster-broker-0/0
org.apache.kafka.common.errors.InvalidConfigurationException: Broker-level min.insync.replicas cannot be altered while ELR is enabled.
2025-07-23 12:41:35 INFO  KafkaRoller:468 - Reconciliation #46(watch) Kafka(myproject/my-cluster): Rolling Pod my-cluster-broker-0/0 due to []
2025-07-23 12:41:35 INFO  PodOperator:54 - Reconciliation #46(watch) Kafka(myproject/my-cluster): Rolling pod my-cluster-broker-0

min.insync.replicas is dynamically configurable, so the KafkaRoller first tries to update it dynamically. When that update fails, it falls back to rolling the broker pod to apply the change. Because ELR is enabled, the change is still not applied after the restart, so the KafkaRoller detects the same difference on every subsequent reconciliation and restarts the brokers in an endless loop.
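The loop described above can be illustrated with a small, self-contained simulation. This is only a sketch, not Strimzi code: FakeBroker, tryDynamicUpdate, restart and reconcile are hypothetical names, and the broker is modelled as always rejecting min.insync.replicas changes, as it does while ELR is enabled.

```java
import java.util.HashMap;
import java.util.Map;

public class ElrRollLoopSketch {

    /** Hypothetical stand-in for a broker with ELR enabled. */
    static class FakeBroker {
        final Map<String, String> config = new HashMap<>(Map.of("min.insync.replicas", "1"));
        int restarts = 0;

        /** Returns false when the broker rejects the dynamic change. */
        boolean tryDynamicUpdate(String key, String value) {
            if ("min.insync.replicas".equals(key)) {
                return false; // rejected while ELR is enabled
            }
            config.put(key, value);
            return true;
        }

        /** A restart does not change the effective value, so the diff remains. */
        void restart() {
            restarts++;
        }
    }

    /** Runs up to maxReconciliations passes and returns the number of restarts. */
    static int reconcile(FakeBroker broker, Map<String, String> desired, int maxReconciliations) {
        for (int i = 0; i < maxReconciliations; i++) {
            boolean diffFound = false;
            for (Map.Entry<String, String> e : desired.entrySet()) {
                if (!e.getValue().equals(broker.config.get(e.getKey()))) {
                    diffFound = true;
                    // Try the dynamic update first, fall back to a restart on failure.
                    if (!broker.tryDynamicUpdate(e.getKey(), e.getValue())) {
                        broker.restart();
                    }
                }
            }
            if (!diffFound) {
                break; // desired and actual configuration converged
            }
        }
        return broker.restarts;
    }

    public static void main(String[] args) {
        FakeBroker broker = new FakeBroker();
        int restarts = reconcile(broker, Map.of("min.insync.replicas", "2"), 5);
        System.out.println("restarts after 5 reconciliations: " + restarts);
    }
}
```

In this model every reconciliation ends in a restart, because the desired and actual values never converge.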

The same applies to Apache Kafka 4.0.0 if the user explicitly enables ELR via the kafka-features.sh tool. In 4.0.0, ELR is disabled by default, so the issue does not occur unless it is turned on like this:

bin/kafka-features.sh --bootstrap-server localhost:9092 upgrade --feature eligible.leader.replicas.version=1

Instead of treating this specific error as a ForceableProblem and restarting the broker, the KafkaRoller should report it as a warning condition in the Kafka custom resource status and skip the rolling restart.
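A minimal sketch of that decision follows. It is illustrative only and does not reflect the actual KafkaRoller code: the Action enum and classify method are hypothetical, and matching on the error message text (taken from the log above) is an assumption that a real fix may well replace with something more robust.

```java
public class DynamicConfigFailureSketch {

    /** Hypothetical outcomes for a failed dynamic configuration update. */
    enum Action { RESTART_BROKER, WARN_IN_STATUS }

    // Fragment of the error message shown in the log above.
    static final String ELR_MESSAGE_FRAGMENT = "cannot be altered while ELR is enabled";

    /**
     * Decides what to do after a dynamic configuration update failed.
     * The ELR rejection becomes a warning; any other failure keeps the restart.
     */
    static Action classify(String exceptionClassName, String message) {
        boolean invalidConfig = exceptionClassName != null
                && exceptionClassName.endsWith("InvalidConfigurationException");
        boolean elrRejection = message != null && message.contains(ELR_MESSAGE_FRAGMENT);
        return (invalidConfig && elrRejection) ? Action.WARN_IN_STATUS : Action.RESTART_BROKER;
    }

    public static void main(String[] args) {
        Action action = classify(
                "org.apache.kafka.common.errors.InvalidConfigurationException",
                "Broker-level min.insync.replicas cannot be altered while ELR is enabled.");
        System.out.println(action);
    }
}
```

Routing only this one rejection to a warning keeps the existing restart fallback for every other dynamic update failure.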
