Kafka rebalances during the rolling restart #1642

filimonov · 2025-02-19T14:58:59Z

If you have many nodes, each running many Kafka tables, a rolling restart can lead to a terrible sequence of rebalances.

node1 goes down
every Kafka table on that node shuts down
that triggers rebalances in the consumer groups
a rebalance is a 'stop the world' event in ClickHouse
so all other replicas pause consumption and start the rebalance protocol to redistribute the topics / partitions
it usually takes seconds to dozens of seconds until they get the new assignment
in the meantime node1 comes back online and triggers one more rebalance

then the situation repeats for other nodes.
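
To get a rough feel for the cost of the sequence above, here is a back-of-the-envelope sketch. It assumes each restarted node causes exactly two rebalances per consumer group (one when it leaves, one when it rejoins) and that rebalances do not overlap; the function name and parameters are hypothetical, not anything the operator exposes.

```python
from typing import Tuple


def rolling_restart_rebalance_cost(
    nodes: int, seconds_per_rebalance: float
) -> Tuple[int, float]:
    """Estimate rebalances and paused time per consumer group.

    Assumption: every node restart triggers two rebalances
    (node leaves, node rejoins) and they never overlap.
    """
    rebalances = 2 * nodes
    paused_seconds = rebalances * seconds_per_rebalance
    return rebalances, paused_seconds


if __name__ == "__main__":
    count, paused = rolling_restart_rebalance_cost(nodes=10, seconds_per_rebalance=15.0)
    print(f"{count} rebalances, ~{paused:.0f}s of paused consumption per group")
```

Under these assumptions, 10 nodes at 15 seconds per rebalance means 20 rebalances and about 300 seconds of paused consumption for each consumer group.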

Possible solution:

let's introduce a setting like stopStreamingTablesDuringRestarts
when it is enabled, clickhouse-operator should run the following before restarting the first node:

DETACH TABLE db.table ON CLUSTER '{cluster}' PERMANENTLY

for every table with engine Kafka (maybe also RabbitMQ and others?)
and store in the state that the tables were detached (wouldn't it be too much?)
after that do the normal reconcile / restarts.

in case of success or failure, do ATTACH TABLE for every table stored in the state.
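
The flow above could be sketched roughly as follows. This is only an illustration of the ordering (detach, restart, always re-attach), not how clickhouse-operator is implemented: `execute` and `rolling_restart` are hypothetical callables standing in for "run this SQL on the cluster" and "do the normal reconcile / restarts", and the in-memory `detached` list stands in for whatever the operator would persist in its state. The query against `system.tables` is one possible way to find the tables, with the engine list taken from the proposal.

```python
from typing import Callable, Iterable, List, Tuple

# One possible way to discover streaming tables; the engine list is an assumption.
LIST_STREAMING_TABLES_SQL = (
    "SELECT database, name FROM system.tables "
    "WHERE engine IN ('Kafka', 'RabbitMQ')"
)


def detach_sql(database: str, table: str) -> str:
    # '{cluster}' is left as a literal macro, as in the proposal.
    return f"DETACH TABLE {database}.{table} ON CLUSTER '{{cluster}}' PERMANENTLY"


def attach_sql(database: str, table: str) -> str:
    return f"ATTACH TABLE {database}.{table} ON CLUSTER '{{cluster}}'"


def restart_with_streaming_paused(
    tables: Iterable[Tuple[str, str]],
    execute: Callable[[str], None],
    rolling_restart: Callable[[], None],
) -> None:
    """Detach streaming tables, restart, then re-attach on success or failure."""
    detached: List[Tuple[str, str]] = []  # stand-in for the operator's stored state
    try:
        for database, table in tables:
            execute(detach_sql(database, table))
            detached.append((database, table))
        rolling_restart()
    finally:
        # Re-attach only what was actually detached, even if the restart failed.
        for database, table in detached:
            execute(attach_sql(database, table))


if __name__ == "__main__":
    restart_with_streaming_paused(
        [("db", "table")], execute=print, rolling_restart=lambda: print("-- restart --")
    )
```

Because the tables are detached with PERMANENTLY, they stay detached across the server restarts, so the final ATTACH step is what resumes consumption; recording each table only after its DETACH succeeded keeps that step from touching tables that were never detached.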
