Cluster Unavailability During Node Removal

Hello M3DB Community,

I am encountering an issue with my M3DB cluster during a node removal operation. My cluster initially had 5 nodes, and I needed to scale down to 4 nodes. To do this, I used the following command:

curl -X DELETE <M3_COORDINATOR_HOST_NAME>:<M3_COORDINATOR_PORT(default 7201)>/api/v1/services/m3db/placement/<NODE_ID>

After executing this command, the cluster began rebalancing the shard data as expected. However, the cluster became unavailable while the rebalance was in progress. Here are some details:

Cluster Size Before Removal: 5 nodes
Node Removal Process: Using the above curl command
Observed Issue: Cluster became unavailable during shard rebalancing

I followed the operational guidelines available at M3DB Operational Guide, but I am unsure what might have gone wrong. My expectation was that the cluster should remain available during a scale-down operation.
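
For anyone trying to reproduce or diagnose this, a minimal Python sketch for watching the rebalance could look like the following. It assumes the coordinator serves the current placement as JSON at /api/v1/services/m3db/placement, with a "placement.instances" map whose entries list shards carrying a "state" field (INITIALIZING, AVAILABLE, or LEAVING). The function names and the localhost URL are hypothetical and only for illustration.

```python
import json
import urllib.request
from collections import Counter


def fetch_placement(coordinator_url: str) -> dict:
    """GET the current placement from the coordinator (endpoint path is an assumption)."""
    url = f"{coordinator_url}/api/v1/services/m3db/placement"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def shard_states_by_instance(response: dict) -> dict:
    """Count shards per state for every instance in a placement response."""
    instances = response.get("placement", {}).get("instances", {})
    return {
        instance_id: Counter(
            shard.get("state", "UNKNOWN") for shard in inst.get("shards", [])
        )
        for instance_id, inst in instances.items()
    }


def rebalance_in_progress(response: dict) -> bool:
    """True while any shard is still INITIALIZING or LEAVING."""
    return any(
        counts.get("INITIALIZING", 0) or counts.get("LEAVING", 0)
        for counts in shard_states_by_instance(response).values()
    )


if __name__ == "__main__":
    # Hypothetical coordinator address; replace with the real host and port.
    placement = fetch_placement("http://localhost:7201")
    for instance_id, counts in shard_states_by_instance(placement).items():
        print(instance_id, dict(counts))
    print("rebalance in progress:", rebalance_in_progress(placement))
```

Running this before the DELETE and again while the cluster is unavailable would show which nodes still hold INITIALIZING or LEAVING shards at the time of the outage.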

Could you please help me understand the following:

What are the common causes for a cluster becoming unavailable during a node removal process?
Are there any specific configurations or precautions that need to be taken to ensure cluster availability during such operations?
Is there any known issue or limitation with the version of M3DB that might affect the node removal process?

Any insights or guidance would be greatly appreciated. I am happy to provide more details if needed.

Thank you in advance for your help!

Cluster Unavailability During Node Removal #4260
