Cluster Unavailability During Node Removal #4260

@angelala00

Hello M3DB Community,

I am encountering an issue with my M3DB cluster during a node removal operation. My cluster initially had 5 nodes, and I needed to scale down to 4 nodes. To do this, I used the following command:

curl -X DELETE <M3_COORDINATOR_HOST_NAME>:<M3_COORDINATOR_PORT(default 7201)>/api/v1/services/m3db/placement/<NODE_ID>

After I ran this command, the cluster began rebalancing shard data as expected. However, the cluster became unavailable while the rebalancing was in progress. Here are the details:

Cluster Size Before Removal: 5 nodes
Node Removal Process: The curl command above
Observed Issue: Cluster became unavailable during shard rebalancing

I followed the M3DB Operational Guide, but I am unsure what went wrong. I expected the cluster to remain available during a scale-down operation.
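
In case it helps with diagnosis, shard states during rebalancing can be inspected through the placement the coordinator returns. Below is a minimal Python sketch of how such a placement could be summarized. It assumes that a GET on the same /api/v1/services/m3db/placement path returns JSON shaped like placement.instances.<id>.shards[].state, with states such as INITIALIZING, AVAILABLE, and LEAVING. The sample data and helper names are hypothetical and are not output from my cluster.

```python
from collections import Counter


def shard_state_summary(placement_response):
    """Count shard states per instance in a placement response dict.

    Assumes the shape {"placement": {"instances": {id: {"shards": [...]}}}}.
    """
    instances = placement_response.get("placement", {}).get("instances", {})
    summary = {}
    for instance_id, instance in instances.items():
        states = Counter(
            shard.get("state", "UNKNOWN") for shard in instance.get("shards", [])
        )
        summary[instance_id] = dict(states)
    return summary


def is_stable(placement_response):
    """Return True only when every shard on every instance is AVAILABLE."""
    return all(
        set(states) <= {"AVAILABLE"}
        for states in shard_state_summary(placement_response).values()
    )


# Hypothetical sample resembling a placement in the middle of a node removal.
sample = {
    "placement": {
        "instances": {
            "node-a": {
                "shards": [
                    {"id": 0, "state": "AVAILABLE"},
                    {"id": 1, "state": "INITIALIZING"},
                ]
            },
            "node-b": {"shards": [{"id": 1, "state": "LEAVING"}]},
        }
    }
}

if __name__ == "__main__":
    print(shard_state_summary(sample))
    print("stable:", is_stable(sample))
```

A summary like this, captured before the removal and again while the cluster was unavailable, would show whether shards were still INITIALIZING or LEAVING at the time of the outage.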

Could you please help me understand the following:

What are the common causes for a cluster becoming unavailable during a node removal process?
Are there any specific configurations or precautions that need to be taken to ensure cluster availability during such operations?
Are there any known issues or limitations in specific M3DB versions that might affect the node removal process?

Any insights or guidance would be greatly appreciated. I am happy to provide more details if needed.

Thank you in advance for your help!
