-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race condition while registering nodes #2965
Labels
bug
Something isn't working
Comments
PR could be created (sorry, I do not have that sort of right, still a newbie here) using the following branch: https://github.com/jacob-netguardians/helix/tree/bugfix/2965-atomic-znode-creation-on-node-registration |
jacob-netguardians
added a commit
to jacob-netguardians/helix
that referenced
this issue
Nov 19, 2024
jacob-netguardians
added a commit
to jacob-netguardians/helix
that referenced
this issue
Nov 19, 2024
5 tasks
jacob-netguardians
added a commit
to jacob-netguardians/helix
that referenced
this issue
Dec 9, 2024
Reformatted changed file according to the "helix format"
jacob-netguardians
added a commit
to jacob-netguardians/helix
that referenced
this issue
Dec 11, 2024
This reverts commit 43ba0b0.
jacob-netguardians
added a commit
to jacob-netguardians/helix
that referenced
this issue
Dec 11, 2024
Reformatted changed file according to the "helix format"
jacob-netguardians
added a commit
to jacob-netguardians/helix
that referenced
this issue
Dec 11, 2024
Reformatted changed file according to the "helix format"
jacob-netguardians
added a commit
to jacob-netguardians/helix
that referenced
this issue
Dec 11, 2024
This reverts commit 2209337.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the bug
When two nodes are registered very closely to eachother, it can happen that the ZKHelixAdmin.addInstance() method is called at about the same time on the two nodes. The one that comes slightly before the other one will start creating ZNodes for the "INSTANCE" hierarchy in Zookeeper.
Let's say the second node starts the run of the ZKHelixAdmin.addInstance() method now. Its call to findInstancesWithMatchingLogicalId() will fail badly, throwing an exception (see below), because the first node's INSTANCE hierarchy is not complete:
So... while adding node-2, it complains about node-1 being invalid, and consequently sets the "instance-operation" on node-2 to UNKNOWN instead of ENABLE.
Version of helix (1.4.1.1) above is due to the fact that I had already fixed #2963 and had a local version number change for this. Line numbers in the logs above may also not exactly match what is known in version 1.4.1, due to the addition of debug logging I had to do before actually understanding the problem.
Expected behavior
A node should be registered completely or not at all, not having that kind of impact on the fellow nodes' registration.
Additional context
I propose the following fix, in the ZKHelixAdmin class, creating all those nodes in the node's INSTANCE hierarchy in an atomic fashion:
Replace, in the ZKHelixAdmin class, addInstance() method, this:
with
where
createPersistentNodeOp
method is defined as:The text was updated successfully, but these errors were encountered: