Can redisc recover from master-slave node failover? #51
Hello, Martin

I am a little confused about the failover between the primary and backup nodes. redisc relies on the MOVED redirection to refresh, but when the primary node crashes, the backup node will eventually be promoted to primary, while redisc still only communicates with the crashed primary node. That means it will never get a MOVED response and cannot refresh, so the system will not be able to heal itself. Did I misunderstand something? Or is there a way to handle such situations and automatically perform failover?
Hello,

That's a great question. It does recover, because it communicates with all nodes in the cluster when refreshing its slots, so it should eventually get back on its feet. I just updated the consistency checker program in https://github.com/mna/redisc/tree/master/ccheck; you can play with it and give it a shot. If you do find some scenarios where it does not properly recover, I'm very interested in hearing about it so I can try to improve its resiliency!

What I tested was to set up the cluster, start the consistency checker, and kill a master node to trigger a failover.

I think that theoretically, if getting a connection from a pool fails repeatedly, it should force a refresh of the cluster mapping, but I haven't heard of any issues like that in the wild.

Thanks,
Martin
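For context, here is a minimal redisc usage sketch (the node addresses and retry settings are illustrative, not taken from this issue). `Refresh` rebuilds the slot-to-node mapping, and a retry connection follows MOVED redirections, which is how commands end up routed to a newly promoted master after a failover:

```go
package main

import (
	"log"
	"time"

	"github.com/gomodule/redigo/redis"
	"github.com/mna/redisc"
)

func main() {
	c := &redisc.Cluster{
		StartupNodes: []string{":7000", ":7001", ":7002"},
		DialOptions:  []redis.DialOption{redis.DialConnectTimeout(5 * time.Second)},
	}
	defer c.Close()

	// prime the slot-to-node mapping up front
	if err := c.Refresh(); err != nil {
		log.Fatalf("refresh: %v", err)
	}

	conn := c.Get()
	defer conn.Close()

	// a retry connection transparently follows MOVED/ASK redirections,
	// so a command issued after a failover reaches the new master
	rc, err := redisc.RetryConn(conn, 4, 100*time.Millisecond)
	if err != nil {
		log.Fatalf("retry conn: %v", err)
	}
	if _, err := rc.Do("SET", "k", "v"); err != nil {
		log.Printf("set: %v", err)
	}
}
```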
Yes, as you said, when the bind operation is called, redisc will give priority to the previous connection to the master node. When a connection error is found, it will randomly select a node again, giving it the opportunity to obtain the MOVED redirection and refresh the mapping.

In addition, I encountered another problem. When my service was started before the Redis cluster was ready, it could not recover even after the cluster became ready: the masters set had already been initialized once and then left empty by the failed refreshes, so the StartupNodes were never consulted again. It should be noted that my test environment is Docker, with 3 masters and 3 slaves launched. It seems that the easiest way to solve this problem is to modify how the masters set is initialized in `getNodeAddrs`, just like:
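(The two snippets below are hedged reconstructions of the code blocks lost from this comment, based on redisc's `getNodeAddrs` in cluster.go; the exact code at that commit may differ.) The current lazy initialization:

```go
// populate nodes lazily, only once
if c.masters == nil {
	c.masters = make(map[string]bool)
	c.replicas = make(map[string]bool)

	// StartupNodes should be masters
	for _, n := range c.StartupNodes {
		c.masters[n] = true
	}
}
```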
to:
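```go
// hedged sketch of the proposed change: re-seed from StartupNodes
// whenever the masters set is empty, not only on first use
if len(c.masters) == 0 {
	c.masters = make(map[string]bool)
	c.replicas = make(map[string]bool)

	// StartupNodes should be masters
	for _, n := range c.StartupNodes {
		c.masters[n] = true
	}
}
```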
So, when the above situation occurs, we can at least always fall back to the StartupNodes to restore the situation.
Thanks for the info, it's weird that
Looking at the node initialization in cluster.go (lines 408 to 417 at 5aef5f5): rather than only checking `len(c.masters) == 0`, it should check if there are replicas and, if so, use them as masters, and if both are empty then fall back to the `StartupNodes`. This way we don't lose possibly valid cluster information if for some reason all masters failed and only replicas remain, and the next refresh will use the replicas' addresses (now stored in `c.masters`) to get the updated cluster slots.
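A minimal sketch of that fallback order (a hypothetical shape assuming the node sets of `getNodeAddrs`, not the code actually merged):

```go
// lazily create the node sets on first use
if c.masters == nil {
	c.masters = make(map[string]bool)
	c.replicas = make(map[string]bool)
}

if len(c.masters) == 0 {
	if len(c.replicas) > 0 {
		// all masters are gone: promote the known replicas so the next
		// refresh asks them for the updated cluster slots
		for n := range c.replicas {
			c.masters[n] = true
		}
		c.replicas = make(map[string]bool)
	} else {
		// no known nodes at all: fall back to the StartupNodes
		for _, n := range c.StartupNodes {
			c.masters[n] = true
		}
	}
}
```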
I am glad to contribute to this project. Yes, as you mentioned, we need to consider the case where there are only replicas left.
Closed via #52.