Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the reliability of the refresh process due to the loss of slots #52

Merged
merged 2 commits into from
Oct 20, 2023

Conversation

ljfuyuan
Copy link

@ljfuyuan ljfuyuan commented Oct 15, 2023

When the cluster slots information has not yet been allocated, getNodeAddrs may return empty data, causing the masters and replicas information to be cleared and unable to be restored to the correct state.

In addition, we should regard the getNodeAddrs parameter preferReplicas, such as EachNode's replicas or getRandomConn's readonly, we cannot simply fill in the address of the replicas directly in getNodeAddrs.

cluster.go Outdated
@@ -106,7 +106,13 @@ func (c *Cluster) refresh(bg bool) error {
var errMsgs []string
var oldm, newm [HashSlots][]string

addrs, _ := c.getNodeAddrs(false)
// get all node addrs, including replicas
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this does not return all node addresses, it will return the replicas if it knows about any, otherwise the masters. I'm not sure I like switching the refresh to get cluster slots from replicas - it should be ok, but it might need a READONLY call to make the replica accept the command (I'm not sure since this is not serving a key, it's an administration command), we'd have to test that and it might differ based on redis version...

I'll come back to review it when I have a bit more time to get back in context, but I think we should address that in getNodeAddrs instead and reuse the StartupNodes only if both c.masters and c.replicas are empty, and here in refresh fall back to using the replicas only if getNodeAddrs(false) returns an empty list of addresses.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here in refresh fall back to using the replicas only if getNodeAddrs(false) returns an empty list of addresses.

And making sure the replicas do serve the CLUSTER SLOTS command without a READONLY request first.

Copy link
Author

@ljfuyuan ljfuyuan Oct 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I made the mistake of getNodeAddrs(true) will not return all nodes.

And making sure the replicas do serve the CLUSTER SLOTS command without a READONLY request first

I did tests on redis 5.0.8 and 6.2.13 clusters. The replica node can correctly return cluster slots information without READONLY

So, based on the above, can we try to do this? At the same time, the semantics of preferReplicas can be preserved .

        addrs, _ := c.getNodeAddrs(false)
	if len(addrs) == 0 {
		if addrs, _ = c.getNodeAddrs(true); len(addrs) == 0 {
			addrs = c.StartupNodes
		}
	}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that would be good.

@mna
Copy link
Owner

mna commented Oct 20, 2023

Thanks for your contribution!

@mna mna merged commit 70c449d into mna:master Oct 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants