-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: recursive RLock() mutex acquision #126
Conversation
There were at least two code paths which might acquire `.RLock()` recursively: 1. `Setup*()` -> `createResult()` 2. `Setup*()` -> `Networks()`. On its own, it's not a problem, but if `.Load()` is called concurrently, it might do `.Lock()` in between recursive `.RLock()`s, which would lead to a deadlock. See containerd#125 Fix by introducing a contract on the way mutex is acquired. Add a test facility to verify for such mistakes via build tags: * `go test -race` or `go test -tags deadlocks` activates deadlock detection Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Thanks for taking a look at this @smira ❤️ |
Fixes siderolabs/talos#9984 Patch with containerd/go-cni#126 See also: * containerd/go-cni#125 * containerd/containerd#11186 * containerd/go-cni#123 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Thanks for finding the bug, fixing is easy :) |
@mikebrow this was my original approach, I thought this was the correct way as I was concerned about a deadlock with the merged approach. |
Fixes siderolabs/talos#9984 Patch with containerd/go-cni#126 See also: * containerd/go-cni#125 * containerd/containerd#11186 * containerd/go-cni#123 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com> (cherry picked from commit 0b00e86)
"github.com/sasha-s/go-deadlock" | ||
) | ||
|
||
type RWMutex = deadlock.RWMutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using an interface for sync.RWMutex
seems easier to manage and test explicitly for than using build tags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well my point is you can have an interface in the struct and set it to use the deadlock.RWMutex in a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's possible, but build tags allows to exclude deadlock.RWMutex completely from the build if it's not race-enabled or has an explicit deadlocks
build tag.
plus interface indirection has a slight performance hit
this change is equivalent to what it was before if not using race/build tag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm good either way.. not worried about performance of interface indirection for network device setup/tear down :-) Fair to change it in a separated PR if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the issue here is, who is running this test?
CI is not setup for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI is setup for it, it runs go test -race
which is enough to activate deadlock library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I was doing a find in the ci.yml I see its in the actual Makefile.
the new contract Have not seen the failure path, exactly, yet, but this pattern seems fine and is easier to follow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
It was in the failure path:
|
There were at least two code paths which might acquire
.RLock()
recursively:Setup*()
->createResult()
Setup*()
->Networks()
.On its own, it's not a problem, but if
.Load()
is called concurrently, it might do.Lock()
in between recursive.RLock()
s, which would lead to a deadlock.See #125
Fix by introducing a contract on the way mutex is acquired.
Add a test facility to verify for such mistakes via build tags:
go test -race
orgo test -tags deadlocks
activates deadlock detection