discovery: obtain channelMtx before doing any DB calls in handleChannelUpdate
#9573
base: master
Conversation
Force-pushed from d16e005 to c3792a3
Force-pushed from 95608c3 to 355e6db
discovery/gossiper_test.go (Outdated)
@@ -381,7 +452,11 @@ func (r *mockGraphSource) IsStaleEdgePolicy(chanID lnwire.ShortChannelID,
// NOTE: This method is part of the ChannelGraphSource interface.
func (r *mockGraphSource) MarkEdgeLive(chanID lnwire.ShortChannelID) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	defer func() {
		r.callCount["MarkEdgeLive"]++
An alternative way to accomplish this would be to use `mock.Mock`. Then you can just do `mock.AssertNumberOfCalls(t, "MarkEdgeLive", 3)`. You can also set the expectations upfront as well.
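For concreteness, a minimal sketch of that approach using testify's `mock` package. The `MarkEdgeLive` signature is taken from the diff above; the `mockGraph` type and the test are hypothetical, not part of this PR:

```go
package discovery_test

import (
	"testing"

	"github.com/lightningnetwork/lnd/lnwire"
	"github.com/stretchr/testify/mock"
)

// mockGraph is a hypothetical testify-based stand-in for mockGraphSource.
type mockGraph struct {
	mock.Mock
}

// MarkEdgeLive records the call via mock.Mock, so call counts are tracked
// automatically instead of via a hand-rolled callCount map.
func (m *mockGraph) MarkEdgeLive(chanID lnwire.ShortChannelID) error {
	args := m.Called(chanID)
	return args.Error(0)
}

func TestMarkEdgeLiveCallCount(t *testing.T) {
	g := &mockGraph{}
	g.On("MarkEdgeLive", mock.Anything).Return(nil)

	// In a real test the gossiper under test would trigger these calls.
	for i := 0; i < 3; i++ {
		_ = g.MarkEdgeLive(lnwire.ShortChannelID{})
	}

	// Assert the expected number of calls, as suggested above.
	g.AssertNumberOfCalls(t, "MarkEdgeLive", 3)
}
```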
yeah it would be great if we can gradually replace all mockers using `mock.Mock`, tho that would involve quite some refactor of the tests here. So maybe a PR at the end of the graph cleanup series?
> So maybe a PR at the end of the graph cleanup series?
Yeah sounds like a plan 🫡
Thanks for the PR! I thought our rate limiter is per channel per peer, but it seems like it's per channel - does it mean an update from peer A can take up peer B's slot?
In addition, this failure seems new:
--- FAIL: TestLightningNetworkDaemon/tranche04/142-of-271/bitcoind/relayer_blinded_error (134.58s)
harness_node.go:403: Starting node (name=Alice) with PID=12111
harness_node.go:403: Starting node (name=Bob) with PID=13149
harness_node.go:403: Starting node (name=Carol) with PID=14105
harness_node.go:403: Starting node (name=Dave) with PID=14327
harness_assertion.go:2331:
Error Trace: /home/runner/work/lnd/lnd/lntest/harness_assertion.go:2331
/home/runner/work/lnd/lnd/itest/lnd_route_blinding_test.go:682
/home/runner/work/lnd/lnd/lntest/harness.go:297
/home/runner/work/lnd/lnd/itest/lnd_test.go:130
Error: []routerrpc.HtlcEvent_EventType{3, 0} does not contain 1
Test: TestLightningNetworkDaemon/tranche04/142-of-271/bitcoind/relayer_blinded_error
Messages: wrong event type, got FORWARD%!(EXTRA routerrpc.HtlcEvent_EventType=SEND)
harness.go:375: finished test: relayer_blinded_error, start height=541, end height=548, mined blocks=7
harness.go:334: test failed, skipped cleanup
lnd_test.go:138: Failure time: 2025-03-03 15:15:52.959
It is per-channel-per-peer
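For illustration only (this is not the gossiper's actual implementation): keying the limiter on a (peer, channel) pair is what makes it per-channel-per-peer, so an update from peer A cannot take up peer B's slot. A minimal sketch using `golang.org/x/time/rate`, with made-up limits and helper names:

```go
package main

import (
	"fmt"

	"golang.org/x/time/rate"
)

// limiterKey is a hypothetical composite key: one token bucket per
// (peer, channel) pair rather than per channel alone.
type limiterKey struct {
	peer   string
	chanID uint64
}

var limiters = make(map[limiterKey]*rate.Limiter)

// allowUpdate reports whether this peer still has budget for updates to
// this channel. Updates from another peer draw from a different bucket.
func allowUpdate(peer string, chanID uint64) bool {
	key := limiterKey{peer: peer, chanID: chanID}
	l, ok := limiters[key]
	if !ok {
		// Illustrative numbers: 1 update/sec with a burst of 2.
		l = rate.NewLimiter(rate.Limit(1), 2)
		limiters[key] = l
	}
	return l.Allow()
}

func main() {
	fmt.Println(allowUpdate("peerA", 42)) // true
	fmt.Println(allowUpdate("peerB", 42)) // true: separate bucket for peer B
}
```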
Yeah saw that, but just can't think that it is related to this PR, so assumed it was a rare flake. but will look at the logs a bit 👍 EDIT: damn, looks like the logs have expired/were not uploaded
Force-pushed from d447fe5 to 1a9f269
LGTM, very nice test demonstrating the bug! 🥇
This commit adds a test to demonstrate that if we receive two identical updates (which can happen if we get the same update from two peers in quick succession), then our rate limiting logic will be hit early as both updates might be counted towards the rate limit. This will be fixed in an upcoming commit.
In `handleChanUpdate`, make sure to grab the `channelMtx` lock before making any DB calls so that the logic remains consistent.
Force-pushed from 1a9f269 to 95277bb
~~looking into the failing itest 👍~~ nvm, looks like a known flake: lnd/itest/lnd_channel_graph_test.go line 25 in 9feb761
That describes the flake found in updating channels and asserting the channel. Looking into the logs, I found this error:
Not sure if it's related to the change here or just a flake, tho I think in general msgs should be sent in order, need to double check.
There exists an edge-case where we receive the same ChannelUpdate from 2 different peers in quick succession. Both will make it past the initial IsStale check since neither has been persisted yet, which means that both updates will count towards the rate limit for that peer:channel combo.

To fix this, we need to re-check the staleness of the update once the lock is acquired.
This fixes the flake seen in this build: https://github.com/lightningnetwork/lnd/actions/runs/13583650550/job/37973971226
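For illustration, here is a minimal, self-contained sketch of that ordering. All names here (`fakeDB`, `gossiper`, the fields of `handleChanUpdate`) are hypothetical, and a plain `sync.Mutex` stands in for lnd's per-channel `channelMtx`; this is not the actual lnd code:

```go
package main

import (
	"fmt"
	"sync"
)

// fakeDB is a hypothetical stand-in for the graph DB; like the real DB it
// is safe for concurrent use on its own.
type fakeDB struct {
	mu        sync.RWMutex
	persisted map[uint64]uint32 // chanID -> latest persisted update timestamp
}

func (db *fakeDB) isStale(chanID uint64, ts uint32) bool {
	db.mu.RLock()
	defer db.mu.RUnlock()
	latest, ok := db.persisted[chanID]
	return ok && ts <= latest
}

func (db *fakeDB) persist(chanID uint64, ts uint32) {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.persisted[chanID] = ts
}

// gossiper is a hypothetical, stripped-down stand-in used only to show the
// ordering: pre-check staleness, take the channel lock, then re-check before
// counting the update towards the rate limit and persisting it.
type gossiper struct {
	channelMtx sync.Mutex // stand-in for lnd's per-channel mutex
	db         *fakeDB
	rateCount  map[string]int // "peer:channel" -> updates counted
}

func (g *gossiper) handleChanUpdate(peer string, chanID uint64, ts uint32) {
	// Cheap pre-check. Two identical updates from two different peers can
	// both pass this, since neither has been persisted yet.
	if g.db.isStale(chanID, ts) {
		return
	}

	// Grab the channel lock before making any further DB calls.
	g.channelMtx.Lock()
	defer g.channelMtx.Unlock()

	// Re-check under the lock: a duplicate may have been persisted while
	// we were waiting, and must not consume a rate-limit slot.
	if g.db.isStale(chanID, ts) {
		return
	}

	// Only now count the update towards the peer:channel limit and
	// persist it.
	g.rateCount[fmt.Sprintf("%s:%d", peer, chanID)]++
	g.db.persist(chanID, ts)
}

func main() {
	g := &gossiper{
		db:        &fakeDB{persisted: make(map[uint64]uint32)},
		rateCount: make(map[string]int),
	}

	// The same update arriving from two peers only counts once.
	g.handleChanUpdate("peerA", 42, 1000)
	g.handleChanUpdate("peerB", 42, 1000)
	fmt.Println(g.rateCount) // map[peerA:42:1]
}
```

With the re-check in place, the second copy of the update is dropped before it can consume a rate-limit slot.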