-
-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: increase block production timeouts to account for event loop lag #7420
Conversation
*/ | ||
const BLOCK_PRODUCTION_RACE_CUTOFF_MS = 2_000; | ||
const BLOCK_PRODUCTION_RACE_CUTOFF_MS = 2_500; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion increasing this by 500ms should be fine because the previous value was absolute, meaning no matter at which time we started the block production race it would always give 2 seconds while now it's most of the time just 1.5-1.7 seconds since the cutoff it based on seconds into slot.
We also have a strict timeout on builder header request now which means this cutoff should rarely be relevant.
In addition, this value was chosen when we used JSON to pass the block around vc<>bn which added at least 500ms more to publish flow compared to using SSZ + we no longer flood publish blocks and blobs which makes it even less likely to orphan blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass the block around vc<>bn which added at least 500ms more to publish flow compared to using SSZ
ok this makes sense to me. so lgtm
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #7420 +/- ##
=========================================
Coverage 50.26% 50.26%
=========================================
Files 602 602
Lines 40376 40376
Branches 2206 2206
=========================================
Hits 20293 20293
Misses 20043 20043
Partials 40 40 |
Performance Report🚀🚀 Significant benchmark improvement detected
Full benchmark results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
…#7420) **Motivation** Based on observations from mainnet nodes, it seems like we reject builder blocks in some cases either due to not being sent within cutoff time or due to timeout on the api call but looking at the response times these have been timely. The reason why those were rejected is either we started the block production race too late into the slot, which is mostly due to the fact that we take too much time to produce the common block body or the timeout was handled by the node with a delay, both of these cases are likely caused by event loop lag either due to GC or processing something else. See [discord](https://discord.com/channels/593655374469660673/1331991458152058991/1335576180815958088) for details. **Description** Increase block production timeouts to account for event loop lag
🎉 This PR is included in v1.26.0 🎉 |
Motivation
Based on observations from mainnet nodes, it seems like we reject builder blocks in some cases either due to not being sent within cutoff time or due to timeout on the api call but looking at the response times these have been timely. The reason why those were rejected is either we started the block production race too late into the slot, which is mostly due to the fact that we take too much time to produce the common block body or the timeout was handled by the node with a delay, both of these cases are likely caused by event loop lag either due to GC or processing something else.
See discord for details.
Description
Increase block production timeouts to account for event loop lag