-
Notifications
You must be signed in to change notification settings - Fork 491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
catchup: fetchAndWrite/fetchRound quit early on errNoBlockForRound #5809
Merged
algorandskiy
merged 13 commits into
algorand:master
from
algorandskiy:pavel/catchup-punish-errors
Nov 8, 2023
Merged
catchup: fetchAndWrite/fetchRound quit early on errNoBlockForRound #5809
algorandskiy
merged 13 commits into
algorand:master
from
algorandskiy:pavel/catchup-punish-errors
Nov 8, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
cce
reviewed
Oct 27, 2023
algorandskiy
force-pushed
the
pavel/catchup-punish-errors
branch
from
October 28, 2023 02:41
514902e
to
827e74a
Compare
algorandskiy
changed the title
catchup: delay fetchAndWrite/fetchRound on errors
catchup: fetchAndWrite/fetchRound quit early on errNoBlockForRound
Oct 28, 2023
Codecov Report
@@ Coverage Diff @@
## master #5809 +/- ##
==========================================
- Coverage 55.64% 55.60% -0.05%
==========================================
Files 475 475
Lines 66869 66892 +23
==========================================
- Hits 37209 37193 -16
- Misses 27146 27178 +32
- Partials 2514 2521 +7
... and 7 files with indirect coverage changes 📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today! |
gmalouf
reviewed
Oct 30, 2023
gmalouf
reviewed
Oct 30, 2023
gmalouf
reviewed
Oct 30, 2023
gmalouf
reviewed
Oct 30, 2023
Co-authored-by: Gary <982483+gmalouf@users.noreply.github.com>
gmalouf
reviewed
Oct 30, 2023
gmalouf
reviewed
Nov 2, 2023
gmalouf
reviewed
Nov 2, 2023
algorandskiy
force-pushed
the
pavel/catchup-punish-errors
branch
from
November 3, 2023 00:02
8ee3c14
to
f70f0aa
Compare
zeldovich
approved these changes
Nov 7, 2023
gmalouf
approved these changes
Nov 8, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Checking this test failure (three nodes, one has 99%+ of the stake) I found frequent retries with after
errNoBlockForRound
could lead to punishing nodes. In particular in this test the relay node punished node1 and failed to catchup.Two functions are modified in slightly different way:
fetchAndWrite
simply checks forerrNoBlockForRoundThreshold
number of errors before giving up.fetchRound
sleeps and recreate the local peer selector. This flow is for a case when agreement saw the cert but not the block itself, and asking the block immediately could fail since the other side has not added it into the ledger yet. According to the method contract it must try indefinitely. To fix a rare situation with punishing one of two nodes more the peer selector gets recreated if too many errors.Additionally, modified how
peerRankDownloadFailed
is handled inrankPeer ->> historicStats.push
.peerRankNoBlockForRound
to handle "no block error" a bit differently: the penalty function isp*2^(0.1*x)
instead ofp*2^x
for this case to punish no block errors less aggressive.Test Plan
ReadyCheck
from TestBasicCatchup. The test is too fast now, from start to finish catchup on 3-10 blocks takes only 20 ms.ReadyCheck
still covered by TestReadyEndpoint.