Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PVF: drop backing jobs if it is too late #5616

Open
wants to merge 100 commits into
base: master
Choose a base branch
from

Conversation

AndreiEres
Copy link
Contributor

@AndreiEres AndreiEres commented Sep 6, 2024

Fixes #5530

This PR introduces the removal of backing jobs that have been back pressured for longer than allowedAncestryLen, as these candidates are no longer viable.

It is reasonable to expect a result for a backing job execution within allowedAncestryLen blocks. Therefore, we set the job TTL as a relay block number and synchronize the validation host by sending activated leaves.

github-merge-queue bot pushed a commit that referenced this pull request Oct 9, 2024
Resolves #4632

The new logic optimizes the distribution of execution jobs for disputes,
approvals, and backings. Testing shows improved finality lag and
candidate checking times, especially under heavy network load.

### Approach

This update adds prioritization to the PVF execution queue. The logic
partially implements the suggestions from
#4632 (comment).

We use thresholds to determine how much a current priority can "steal"
from lower ones:
-  Disputes: 70%
-  Approvals: 80%
-  Backing System Parachains: 100%
-  Backing: 100%

A threshold indicates the portion of the current priority that can be
allocated from lower priorities.

For example:
-  Disputes take 70%, leaving 30% for approvals and all backings.
- 80% of the remaining goes to approvals, which is 30% * 80% = 24% of
the original 100%.
- If we used parts of the original 100%, approvals couldn't take more
than 24%, even if there are no disputes.

Assuming a maximum of 12 executions per block, with a 6-second window, 2
CPU cores, and a 2-second run time, we get these distributions:

-  With disputes: 8 disputes, 3 approvals, 1 backing
-  Without disputes: 9 approvals, 3 backings

It's worth noting that when there are no disputes, if there's only one
backing job, we continue processing approvals regardless of their
fulfillment status.

### Versi Testing 40/20

Testing showed a slight difference in finality lag and candidate
checking time between this pull request and its base on the master
branch. The more loaded the network, the greater the observed
difference.

Testing Parameters:
-  40 validators (4 malicious)
-  20 gluttons with 2 seconds of PVF execution time
-  6 VRF modulo samples
-  12 required approvals

![Pasted Graphic
3](https://github.com/user-attachments/assets/8b6163a4-a1c9-44c2-bdba-ce1ef4b1eba7)
![Pasted Graphic
4](https://github.com/user-attachments/assets/9f016647-7727-42e8-afe9-04f303e6c862)

### Versi Testing 80/40

For this test, we compared the master branch with the branch from
#5616. The second branch
is based on the current one but removes backing jobs that have exceeded
their time limits. We excluded malicious nodes to reduce noise from
disputing and banning validators. The results show that, under the same
load, nodes experience less finality lag and reduced recovery and check
time. Even parachains are functioning with a shorter block time,
although it remains over 6 seconds.

Testing Parameters:
-  80 validators (0 malicious)
-  40 gluttons with 2 seconds of PVF execution time
-  6 VRF modulo samples
-  30 required approvals


![image](https://github.com/user-attachments/assets/42bcc845-9115-4ae3-9910-286b77a60bbf)

---------

Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
Base automatically changed from AndreiEres/pvf-execution-priority to master October 9, 2024 17:44
Copy link
Contributor

@sandreim sandreim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Using the block numbers and not timestamps is more robust and simple. However, it can still happen that the relay parent expires while the candidate is executed which is no better than using timestamps.

Added a few more suggestions to simplify the code.

Comment on lines 1116 to 1118
let Ok(mode) = prospective_parachains_mode(sender, relay_parent).await else {
return None;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should assume that async backing is always enabled, if not log an error.

@@ -512,6 +523,21 @@ where
Some(processed_code_hashes)
}

async fn maybe_update_active_leaf(
mut validation_backend: impl ValidationBackend,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a future PR we can do even better. We keep track of last finalized block and then we can drop approval execution for the inclusion relay chain blocks that are already finalized.

polkadot/node/core/candidate-validation/src/lib.rs Outdated Show resolved Hide resolved
polkadot/node/core/pvf/src/execute/queue.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@s0me0ne-unkn0wn s0me0ne-unkn0wn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good! 👍 Arguable question is whether we really need both PvfExecKind and Priority as their function is so close, but let it be like that for the matter of distinguishing their purpose.

Copy link
Contributor

@sandreim sandreim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you @AndreiEres.
Did we burn this in already on Kusama validators ?

@@ -625,10 +625,18 @@ async fn request_candidate_validation(
candidate_receipt: CandidateReceipt,
pov: Arc<PoV>,
executor_params: ExecutorParams,
mode: ProspectiveParachainsMode,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to support sync backing mode. So we should always pass in the allowed_ancestry_len here,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I5-enhancement An additional feature request. T0-node This PR/Issue is related to the topic “node”. T8-polkadot This PR/Issue is related to/affects the Polkadot network.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PVF: drop backing jobs if it is too late
7 participants