PVF: drop backing jobs if it is too late #5616
base: master
Conversation
Resolves #4632

The new logic optimizes the distribution of execution jobs for disputes, approvals, and backings. Testing shows improved finality lag and candidate checking times, especially under heavy network load.

### Approach

This update adds prioritization to the PVF execution queue. The logic partially implements the suggestions from #4632 (comment). We use thresholds to determine how much a current priority can "steal" from lower ones:

- Disputes: 70%
- Approvals: 80%
- Backing System Parachains: 100%
- Backing: 100%

A threshold indicates the portion of the remaining execution capacity that the current priority may claim from the lower ones. For example:

- Disputes take 70%, leaving 30% for approvals and all backings.
- 80% of the remainder goes to approvals, which is 30% * 80% = 24% of the original 100%.
- If we had instead fixed the shares as parts of the original 100%, approvals could never take more than 24%, even if there are no disputes.

(A minimal sketch of this allocation logic follows the description.)

Assuming a maximum of 12 executions per block, with a 6-second window, 2 CPU cores, and a 2-second run time, we get these distributions:

- With disputes: 8 disputes, 3 approvals, 1 backing
- Without disputes: 9 approvals, 3 backings

It's worth noting that when there are no disputes and only one backing job, we continue processing approvals regardless of their fulfillment status.

### Versi Testing 40/20

Testing showed a slight difference in finality lag and candidate checking time between this pull request and its base on the master branch. The more loaded the network, the greater the observed difference.

Testing parameters:

- 40 validators (4 malicious)
- 20 gluttons with 2 seconds of PVF execution time
- 6 VRF modulo samples
- 12 required approvals

![Pasted Graphic 3](https://github.com/user-attachments/assets/8b6163a4-a1c9-44c2-bdba-ce1ef4b1eba7)
![Pasted Graphic 4](https://github.com/user-attachments/assets/9f016647-7727-42e8-afe9-04f303e6c862)

### Versi Testing 80/40

For this test, we compared the master branch with the branch from #5616. The second branch is based on the current one but removes backing jobs that have exceeded their time limits. We excluded malicious nodes to reduce noise from disputing and banning validators.

The results show that, under the same load, nodes experience less finality lag and reduced recovery and check times. Even parachains are running with a shorter block time, although it remains over 6 seconds.

Testing parameters:

- 80 validators (0 malicious)
- 40 gluttons with 2 seconds of PVF execution time
- 6 VRF modulo samples
- 30 required approvals

![image](https://github.com/user-attachments/assets/42bcc845-9115-4ae3-9910-286b77a60bbf)

---------

Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
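A minimal, self-contained sketch of the thresholding described above, assuming that a priority only consumes slots for jobs it actually has queued. The names (`Priority`, `steal_threshold`, `distribute`) are illustrative and do not mirror the actual queue implementation:

```rust
// Illustrative model of the threshold-based allocation, not the real queue code.
#[derive(Clone, Copy, Debug)]
enum Priority {
    Dispute,
    Approval,
    BackingSystemParas,
    Backing,
}

/// Fraction of the *remaining* capacity that a priority may take away
/// from the priorities below it (70% / 80% / 100% / 100%).
fn steal_threshold(priority: Priority) -> f64 {
    match priority {
        Priority::Dispute => 0.70,
        Priority::Approval => 0.80,
        Priority::BackingSystemParas => 1.00,
        Priority::Backing => 1.00,
    }
}

/// Splits `total_slots` execution slots between priorities, walking from the
/// highest priority down. A priority only consumes slots for jobs it actually
/// has queued; unused capacity flows down to the lower priorities.
fn distribute(total_slots: usize, queued: &[(Priority, usize)]) -> Vec<(Priority, usize)> {
    let mut remaining = total_slots;
    let mut shares = Vec::new();
    for &(priority, jobs) in queued {
        let allowed = (remaining as f64 * steal_threshold(priority)).floor() as usize;
        let take = allowed.min(jobs);
        shares.push((priority, take));
        remaining -= take;
    }
    shares
}

fn main() {
    use Priority::*;
    // 12 execution slots per block, as in the description above.
    // With pending disputes: 8 disputes, 3 approvals, 1 backing.
    println!("{:?}", distribute(12, &[(Dispute, 20), (Approval, 20), (Backing, 20)]));
    // Without disputes: 9 approvals, 3 backings.
    println!("{:?}", distribute(12, &[(Dispute, 0), (Approval, 20), (Backing, 20)]));
}
```

Because the thresholds apply to whatever capacity is still left over, an empty higher priority passes its full share downwards, which reproduces the 8/3/1 split with disputes and the 9/3 split without them.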
LGTM! Using block numbers rather than timestamps is more robust and simpler. However, it can still happen that the relay parent expires while the candidate is being executed, which is no better than using timestamps.
Added a few more suggestions to simplify the code.
```rust
let Ok(mode) = prospective_parachains_mode(sender, relay_parent).await else {
	return None;
};
```
We should assume that async backing is always enabled; if it is not, log an error.
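A hedged sketch of what that suggestion could look like, reusing the snippet above; the match arms, log target, and message are assumptions, not the final code:

```rust
// Sketch only: treat anything other than async backing being enabled as an
// error instead of silently skipping. LOG_TARGET and the message are placeholders.
let allowed_ancestry_len = match prospective_parachains_mode(sender, relay_parent).await {
	Ok(ProspectiveParachainsMode::Enabled { allowed_ancestry_len, .. }) => allowed_ancestry_len,
	_ => {
		gum::error!(target: LOG_TARGET, ?relay_parent, "Async backing is expected to be enabled");
		return None
	},
};
```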
```diff
@@ -512,6 +523,21 @@ where
	Some(processed_code_hashes)
}

async fn maybe_update_active_leaf(
	mut validation_backend: impl ValidationBackend,
```
In a future PR we can do even better: keep track of the last finalized block and then drop approval executions for inclusion relay chain blocks that are already finalized.
Looks very good! 👍 An arguable question is whether we really need both `PvfExecKind` and `Priority`, as their functions are so close, but let it be like that for the sake of distinguishing their purposes.
Looks good, thank you @AndreiEres.
Did we already burn this in on Kusama validators?
```diff
@@ -625,10 +625,18 @@ async fn request_candidate_validation(
	candidate_receipt: CandidateReceipt,
	pov: Arc<PoV>,
	executor_params: ExecutorParams,
	mode: ProspectiveParachainsMode,
```
We don't need to support sync backing mode, so we should always pass in the `allowed_ancestry_len` here.
Fixes #5530

This PR introduces the removal of backing jobs that have been back pressured for longer than `allowedAncestryLen`, as these candidates are no longer viable. It is reasonable to expect a result for a backing job execution within `allowedAncestryLen` blocks. Therefore, we set the job TTL as a relay block number and synchronize the validation host by sending activated leaves.
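A minimal, self-contained sketch of the TTL idea described above, under the assumption that stale jobs are pruned whenever a new active leaf is signalled to the validation host. The names (`BackingJob`, `prune_stale_backing_jobs`, `ALLOWED_ANCESTRY_LEN`) are illustrative and do not mirror the actual validation-host code:

```rust
// Illustrative model only: a backing job stamped with the relay block number
// at which it was enqueued, dropped once it has waited too long.
struct BackingJob {
	enqueued_at: u32, // relay chain block number
}

/// Relay-chain blocks a backing job may wait before it is considered stale.
/// In the PR this corresponds to `allowedAncestryLen` from the async backing
/// parameters; the constant here is just an example value.
const ALLOWED_ANCESTRY_LEN: u32 = 3;

/// Called whenever a new active leaf is sent to the validation host:
/// drop every backing job whose candidate can no longer be built upon.
fn prune_stale_backing_jobs(queue: &mut Vec<BackingJob>, current_block: u32) {
	queue.retain(|job| current_block.saturating_sub(job.enqueued_at) <= ALLOWED_ANCESTRY_LEN);
}

fn main() {
	let mut queue = vec![
		BackingJob { enqueued_at: 10 },
		BackingJob { enqueued_at: 14 },
	];
	// At block 15 the job from block 10 has waited longer than
	// ALLOWED_ANCESTRY_LEN blocks and is dropped; the one from block 14 stays.
	prune_stale_backing_jobs(&mut queue, 15);
	assert_eq!(queue.len(), 1);
}
```

Stamping jobs with a relay block number rather than a wall-clock timestamp keeps the expiry in step with the chain itself, which matches the reviewer's point above about block numbers being the more robust choice.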