
chain-spec building failure due to WASM allocation size #5419

Open
gpestana opened this issue Aug 20, 2024 · 13 comments · May be fixed by #6081
Labels
T0-node This PR/Issue is related to the topic “node”.

Comments

@gpestana
Contributor

I'm trying to build a chain-spec with a large number of nominators and validators for the multi-block election tests but given the amount of stakers to add, the chain-spec generation fails with:

❯ ~/cargo_target/debug/staking-node build-spec --disable-default-bootnode > chain-specs/staking-max.spec --raw
2024-08-20 14:58:04 Building chain spec
2024-08-20 14:58:07 going to fail due to allocating 54354570
Error: Service(Other("wasm call error Requested allocation size is too large"))

Is there any way to increase the WASM allocation size at the spec building time?

This issue seems to be similar to paritytech/substrate#11132 but at the time of chain-spec building, not block production.

@gpestana gpestana added the T0-node This PR/Issue is related to the topic “node”. label Aug 20, 2024
@michalkucharczyk
Contributor

michalkucharczyk commented Aug 26, 2024

Could you check this? It seems this error is reported when an allocation exceeds the following value:

/// The maximum number of bytes that can be allocated at one time.
// The maximum possible allocation size was chosen rather arbitrary, 32 MiB should be enough for
// everybody.
pub const MAX_POSSIBLE_ALLOCATION: u32 = 33554432; // 2^25 bytes, 32 MiB
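For illustration, a minimal, self-contained sketch of the kind of size check behind this error; this is not the actual `sp-allocator` implementation, just the constant above paired with a hypothetical `check_allocation` helper:

```rust
// Sketch only: mirrors the limit check that produces the
// "Requested allocation size is too large" failure in the log above.
const MAX_POSSIBLE_ALLOCATION: u32 = 33_554_432; // 2^25 bytes, 32 MiB

fn check_allocation(size: u32) -> Result<u32, String> {
    if size > MAX_POSSIBLE_ALLOCATION {
        // The 54_354_570-byte request from the chain-spec build trips this.
        Err(format!("going to fail due to allocating {size}"))
    } else {
        Ok(size)
    }
}

fn main() {
    assert!(check_allocation(54_354_570).is_err()); // the failing request above
    assert!(check_allocation(1024).is_ok());
}
```

The 54,354,570-byte genesis allocation in the report is well above the 32 MiB ceiling, which is why the build fails regardless of available host memory.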

@michalkucharczyk
Contributor

michalkucharczyk commented Aug 26, 2024

From what I know, it is not possible to change this value at chain-spec building time.

@gpestana
Contributor Author

@michalkucharczyk thanks for the tip. It required a few other changes as well, but in the end I got it working. For the record, the solution has been posted on Substrate StackExchange: https://substrate.stackexchange.com/questions/11863/allocate-extra-wasm-memory-to-generate-large-chainspecs/11864

@michalkucharczyk
Contributor

Thank you for writing this down.

@kianenigma
Contributor

Should we address this at a more fundamental level? We can't expect everyone to change the code.

All WASM memory limits should ideally be removed in the chain-spec generation and OCW code paths.

@kianenigma kianenigma reopened this Sep 3, 2024
@bkchr
Member

bkchr commented Sep 3, 2024

What you want is this: polkadot-fellows/RFCs#4 and then to move the allocator inside of the runtime.

@gpestana
Contributor Author

@bkchr we've been discussing increasing the maximum number of validators that can be registered in the system and be part of the snapshot. In a few local experiments, the offchain miner panics with a memory allocation failure under the current limits while calculating the election. I wonder if we could double MAX_POSSIBLE_ALLOCATION to 64 MiB and increase the maximum number of memory pages in the current hardcoded limits.

I'm now trying to improve the code so that it requires less memory allocation. In any case, if we do need to raise the limits, is there anything to consider, or any side effects, from increasing them?

@bkchr
Member

bkchr commented Sep 12, 2024

We cannot just bump these limits; doing so changes the behavior of the allocator. If possible, I would not like to touch this at all before we move the allocator into the runtime.

I think I proposed this already somewhere, but can we, for now, just change the way we register them at genesis? Either pass them into the runtime with two fields or use append to add them.

@gpestana do you know where exactly the allocation is failing?

@kianenigma
Contributor

I think I proposed this already somewhere, but can we, for now, just change the way we register them at genesis? Either pass them into the runtime with two fields or use append to add them.

The current issue is actually about the OCW code path, no longer genesis, but @gpestana knows better and can point out exactly where we run out of memory.

It is kind of silly to have the OCW/genesis code paths, which are by no means "consensus-critical", be subject to memory limits that are, afaik, only justifiable in consensus code paths. I hope we can hack around it.

In the old Substrate days, I made a PR that never got merged, but it did something like this:

On the client side, we attached an enum Execution { Consensus, Offchain } to every wasm invocation. If Execution::Offchain, then these memory limits were a lot higher, or nonexistent. Would something like that make sense?

@bkchr
Member

bkchr commented Sep 12, 2024

On the client side, we attached an enum Execution { Consensus, Offchain } to every wasm invocation. If Execution::Offchain, then these memory limits were a lot higher, or nonexistent. Would something like that make sense?

We have this, but even in the old days you did not modify these constants.

@gpestana
Contributor Author

As a way to be more lenient with respect to memory constraints in the offchain execution mode, I'm parameterising the const MAX_ALLOCATING_MEMORY. This would unlock two things: 1. chain-spec generation won't fail for large states (e.g. initialising the chain with a staking state similar to Polkadot's current one), and 2. offchain staking-miners and other offchain workloads will be less likely to hit the constant memory limits currently imposed by the client.

My current direction is to expose those parameters in the CLI and override the constants when the call context is CallContext::Offchain. The new parameters could be piggybacked onto the struct HeapAllocStrategy, which WasmExecutor::with_instance already takes in.

wdyt about this approach?
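A minimal sketch of the direction described above. The `HeapAllocStrategy` enum here is a local stand-in mirroring the `sc-executor-common` type of the same name, and the `offchain_heap_pages` CLI parameter is invented for illustration; this is not the actual implementation proposed in the PR.

```rust
// Hypothetical sketch: selecting a heap allocation strategy for offchain
// calls from a CLI-supplied page limit. `HeapAllocStrategy` is a local
// stand-in modeled on sc-executor-common's type of the same name.
#[derive(Debug, PartialEq)]
enum HeapAllocStrategy {
    Dynamic { maximum_pages: Option<u32> },
}

fn offchain_heap_strategy(offchain_heap_pages: Option<u32>) -> HeapAllocStrategy {
    match offchain_heap_pages {
        // Let offchain instances grow up to the user-supplied page count.
        Some(pages) => HeapAllocStrategy::Dynamic { maximum_pages: Some(pages) },
        // Otherwise keep an unbounded dynamic strategy.
        None => HeapAllocStrategy::Dynamic { maximum_pages: None },
    }
}

fn main() {
    assert_eq!(
        offchain_heap_strategy(Some(4096)),
        HeapAllocStrategy::Dynamic { maximum_pages: Some(4096) }
    );
}
```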

@gpestana
Contributor Author

PR to address this: #6081

@bkchr
Member

bkchr commented Oct 16, 2024

It is kind of silly to have the OCW/genesis code paths, which are by no means "consensus-critical", be subject to memory limits that are, afaik, only justifiable in consensus code paths. I hope we can hack around it.

Just because you don't know how the allocator works doesn't mean that the current way is "silly". As I said, you cannot just change these constants without knowing what they do or how the allocator works.

When you change the allocator as done in #6081, the moment there is a two-gigabyte allocation, that memory is never "freed" again; you will only be able to reuse the space. Maybe you are lucky and what you are doing works, but I clearly don't want to depend on luck for this kind of change. A Vec, for example, probably grows its allocation in power-of-two steps, and for a two-gigabyte vector that means the total number of allocations is reduced quite a lot.

What you want is this: polkadot-fellows/RFCs#4 and then to move the allocator inside of the runtime.

What I said here is the solution. It would be a better use of time to push this forward, instead of coming up with hacks that we cannot accept.
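The point about allocation counts can be illustrated in plain Rust (no Substrate dependencies): because `Vec` grows its capacity geometrically, even a very large vector is built with only a handful of distinct allocations, so a single huge allocation can dominate the allocator's behavior.

```rust
// Counts how many times a Vec's backing buffer is reallocated while
// pushing `n` bytes. Geometric growth keeps this near log2(n), not n.
fn count_reallocations(n: usize) -> usize {
    let mut v: Vec<u8> = Vec::new();
    let mut reallocations = 0;
    let mut last_cap = v.capacity();
    for _ in 0..n {
        v.push(0);
        if v.capacity() != last_cap {
            reallocations += 1;
            last_cap = v.capacity();
        }
    }
    reallocations
}

fn main() {
    let r = count_reallocations(1_000_000);
    // A million pushes trigger only a few dozen reallocations at most.
    assert!(r >= 1 && r <= 32);
    println!("{r} reallocations for 1,000,000 pushes");
}
```

The exact growth factor is an implementation detail of the standard library, but the geometric pattern is why a freeing-bump-style allocator sees few, large requests rather than many small ones.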
