Windows support #41
Comments
I'm not sure that Jemalloc is necessary on Windows. Many years ago, when we had support for Windows, we found that the default allocator was doing very well. You can simply run some Holesky validators with Jemalloc and with the default allocator and compare the memory usage patterns. |
@sauliusgrigaitis Thanks, you are right. In my testing, jemalloc-sys can be compiled given the correct build environment and some tweaks, but the setup is too cumbersome for users. Jemalloc is also not well tested on Windows in real-world usage. Lighthouse simply disables jemalloc on Windows. So I think it is still better to just disable it for simplicity for now. |
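For readers following along, here is a minimal sketch of what "disable it on Windows" can look like in Rust. It is not Grandine's actual code: `CustomAllocator` is a hypothetical stand-in that only forwards to the system allocator, so the example needs no external crates. In a real project a jemalloc allocator type would take its place behind the same `cfg` gate.

```rust
#[cfg(not(windows))]
use std::alloc::{GlobalAlloc, Layout, System};

/// Hypothetical stand-in for a jemalloc allocator type. It forwards every
/// request to the system allocator so the sketch stays self-contained.
#[cfg(not(windows))]
struct CustomAllocator;

#[cfg(not(windows))]
unsafe impl GlobalAlloc for CustomAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Registered only on non-Windows targets. On Windows no global allocator is
// declared, so the program falls back to the default system allocator.
#[cfg(not(windows))]
#[global_allocator]
static GLOBAL: CustomAllocator = CustomAllocator;

/// Reports which allocator configuration was compiled in.
fn allocator_name() -> &'static str {
    if cfg!(windows) {
        "system default"
    } else {
        "custom (jemalloc stand-in)"
    }
}

fn main() {
    let data: Vec<u64> = (0..1_000).collect();
    println!("allocator: {}, allocated {} items", allocator_name(), data.len());
}
```

The point of gating at compile time is that Windows users never have to build the allocator crate at all.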
@sauliusgrigaitis The current PoC builds successfully after I added a small cross-platform metrics abstraction. One notable change is in the lints of the workspace Cargo.toml: the new metrics abstraction contains unsafe code, because the windows crate only exposes the relevant APIs as unsafe. I don't know whether you mind this change, but currently there is no particularly simple way for me to override the lints. In fact, collecting statistics such as idle time on Windows requires unsafe code. It is only a question of whether that unsafe lives in a library or in your own code. |
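To illustrate why unsafe is hard to avoid here, below is a rough sketch of a cross-platform idle-time function. It is an assumption-laden example, not the PoC's code: the thread does not say which Windows API was used, so this sketch picks the Win32 `GetSystemTimes` function and declares it by hand instead of going through the windows crate. The names `idle_ticks`, `FileTime`, and `filetime_to_ticks` are made up for illustration.

```rust
/// Combine the two 32-bit halves of a Windows FILETIME into one 64-bit count
/// of 100-nanosecond ticks.
#[cfg_attr(not(windows), allow(dead_code))]
fn filetime_to_ticks(low: u32, high: u32) -> u64 {
    (u64::from(high) << 32) | u64::from(low)
}

#[cfg(windows)]
mod sys {
    /// Mirrors the layout of the Win32 FILETIME struct.
    #[repr(C)]
    #[derive(Default)]
    struct FileTime {
        low: u32,
        high: u32,
    }

    // Hand-written declaration of the Win32 function. Edition 2024 requires
    // writing this block as `unsafe extern "system"`.
    #[link(name = "kernel32")]
    extern "system" {
        fn GetSystemTimes(idle: *mut FileTime, kernel: *mut FileTime, user: *mut FileTime) -> i32;
    }

    pub fn idle_ticks() -> Option<u64> {
        let mut idle = FileTime::default();
        let mut kernel = FileTime::default();
        let mut user = FileTime::default();
        // SAFETY: all three pointers refer to valid, writable FILETIME-sized structs.
        let ok = unsafe { GetSystemTimes(&mut idle, &mut kernel, &mut user) };
        (ok != 0).then(|| super::filetime_to_ticks(idle.low, idle.high))
    }
}

#[cfg(not(windows))]
mod sys {
    /// Placeholder for other platforms; a real implementation would read an
    /// OS-specific source here.
    pub fn idle_ticks() -> Option<u64> {
        None
    }
}

/// Platform-independent entry point used by the rest of the program.
pub fn idle_ticks() -> Option<u64> {
    sys::idle_ticks()
}

fn main() {
    match idle_ticks() {
        Some(ticks) => println!("system idle time: {ticks} ticks of 100 ns"),
        None => println!("idle time is not available in this sketch on this platform"),
    }
}
```

The single `unsafe` block is confined to the Windows module, so callers only ever see the safe `idle_ticks` function.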
@mjzk did you try to run Holesky validators on Windows? Let's get back to code review after the entire functionality is confirmed. |
Not yet. I have 69 HolETH, so I guess this is enough if the staking requirement on the Holesky testnet is the same as mainnet's 32 ETH. In fact, I haven't run a consensus client as a validator yet. My previous attempts with Lighthouse or our Grandine only ran as beacon nodes. I will probably give it a try tomorrow. How do we determine whether a validator is working correctly? By the relevant log output? Do you have any suggestions? @sauliusgrigaitis |
A stack overflow happens in the ecdsa crate during initialization at runtime. More investigation is needed. Stack trace:
7: once_cell::imp::impl$4::initialize::closure$0<array$<k256::arithmetic::mul::LookupTable,33>,once_cell::sync::impl$6::get_or_init::closure_env$0<array$<k256::arithmetic::mul::LookupTable,33>,once_cell::sync::impl$11::force::closure_env$0<array$<k256::arithmetic::mul::LookupTable,33>,array$<k256::arithmetic::mul::LookupTable,33> (*)()> >,enum2$<once_cell::sync::impl$6::get_or_init::Void> >
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\once_cell-1.19.0\src\imp_pl.rs:52
8: once_cell::imp::initialize_inner
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\once_cell-1.19.0\src\imp_pl.rs:146
9: once_cell::imp::OnceCell<array$<k256::arithmetic::mul::LookupTable,33> >::initialize<array$<k256::arithmetic::mul::LookupTable,33>,once_cell::sync::impl$6::get_or_init::closure_env$0<array$<k256::arithmetic::mul::LookupTable,33>,once_cell::sync::impl$11::force::closure_env$0<array$<k256::arithmetic::mul::LookupTable,33>,array$<k256::arithmetic::mul::LookupTable,33> (*)()> >,enum2$<once_cell::sync::impl$6::get_or_init::Vo
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\once_cell-1.19.0\src\imp_pl.rs:52
10: once_cell::sync::OnceCell<array$<k256::arithmetic::mul::LookupTable,33> >::get_or_try_init<array$<k256::arithmetic::mul::LookupTable,33>,once_cell::sync::impl$6::get_or_init::closure_env$0<array$<k256::arithmetic::mul::LookupTable,33>,once_cell::sync::impl$11::force::closure_env$0<array$<k256::arithmetic::mul::LookupTable,33>,array$<k256::arithmetic::mul::LookupTable,33> (*)()> >,enum2$<once_cell::sync::impl$6::get_or_in
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\once_cell-1.19.0\src\lib.rs:1161
11: once_cell::sync::OnceCell<array$<k256::arithmetic::mul::LookupTable,33> >::get_or_init<array$<k256::arithmetic::mul::LookupTable,33>,once_cell::sync::impl$11::force::closure_env$0<array$<k256::arithmetic::mul::LookupTable,33>,array$<k256::arithmetic::mul::LookupTable,33> (*)()> >
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\once_cell-1.19.0\src\lib.rs:1120
12: once_cell::sync::Lazy<array$<k256::arithmetic::mul::LookupTable,33>,array$<k256::arithmetic::mul::LookupTable,33> (*)()>::force<array$<k256::arithmetic::mul::LookupTable,33>,array$<k256::arithmetic::mul::LookupTable,33> (*)()>
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\once_cell-1.19.0\src\lib.rs:1313
13: once_cell::sync::impl$12::deref<array$<k256::arithmetic::mul::LookupTable,33>,array$<k256::arithmetic::mul::LookupTable,33> (*)()>
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\once_cell-1.19.0\src\lib.rs:1377
14: k256::arithmetic::mul::impl$6::mul_by_generator
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\k256-0.13.3\src\arithmetic\mul.rs:396
15: ecdsa::hazmat::sign_prehashed<k256::Secp256k1,k256::arithmetic::scalar::Scalar>
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ecdsa-0.16.9\src\hazmat.rs:245
16: k256::ecdsa::impl$1::try_sign_prehashed<k256::arithmetic::scalar::Scalar>
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\k256-0.13.3\src\ecdsa.rs:192
17: ecdsa::hazmat::SignPrimitive::try_sign_prehashed_rfc6979<k256::arithmetic::scalar::Scalar,k256::Secp256k1,digest::core_api::wrapper::CoreWrapper<digest::core_api::ct_variable::CtVariableCoreWrapper<sha2::core_api::Sha256VarCore,typenum::uint::UInt<typenum::uint::UInt<typenum::uint::UInt<typenum::uint::UInt<typenum::uint::UInt<typenum::uint::UInt<typenum::uint::UTerm,typenum::bit::B1>,typenum::bit::B0>,typenum::bit::B0>,t
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ecdsa-0.16.9\src\hazmat.rs:111
18: ecdsa::signing::impl$5::sign_prehash_with_rng<k256::Secp256k1,rand_core::os::OsRng>
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ecdsa-0.16.9\src\signing.rs:212
19: ecdsa::signing::impl$4::try_sign_digest_with_rng<k256::Secp256k1,digest::core_api::wrapper::CoreWrapper<sha3::Keccak256Core>,rand_core::os::OsRng>
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ecdsa-0.16.9\src\signing.rs:194
20: enr::keys::k256_key::impl$0::sign_v4
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\enr-0.10.0\src\keys\k256_key.rs:32
21: enr::keys::combined::impl$2::sign_v4
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\enr-0.10.0\src\keys\combined.rs:47
22: enr::builder::Builder<enum2$<enr::keys::combined::CombinedKey> >::signature<enum2$<enr::keys::combined::CombinedKey> >
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\enr-0.10.0\src\builder.rs:127
23: enr::builder::Builder<enum2$<enr::keys::combined::CombinedKey> >::build<enum2$<enr::keys::combined::CombinedKey> >
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\enr-0.10.0\src\builder.rs:164
24: eth2_libp2p::discovery::enr::build_enr
at C:\repo\grandine\eth2_libp2p\src\discovery\enr.rs:231
25: eth2_libp2p::discovery::enr::build_or_load_enr<types::preset::Mainnet>
at C:\repo\grandine\eth2_libp2p\src\discovery\enr.rs:138
26: eth2_libp2p::service::impl$0::new::async_fn$0<usize,types::preset::Mainnet>
at C:\repo\grandine\eth2_libp2p\src\service\mod.rs:169
27: core::future::future::impl$1::poll<alloc::boxed::Box<enum2$<eth2_libp2p::service::impl$0::new::async_fn_env$0<usize,types::preset::Mainnet> >,alloc::alloc::Global> >
at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23\library\core\src\future\future.rs:123
28: p2p::network::impl$0::new::async_fn$0<types::preset::Mainnet>
at C:\repo\grandine\p2p\src\network.rs:164
29: runtime::runtime::run_after_genesis::async_fn$0<types::preset::Mainnet>
at C:\repo\grandine\runtime\src\runtime.rs:551
30: grandine::impl$0::run::async_fn$0<types::preset::Mainnet>
at C:\repo\grandine\grandine\src\main.rs:286
31: tokio::runtime::park::impl$4::block_on::closure$0<enum2$<grandine::impl$0::run::async_fn_env$0<types::preset::Mainnet> > >
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.38.0\src\runtime\park.rs:281
32: tokio::runtime::park::CachedParkThread::block_on<enum2$<grandine::impl$0::run::async_fn_env$0<types::preset::Mainnet> > >
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.38.0\src\runtime\park.rs:281
33: tokio::runtime::context::blocking::BlockingRegionGuard::block_on<enum2$<grandine::impl$0::run::async_fn_env$0<types::preset::Mainnet> > >
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.38.0\src\runtime\context\blocking.rs:66
34: tokio::runtime::scheduler::multi_thread::impl$0::block_on::closure$0<enum2$<grandine::impl$0::run::async_fn_env$0<types::preset::Mainnet> > >
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.38.0\src\runtime\scheduler\multi_thread\mod.rs:87
35: tokio::runtime::context::runtime::enter_runtime<tokio::runtime::scheduler::multi_thread::impl$0::block_on::closure_env$0<enum2$<grandine::impl$0::run::async_fn_env$0<types::preset::Mainnet> > >,enum2$<core::result::Result<tuple$<>,anyhow::Error> > >
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.38.0\src\runtime\context\runtime.rs:65
36: tokio::runtime::scheduler::multi_thread::MultiThread::block_on<enum2$<grandine::impl$0::run::async_fn_env$0<types::preset::Mainnet> > >
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.38.0\src\runtime\scheduler\multi_thread\mod.rs:89
37: tokio::runtime::runtime::Runtime::block_on<enum2$<grandine::impl$0::run::async_fn_env$0<types::preset::Mainnet> > >
at C:\Users\conta\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.38.0\src\runtime\runtime.rs:349
38: grandine::block_on<enum2$<grandine::impl$0::run::async_fn_env$0<types::preset::Mainnet> > >
at C:\repo\grandine\grandine\src\main.rs:774
39: grandine::impl$0::run_with_restart::closure$0<types::preset::Mainnet>
at C:\repo\grandine\grandine\src\main.rs:112
40: core::ops::function::FnOnce::call_once<grandine::impl$0::run_with_restart::closure_env$0<types::preset::Mainnet>,tuple$<> >
at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23\library\core\src\ops\function.rs:250
41: core::panic::unwind_safe::impl$25::call_once<enum2$<core::result::Result<tuple$<>,anyhow::Error> >,grandine::impl$0::run_with_restart::closure_env$0<types::preset::Mainnet> >
at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23\library\core\src\panic\unwind_safe.rs:273
42: std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<grandine::impl$0::run_with_restart::closure_env$0<types::preset::Mainnet> >,enum2$<core::result::Result<tuple$<>,anyhow::Error> > >
at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23\library\std\src\panicking.rs:559
43: std::panicking::try::do_catch<core::panic::unwind_safe::AssertUnwindSafe<rayon_core::join::join_context::call_a::closure_env$0<rayon::iter::collect::consumer::CollectResult<bls::signature::Signature>,rayon::iter::plumbing::bridge_producer_consumer::helper::closure_env$0<rayon::slice::IterProducer<helper_functions::verifier::Triple>,rayon::iter::map::MapConsumer<rayon::iter::map::MapConsumer<rayon::iter::while_some::While
44: std::panicking::try<enum2$<core::result::Result<tuple$<>,anyhow::Error> >,core::panic::unwind_safe::AssertUnwindSafe<grandine::impl$0::run_with_restart::closure_env$0<types::preset::Mainnet> > >
at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23\library\std\src\panicking.rs:523
45: std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<grandine::impl$0::run_with_restart::closure_env$0<types::preset::Mainnet> >,enum2$<core::result::Result<tuple$<>,anyhow::Error> > >
at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23\library\std\src\panic.rs:149
46: grandine::Context::run_with_restart<types::preset::Mainnet>
at C:\repo\grandine\grandine\src\main.rs:109
47: grandine::try_main
at C:\repo\grandine\grandine\src\main.rs:517
48: grandine::main
at C:\repo\grandine\grandine\src\main.rs:311
49: core::ops::function::FnOnce::call_once<std::process::ExitCode (*)(),tuple$<> >
at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23\library\core\src\ops\function.rs:250
50: std::sys_common::backtrace::__rust_begin_short_backtrace<std::process::ExitCode (*)(),std::process::ExitCode>
at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23\library\std\src\sys_common\backtrace.rs:155
51: std::rt::lang_start::closure$0<std::process::ExitCode>
at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23\library\std\src\rt.rs:159
52: std::rt::lang_start_internal
at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library\std\src\rt.rs:141
53: std::rt::lang_start<std::process::ExitCode>
at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23\library\std\src\rt.rs:158
54: main
55: __scrt_common_main_seh
at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
56: BaseThreadInitThunk
57: RtlUserThreadStart |
Today I spent more time on this problem. The call from Grandine into our eth2_libp2p triggers it, but the underlying libraries such as enr are just the Lighthouse-maintained crates. The weird thing is that neither Lighthouse nor my standalone test code causes the stack overflow. So one guess is that the problem is related to the project-wide config in Grandine. More investigation is needed. |
Do you build with |
Yes, BN works in release mode! This suggests that the compiler's optimization has resolved some stack-related system-level issues. If this is acceptable, we are close to completing this task. |
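A side note for anyone hitting the same overflow in debug builds: the trace above dies while building a large lookup table on the main thread, and the main thread's stack on Windows is typically 1 MiB by default, versus typically 8 MiB on Linux. One common workaround, which is not what this thread settled on and is only sketched here as an assumption, is to run stack-hungry work on a thread with an explicit stack size. `run_with_stack` and `build_table` are hypothetical names, and `build_table` is just a stand-in for the real initializer.

```rust
use std::thread;

/// Run `work` on a fresh thread with an explicit stack size and return its result.
fn run_with_stack<T, F>(stack_bytes: usize, work: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(work)
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}

/// Stand-in for a stack-hungry initializer such as building a large lookup table.
fn build_table() -> usize {
    let table = [1u8; 1024 * 1024]; // 1 MiB placed on the stack
    table.iter().map(|&b| usize::from(b)).sum()
}

fn main() {
    // 16 MiB leaves plenty of headroom for the 1 MiB table, even unoptimized.
    let sum = run_with_stack(16 * 1024 * 1024, build_table);
    println!("table checksum: {sum}");
}
```

Building in release mode, as reported above, sidesteps the problem differently: optimized code tends to keep fewer large temporaries on the stack.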
After a week of intensive work, Grandine is now able to run validation on Windows. My validator's working status can be seen here. Some details:
Currently, there are still some issues with
Overall, the progress on Windows has been good. However, Rust projects across the Ethereum infrastructure share many engineering flaws, which is concerning. |
Adding some logs to track the test fixes:
failures:
---- spec_tests::invalid::basic_vector_uint16_4_consensus_spec_tests_tests_general_phase0_ssz_generic_basic_vector_invalid_vec_uint16_4_max_one_more stdout ----
thread 'spec_tests::invalid::basic_vector_uint16_4_consensus_spec_tests_tests_general_phase0_ssz_generic_basic_vector_invalid_vec_uint16_4_max_one_more' panicked at C:\repo\grandine\spec_test_utils\src\lib.rs:100:14:
the file should be compressed with Snappy: Offset { offset: 65316, dst_pos: 0 }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---- spec_tests::valid::basic_vector_uint16_5_consensus_spec_tests_tests_general_phase0_ssz_generic_basic_vector_valid_vec_uint16_5_max stdout ----
thread 'spec_tests::valid::basic_vector_uint16_5_consensus_spec_tests_tests_general_phase0_ssz_generic_basic_vector_valid_vec_uint16_5_max' panicked at C:\repo\grandine\spec_test_utils\src\lib.rs:100:14:
the file should be compressed with Snappy: Offset { offset: 65316, dst_pos: 0 }
---- spec_tests::valid::basic_vector_uint16_5_consensus_spec_tests_tests_general_phase0_ssz_generic_basic_vector_valid_vec_uint16_5_random stdout ----
thread 'spec_tests::valid::basic_vector_uint16_5_consensus_spec_tests_tests_general_phase0_ssz_generic_basic_vector_valid_vec_uint16_5_random' panicked at C:\repo\grandine\spec_test_utils\src\lib.rs:100:14:
the file should be compressed with Snappy: Offset { offset: 20260, dst_pos: 0 }
failures:
spec_tests::invalid::basic_vector_uint16_4_consensus_spec_tests_tests_general_phase0_ssz_generic_basic_vector_invalid_vec_uint16_4_max_one_more
spec_tests::valid::basic_vector_uint16_5_consensus_spec_tests_tests_general_phase0_ssz_generic_basic_vector_valid_vec_uint16_5_max
spec_tests::valid::basic_vector_uint16_5_consensus_spec_tests_tests_general_phase0_ssz_generic_basic_vector_valid_vec_uint16_5_random
test result: FAILED. 1884 passed; 3 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.30s
error: test failed, to rerun pass `-p ssz --lib` |
Added a companion PR in dedicated_executor. |
The problem has been located. The default config Action change will come soon. |
The new CI action has passed. Details of the new CI action:
|
PR #42 is now ready for review. |
@sauliusgrigaitis I can add a new step just for |
@sauliusgrigaitis The screensaver and sleep on Windows have been tested. Both work without problems.
FYI, before sleeping, the Reth output looked like this:
2024-10-09T02:53:58.069057Z ERROR Invalid JWT: IAT (issued-at) claim is not within ±60 seconds from the current time
2024-10-09T02:54:00.106834Z ERROR Invalid JWT: IAT (issued-at) claim is not within ±60 seconds from the current time
2024-10-09T02:54:00.555525Z ERROR Invalid JWT: IAT (issued-at) claim is not within ±60 seconds from the current time
2024-10-09T02:54:00.617978Z ERROR Invalid JWT: IAT (issued-at) claim is not within ±60 seconds from the current time
2024-10-09T02:54:00.633123Z ERROR Invalid JWT: IAT (issued-at) claim is not within ±60 seconds from the current time
Grandine just says something like "error while downloading Eth1 blocks" |
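For context on the Reth error, the check it reports can be sketched in a few lines. This is only an illustration of the ±60 second rule quoted in the log, not Reth's or Grandine's code, and `iat_is_fresh` and `unix_now` are hypothetical names. The sketch does not attempt to explain why the timestamps drifted apart around the sleep.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Tolerance taken from the log message: the issued-at claim must be within
/// 60 seconds of the verifier's clock, in either direction.
const IAT_TOLERANCE_SECS: u64 = 60;

/// Returns true if a token issued at `iat` is acceptable at time `now`
/// (both in seconds since the Unix epoch).
fn iat_is_fresh(iat: u64, now: u64) -> bool {
    iat.abs_diff(now) <= IAT_TOLERANCE_SECS
}

/// Current time in seconds since the Unix epoch.
fn unix_now() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before 1970")
        .as_secs()
}

fn main() {
    let now = unix_now();
    println!("fresh token accepted: {}", iat_is_fresh(now, now));
    println!("token from 5 minutes ago accepted: {}", iat_is_fresh(now - 300, now));
}
```

Under this rule, a token whose issued-at time is more than a minute away from the execution client's clock is rejected with exactly the kind of error shown above.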
This issue tracks all the problems and changes needed to support building Grandine on Windows.
More about this Project idea.
Tasks:
psutils does not support Windows