Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PrecompileTools to compile basic model on build. #2114

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

evetion
Copy link
Member

@evetion evetion commented Feb 28, 2025

Should help with latency (#1942) on our executables. The basic testmodel runs in half a second, and models related to basic see improvements too (but much less) with this PR.

This sets up a precompile workload using PrecompileTools, which also compiles the statements of other packages with our types, instead of just Ribasim. This also sets the specialization level of SciML to full, which is recommended for simulations.

However, as both will significantly increase compile time (and the model printing progress during it), I've disabled it by default for developers.

I believe that if we fix the type explosion of our Parameters we will have a fast executable for all models, although we ideally should have a workload that includes all our functionality (arrow, alloc, etc.).

@evetion
Copy link
Member Author

evetion commented Feb 28, 2025

Wow, this crashes hard on TC:

Windows

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x0 -- unknown function (ip: 0000000000000000)
in expression starting at C:\Users\svc-teamcity-ansible\.julia\packages\SimpleNonlinearSolve\1h0BO\src\SimpleNonlinearSolve.jl:143
unknown function (ip: 0000000000000000)
Allocations: 38904371 (Pool: 38903382; Big: 989); GC: 30
ERROR: failed process: Process(`'C:\Users\svc-teamcity-ansible\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\julia.exe' --color=auto --startup-file=no --pkgimages=no '--cpu-target=generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)' '--sysimage=D:\buildAgent\temp\buildTmp\jl_nXpMnc\tmp_sys.dll' '--project=D:\buildAgent\work\ecd2b8f9b25b1609\ribasim\core' '--output-o=D:\buildAgent\temp\buildTmp\jl_7EHglq5Wd4-o.a' 'D:\buildAgent\temp\buildTmp\jl_HOzqJyl6d4'`, ProcessExited(1)) [1]

Linux

[1083399] signal 11 (1): Segmentation fault
in expression starting at /u/svc-teamcity-ansible/.julia/packages/SimpleNonlinearSolve/1h0BO/src/SimpleNonlinearSolve.jl:143
unknown function (ip: (nil))
Allocations: 38164904 (Pool: 38163932; Big: 972); GC: 29
ERROR: failed process: Process(`/u/svc-teamcity-ansible/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/bin/julia --color=auto --startup-file=no --pkgimages=no '--cpu-target=generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)' --sysimage=/opt/teamcityagent/temp/buildTmp/jl_J234qA/tmp_sys.so --project=/opt/teamcityagent/work/ecd2b8f9b25b1609/ribasim/core --output-o=/opt/teamcityagent/temp/buildTmp/jl_qFglC5m7en-o.a /opt/teamcityagent/temp/buildTmp/jl_QPqpwR412H`, ProcessSignaled(11)) [0]

I think that's the fullspecialize? Also note that the second run of the testmodel is not instant, if anything it's slower. @visr, could you build locally and see what happens?

@visr
Copy link
Member

visr commented Feb 28, 2025

I'll first finish the state Vector, which may help here. And if that doesn't help lets try disabling FullSpecialize.

Also discussed with @evetion that if all goes well we may be able to only use PrecompileTools and not PackageCompiler precompilation on top.

And if PrecompileTools speeds up other models as well we should always enable it.

@visr
Copy link
Member

visr commented Mar 3, 2025

Still failing at PackageCompiler creating a sysimage:

15:07:01   Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
15:07:01   Exception: EXCEPTION_ACCESS_VIOLATION at 0x0 -- unknown function (ip: 0000000000000000)
15:07:01   in expression starting at C:\Users\svc-teamcity-ansible\.julia\packages\SimpleNonlinearSolve\1h0BO\src\SimpleNonlinearSolve.jl:143
15:07:01   unknown function (ip: 0000000000000000)
15:07:01   Allocations: 37311783 (Pool: 37310847; Big: 936); GC: 29

This is possibly because with PrecompileTools it will compile more good leading to a sysimage that is too large?

The line that it is failing is also using PrecompileTools.
https://github.com/SciML/NonlinearSolve.jl/blob/v4.4.0/lib/SimpleNonlinearSolve/src/SimpleNonlinearSolve.jl#L143

Sounds like we need to put this on hold for a little longer.

@visr visr marked this pull request as draft March 3, 2025 20:33
@evetion
Copy link
Member Author

evetion commented Mar 4, 2025

On Linux this seems to work now (even though the build log doesn't look like it), as tested on our cluster:

(base) pronk_mn@v-slurmsub001 ~ $ time ./ribasim/ribasim basic/ribasim.toml
┌ Info: Starting a Ribasim simulation.
│   toml_path = "basic/ribasim.toml"
│   cli.ribasim_version = "2025.1.0"
│   starttime = 2020-01-01T00:00:00
└   endtime = 2021-01-01T00:00:00
┌ Warning: The following experimental features are enabled: concentration
└ @ Ribasim /opt/teamcityagent/work/ecd2b8f9b25b1609/ribasim/core/src/main.jl:42
Simulating 100%|████████████████████████████████████████████████████████████| Time: 0:00:00
┌ Info: Convergence bottlenecks in descending order of severity:
│   LinearResistance #12 = 8.149730746987561e-6
│   Basin #6 = 2.7041589660110494e-6
│   TabulatedRatingCurve #4 = 1.0239692169298505e-6
│   Basin #3 = 1.0238832862484456e-6
└   Basin #1 = 9.825234235602696e-7
[ Info: The model finished successfully.

real	0m1.987s
user	0m1.833s
sys	0m0.211s

@visr If you can confirm this for the Windows build, I'd say this is good to merge.

@evetion evetion marked this pull request as ready for review March 4, 2025 09:33
Copy link
Member

@visr visr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments, I still have to verify the timings.

Comment on lines +5 to +10
# Commented out due to segfault on building
# set_preferences!(
# UUID("0bca4576-84f4-4d90-8ffe-ffa030f20462"), # SciMLBase
# "SpecializationLevel" => "FullSpecialize";
# force = true,
# )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Commented out due to segfault on building
# set_preferences!(
# UUID("0bca4576-84f4-4d90-8ffe-ffa030f20462"), # SciMLBase
# "SpecializationLevel" => "FullSpecialize";
# force = true,
# )

I made #2130.

Comment on lines +23 to +28
# Commented out due to segfault on building
# set_preferences!(
# UUID("0bca4576-84f4-4d90-8ffe-ffa030f20462"), # SciMLBase
# "SpecializationLevel" => missing;
# force = true,
# )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Commented out due to segfault on building
# set_preferences!(
# UUID("0bca4576-84f4-4d90-8ffe-ffa030f20462"), # SciMLBase
# "SpecializationLevel" => missing;
# force = true,
# )

toml_path = normpath(@__DIR__, "../../generated_testmodels/basic/ribasim.toml")
isfile(toml_path) || return
@compile_workload begin
Ribasim.main(toml_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Ribasim.main(toml_path)
main(toml_path)

@visr
Copy link
Member

visr commented Mar 4, 2025

Comparing the CLI from this branch against main I don't see any speedup from this at all. The basic model is 1s for both, and the trivial model is 16s for both. Even though trivial is a subset of basic feature wise, hinting that perhaps #2127 won't help much at this point either.

That doesn't mean we shouldn't merge it though, hopefully this will start making a difference with further latency work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants