EAMxx: Update Compy's config file to enable test-all-scream #6982

Draft: wants to merge 1 commit into base: master


singhbalwinder
Contributor

This PR updates Compy's config so that the test-all-scream script can run on Compy.

@singhbalwinder added the BFB (PR leaves answers BFB), Compy, and EAMxx (PRs focused on capabilities for EAMxx) labels on Feb 7, 2025
@singhbalwinder
Contributor Author

I can run the physics tests using this setup, but I get undefined-reference errors for BLAS- and LAPACK-specific routines in HOMME if I try to generate baselines. I think we should merge this PR, and I will address the remaining issues in a follow-up PR.

@bartgol
Contributor

bartgol commented Feb 10, 2025

> I can run the physics tests using this setup, but I get undefined-reference errors for BLAS- and LAPACK-specific routines in HOMME if I try to generate baselines. I think we should merge this PR, and I will address the remaining issues in a follow-up PR.

Are you saying that test-all-scream runs fine but test-all-scream -g fails? That is bizarre...

@singhbalwinder
Contributor Author

I can run test-all-scream with --config-only and then compile/run physics tests just fine. When I run

./test-all-scream -m <machine> -t sp --config-only -g -b <baseline_dir> -w <path where to create test dir>

I get the BLAS/LAPACK errors at the end.

@singhbalwinder
Contributor Author

Unrelated to this PR: on Chrysalis, when I try to generate baselines with the --config-only option, it just hangs...

@bartgol
Contributor

bartgol commented Feb 10, 2025

> I can run test-all-scream with --config-only and then compile/run physics tests just fine. When I run
>
> ./test-all-scream -m <machine> -t sp --config-only -g -b <baseline_dir> -w <path where to create test dir>
>
> I get the BLAS/LAPACK errors at the end.

To be clear, if you use --config-only and later compile/run, you are not generating baselines. You will run the tests, but nothing will be copied to the baselines dir. --config-only makes test-all-scream return almost immediately, so the part that copies files over to the baselines dir never runs.

Edit: what I previously wrote is wrong. --config-only is only used in the test phase. If you run with -g, the test phase does not run (test-all-scream recently switched to follow create_test, in that you either create baselines or you compare against them, not both). So in your second command, --config-only should have no effect. At the top, when you say you use --config-only and then build/run manually, I assume that was without -g?

@bartgol
Contributor

bartgol commented Feb 10, 2025

@singhbalwinder another thing: test-all-scream only runs baseline tests if you add -b <baselinedir> (you can use AUTO as the baselines dir). If -b is not used, none of the code that depends on baselines is compiled. I wonder if in your first run (where you build/run manually), you did not have -b AUTO, and hence did not enable the code that causes the link problem...
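
To make the two modes described above concrete, here is a minimal dry-run sketch. It only prints the command lines and does not execute test-all-scream. The flags are the ones quoted in this thread; the values for the machine and work directory are hypothetical placeholders, and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the two invocations discussed in this thread.
# Nothing here runs test-all-scream itself.

MACHINE="compy"            # hypothetical value for -m
BASELINE_DIR="AUTO"        # AUTO is mentioned above as an accepted -b value
WORK_DIR="/path/to/work"   # hypothetical value for -w

# Baseline generation: with -g the test phase does not run,
# so --config-only (a test-phase option) is left out.
generate_cmd() {
  echo "./test-all-scream -m ${MACHINE} -t sp -g -b ${BASELINE_DIR} -w ${WORK_DIR}"
}

# Baseline comparison: no -g; -b is what enables the baseline-dependent code.
compare_cmd() {
  echo "./test-all-scream -m ${MACHINE} -t sp -b ${BASELINE_DIR} -w ${WORK_DIR}"
}

generate_cmd
compare_cmd
```

Dropping -b from the second command would, per the comment above, skip compiling the baseline-dependent code, which would explain a manual build that links cleanly while the -g run does not.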

@singhbalwinder singhbalwinder marked this pull request as draft February 12, 2025 02:14
@singhbalwinder
Contributor Author

Thanks, Luca, for the suggestions.

> At the top, when you say you use --config-only and then build/run manually, I assume that was without -g?

Yes, that is right.

> I wonder if in your first run (where you build/run manually), you did not have -b AUTO, and hence did not enable the code that causes the link problem...

Yes, this is also true.

I searched online and found that the problem might be the MKL version used by E3SM. I switched the MKL version to mkl/2020, and it worked fine for me.

I am changing this PR to draft, since the MKL version change should be done in a separate PR, and that PR should be merged first.

@bartgol
Contributor

bartgol commented Feb 12, 2025

@singhbalwinder I closed the PR for the MKL change. In that PR, I suggested a fix that changes MKL just for test-all-scream, which is what I think you wanted to do? If instead you want to update MKL for all use cases (including regular CIME runs), then please reopen that PR, and we'll integrate it.

@singhbalwinder
Contributor Author

Thanks, @bartgol, for your suggestions. @rljacob has opened the other PR, since being NBFB only on Compy is not an issue. My other motivation for changing MKL to 2020 was that mkl/2019u5 is buggy (see here under the title "Intel Parallel Studio XE 2019 Update 5 MKL:"). Once that PR is in, I will make this PR available for more reviews and merging.
