Writing out restart files in stretched-grid simulations #481

Open
mcdonh1718 opened this issue Mar 12, 2025 · 5 comments
Labels: category: Bug (Something isn't working), topic: Restart Files (Related to GCHP restart files), topic: Stretched Grid (Specific to stretched grid simulation)

@mcdonh1718

Your name

Helena McDonald

Your affiliation

MIT

What happened? What did you expect to happen?

I'm running a stretched-grid simulation with base resolution C180 and a stretch factor of 30 (so approx 0.02 deg resolution) centered on Florida. Generating the initial restart file goes fine, though I did have to make sure I gave the longitude coordinates in [0,360] rather than [-180,180] space. When running a simulation, the input restart file works fine, but I can't string together simulations because the output restart file stores the longitude coordinate just slightly incorrectly, so the next run throws a 'factories not equal' error. From the initial restart file:

Image

From the simulation write out:
Image

This isn't challenging to fix myself by opening and editing the netcdf (and I can use the restart files after this edit), but it is irritating.

Similarly, I'm having issues where some simulations don't write out restart files at all; the model will be done generating collection files and the log file writes out the component time use breakdown, but it gets stuck converting the placeholder gcchem-internal-checkpoint file to a restart file. See logfile:

Image

It results in this array of errors in my slurm log, but never kills the run; my cluster kills it by timing out.

<img width="609" alt="Image" src="https://github.com/user-attachments/assets/a1d97073-836f-436c-a5c5-e9e2a72aea7b">

What are the steps to reproduce the bug?

Generated a restart file with --stretched-grid --stretch-factor 30 --target-latitude 28.56 --target-longitude 279.56. Enabled NEI2016_MONMEAN in HEMCO_Config. Ran a 3hr simulation starting 07 01 1200z using the restart file I generated using gridspec, ESMF.
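
As an illustration of the [0,360] longitude convention used for the target longitude, here is a minimal sketch of converting a longitude from [-180,180] space. The function name `to_0_360` is hypothetical and is not part of GCHP or gridspec; the value -80.44 is simply the same meridian as the 279.56 given above.

```python
def to_0_360(lon_deg):
    """Map a longitude in degrees from [-180, 180] to [0, 360)."""
    # Python's % returns a result with the sign of the divisor,
    # so negative longitudes wrap around to positive values.
    return lon_deg % 360.0


# -80.44 (west of Greenwich) is the same meridian as 279.56.
print(round(to_0_360(-80.44), 2))
```

Longitudes that are already non-negative, such as 28.0, pass through unchanged.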

Please attach any relevant configuration and log files.

config:
ExtData.txt
gchp-20190701_1200z-log.txt
HEMCO_Config.txt
HISTORY.txt
setCommonRunSettings.txt

slurm files:
gchp_run.txt
slurm-488693-out.txt

What GCHP version were you using?

14.3.1

What environment were you running GCHP on?

Local cluster

What compiler and version were you using?

gcc 9.4.0

What MPI library and version were you using?

OpenMPI 4.0.3

Will you be addressing this bug yourself?

No

Additional information

No response

@mcdonh1718 mcdonh1718 added the category: Bug Something isn't working label Mar 12, 2025
@lizziel lizziel self-assigned this Mar 17, 2025
@lizziel
Contributor

lizziel commented Mar 18, 2025

Hi @mcdonh1718, this is a bug that was fixed in 14.4.0. You can either update versions or manually update your MAPL version by merging in main (let me know if you need help figuring out how to do this). You can see the fix in the linked PR on the issue report at #409.

@lizziel
Contributor

lizziel commented Mar 18, 2025

Regarding the output restart file write issue, is there a pattern to when it takes a long time and hangs?

@lizziel lizziel added topic: Stretched Grid Specific to stretched grid simulation topic: Restart Files Related to GCHP restart files labels Mar 18, 2025
@mcdonh1718
Author

> Hi @mcdonh1718, this is a bug that was fixed in 14.4.0. You can either update versions or manually update your MAPL version by merging in main (let me know if you need help figuring out how to do this). You can see the fix in the linked PR on the issue report at #409.

Yes, I'd like to try merging in the updated MAPL rather than installing a new GCHP build. How do I do this?

@mcdonh1718
Author

> Regarding the output restart file write issue, is there a pattern to when it takes a long time and hangs?

I figured this out, actually; it was a storage issue. At that stretch factor/resolution, restart files and the related checkpoint files can be 30GB+ and the Restarts directory was running out of space, so it would refuse to write. By softlinking to another directory with more storage space and changing the write-out directions in setrestartlink.sh and my run script, I avoided the issue entirely. Or at least, it's been fine so far!
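
For anyone hitting the same problem, a minimal sketch of the workaround described above might look like the following. It checks free space on a filesystem and creates a Restarts softlink pointing at a larger storage area. The helper names `free_gib` and `link_restarts` and the example paths are hypothetical; this does not replace the edits to setrestartlink.sh or the run script.

```python
import os
import shutil

GIB = 1024 ** 3


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def link_restarts(run_dir, storage_dir, name="Restarts"):
    """Create run_dir/name as a symlink to storage_dir.

    Refuses to touch an existing directory or link, so no restart
    files are removed or overwritten by this helper.
    """
    link_path = os.path.join(run_dir, name)
    if os.path.lexists(link_path):
        raise FileExistsError(f"{link_path} already exists; move it aside first")
    os.makedirs(storage_dir, exist_ok=True)
    os.symlink(os.path.abspath(storage_dir), link_path)
    return link_path


# Example usage (hypothetical paths; 30 GiB is the file size reported above):
# if free_gib("/scratch/me") > 30:
#     link_restarts("/home/me/gchp_rundir", "/scratch/me/Restarts")
```

Checking free space before a run starts is cheaper than discovering the problem when the checkpoint write hangs at the end of the run.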

@lizziel
Contributor

lizziel commented Mar 27, 2025

Great, I am glad you figured that out. Yes, high resolution restart files are very large since there are so many species in GEOS-Chem. We save out all species, not just advected species.

Regarding MAPL, you can go to your source code directory, then change directories to src/MAPL. Run git fetch -p, which fetches the latest updates from GitHub. Then run git checkout gchp/main. This will give you the latest MAPL used in GCHP, which is backwards compatible with version 14.3.1.
