Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apptainer definition file for PySR #687

Merged
merged 28 commits into from
Aug 12, 2024

Conversation

wkharold
Copy link
Contributor

fix: 686

This pull request adds an Apptainer definition file to PySR. Researchers can use it to build an Apptainer container that packages PySR and its dependencies (via a conda environment).

@MilesCranmer
Copy link
Owner

MilesCranmer commented Jul 31, 2024

Cool, thanks!

Since we have a Dockerfile already, could the Apptainer file be the same as that one, but in Apptainer syntax? I would prefer to avoid having two different images. See https://github.com/MilesCranmer/PySR/blob/master/Dockerfile

Also, would be nice to have some continuous integration tests like we have for the docker file.

@wkharold
Copy link
Contributor Author

Since we have a Dockerfile already, could the Apptainer file be the same as that one, but in Apptainer syntax?

I had some trouble with that approach initially so I went with conda. Happily, what I learned doing the conda version allowed me to switch back to pip3 and make it more like the Dockerfile. I'll work on adding an integration test that mimics the Docker one.

@MilesCranmer
Copy link
Owner

MilesCranmer commented Aug 1, 2024

Thanks, could it also pull from the python-bullseye image rather than rockylinux image? Since the base image is hosted by docker I don’t see a compelling reason for the base images to be different

@wkharold
Copy link
Contributor Author

wkharold commented Aug 1, 2024

Switched to bullseye for compatibility with the Dockerfile. I was using Rocky because that's what SchedMD uses as the base for the Slurm images built for Google's Cluster Toolkit (formerly HPC Toolkit). One my goals is to be able to bind mount the necessary Slurm commands and libs into a container so that you can use the Julia ClusterManagers interface to do distributed SR from within the container. Maybe your already know how to do that with Docker?

@coveralls
Copy link

coveralls commented Aug 1, 2024

Pull Request Test Coverage Report for Build 10345006057

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 93.724%

Totals Coverage Status
Change from base Build 10344709906: 0.0%
Covered Lines: 1135
Relevant Lines: 1211

💛 - Coveralls

@MilesCranmer
Copy link
Owner

Switched to bullseye for compatibility with the Dockerfile. I was using Rocky because that's what SchedMD uses as the base for the Slurm images built for Google's Cluster Toolkit (formerly HPC Toolkit). One my goals is to be able to bind mount the necessary Slurm commands and libs into a container so that you can use the Julia ClusterManagers interface to do distributed SR from within the container. Maybe your already know how to do that with Docker?

I see, thanks. This is a good use-case. Since I have no particular need to use bullseye, so maybe we could switch the defaults of both the Dockerfile and the Apptainer to use Rocky? You can change the default here:

ARG BASE_IMAGE=bullseye

@MilesCranmer
Copy link
Owner

MilesCranmer commented Aug 1, 2024

One my goals is to be able to bind mount the necessary Slurm commands and libs into a container so that you can use the Julia ClusterManagers interface to do distributed SR from within the container.

I do Slurm stuff with PySR, but haven't done it from a container before. In principle it should be fine, I just haven't tried yet.

One other thing that is needed is a function to install all packages, even the Julia extensions like for ClusterManagers.jl. I just made a PR #688 to add this function. So if your apptainer file calls pysr.load_all_packages(), all the Julia packages ever needed by PySR (including ClusterManagers.jl) will be installed into the read-only environment.

@MilesCranmer
Copy link
Owner

Ok I merged #688 which will help for your stuff, do you want to merge those changes into your branch?

@wkharold
Copy link
Contributor Author

wkharold commented Aug 1, 2024

I see, thanks. This is a good use-case. Since I have no particular need to use bullseye, so maybe we could switch the defaults of both the Dockerfile and the Apptainer to use Rocky? You can change the default here:

It doesn't look like there are Rocky based containers for Julia on Docker Hub. I think bullseye is OK for multicore work. Transitioning to Rocky would require significant changes and testing on Slurm to ensure that it worked properly.

@MilesCranmer
Copy link
Owner

Oh sorry I thought you meant you wanted to switch to Rocky. I'm happy with Bullseye too though. As long as both of the container types can have overall similar images.

Copy link
Owner

@MilesCranmer MilesCranmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For adding continuous integration testing, I would edit this file:

strategy:
matrix:
os: [ubuntu-latest]
arch: ['linux/amd64']
steps:
- uses: actions/checkout@v4
- name: Build docker
run: docker build --platform=${{ matrix.arch }} -t pysr .
- name: Test docker
run: docker run --platform=${{ matrix.arch }} --rm pysr /bin/bash -c 'pip install pytest nbval && python3 -m pysr test main,cli,startup'

and modify the build matrix to:

    strategy:
      matrix:
        os: [ubuntu-latest]
        arch: ['linux/amd64']
        container:
            - docker
            - apptainer

Then you can use

      if: ${{ matrix.container == "docker" }}

to run any docker-only steps, and similarly for apptainer. (I'm assuming there is a way to install apptainer in github actions.)

Call pysr.load_all_packages()
@wkharold
Copy link
Contributor Author

wkharold commented Aug 5, 2024

This is my first foray into Github CI testing. Would you prefer a modified CI_docker.yml file or a new CI_apptainer.yml file. My naive though is that the latter would be simpler but I could easily be missing something.

@wkharold
Copy link
Contributor Author

wkharold commented Aug 7, 2024

Scanning the test failures it looks like most/all of them are the result of the tests being run in a read-only directory. To avoid a version module not found error I cd into /pysr (which is in the container's read-only file system) before doing anything involving pysr. Is there a python environment variable I could set to avoid having to do that?

@MilesCranmer
Copy link
Owner

Is there a way you can mount the directory it uses for testing to a writeable directory?

@wkharold
Copy link
Contributor Author

wkharold commented Aug 7, 2024

Is the arguments section new on 1.3.0` maybe? My institute looks to have 1.1.9. Maybe we could have compatibility for 1.1.0 (or even 1.0.0) to have broader support?

It got introduced in 1.2 as part of the --build-arg / --build-arg-file feature. I don't have a problem with hardcoding the Julia/Python/distro bits to make it work with 1.x.

Lots of improvements have gone in since 1.1. You should encourage your institute sys admins to upgrade.

wkharold and others added 2 commits August 7, 2024 17:59
Put the Project stuff in /pysr/env

Co-authored-by: Miles Cranmer <miles.cranmer@gmail.com>
@wkharold
Copy link
Contributor Author

wkharold commented Aug 7, 2024

I thought adding PYTHONPATH=/psyr had fixed the

ModuleNotFoundError: No module named 'pysr.version'

error. I no longer see it in my local builds or tests.

@MilesCranmer
Copy link
Owner

MilesCranmer commented Aug 8, 2024

Lots of improvements have gone in since 1.1. You should encourage your institute sys admins to upgrade.

Good point. I asked them and they are planning on it in the few month timescale. I guess there will always be some clusters using older versions, so we might as well make it compatible (within reason). Maybe 1.1 is a good lower limit for now (until someone makes a feature request for even earlier)

@MilesCranmer
Copy link
Owner

MilesCranmer commented Aug 8, 2024

The error about version.py missing indicates that it might not be getting installed to the directory we expect. Or maybe when we import pysr, it is importing from the local directory, rather than site-packages. Basically, version.py only gets created when one runs python setup.py install. (And it only gets put into the site-packages folder of PySR, rather than into the local directory)

Fix version.py not found issue
@wkharold
Copy link
Contributor Author

wkharold commented Aug 9, 2024

I looked in the container and version.py ends up in /pysr/pysr. I added that to PYTHONPATH but the test still fails. I've copied the command from the log and pasted it into my shell and it works. There must be something different about the github environment.

@MilesCranmer
Copy link
Owner

MilesCranmer commented Aug 9, 2024

The GitHub actions logs show:

Installing collected packages: fastjsonschema, tornado, rpds-py, pyzmq, psutil, pluggy, platformdirs, nest-asyncio, iniconfig, debugpy, coverage, comm, attrs, referencing, pytest, jupyter-core, jupyter-client, jsonschema-specifications, jsonschema, ipykernel, nbformat, nbval
  WARNING: The script debugpy is installed in '/home/runner/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The scripts coverage, coverage-3.12 and coverage3 are installed in '/home/runner/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The scripts py.test and pytest are installed in '/home/runner/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The scripts jupyter, jupyter-migrate and jupyter-troubleshoot are installed in '/home/runner/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The scripts jupyter-kernel, jupyter-kernelspec and jupyter-run are installed in '/home/runner/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script jsonschema is installed in '/home/runner/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script jupyter-trust is installed in '/home/runner/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.

which is odd, I thought it would install to paths within the image? Or are there other paths that need to be set before the pip installs?

e.g., maybe it just copies the definition of HOME from the host environment?

@wkharold
Copy link
Contributor Author

wkharold commented Aug 9, 2024

Apptainer mounts the home directory of its initiator by default so those messages make sense but I didn't think about adding /home/runner/.local/bin to PATH. I'll add that and see if it changes things.

Relocate setting of testing env vars and add /home/runner/.local/bin to PATH
@MilesCranmer
Copy link
Owner

Oh I see. But shouldn’t it be installed inside the container rather than the user’s home directory (which thus wouldn’t be saved in the container)?

@MilesCranmer
Copy link
Owner

Maybe the Python pip install is trying to install to the home dir rather than an env?

@MilesCranmer
Copy link
Owner

Looks like we can find this with python -m site: https://stackoverflow.com/a/46071447

@wkharold
Copy link
Contributor Author

I looked a little closer at my local environment and realized that I don't run the tests in the PySR directory. When I did that I reproduced the failure we're seeing, ugh. So I added steps to create a new directory specifically for testing in the CI action. I also experimented with the Apptainer %test section of the definition file and it seemed to simplify things so added that and changed CI_apptainer.yml to use it.

@MilesCranmer MilesCranmer self-requested a review August 12, 2024 02:30
Copy link
Owner

@MilesCranmer MilesCranmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for accommodating all of these suggested changes and in general for the great contribution!!

@MilesCranmer MilesCranmer merged commit f363946 into MilesCranmer:master Aug 12, 2024
16 of 17 checks passed
@wkharold wkharold deleted the apptainer branch August 12, 2024 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants