Add support for ParMETIS to local Dockerfile #1102

Merged
merged 4 commits into from
Feb 19, 2025
73 changes: 2 additions & 71 deletions docker/README.md
Expand Up @@ -7,74 +7,5 @@ recommend that our users use Docker as the base running environment to use Graph
For users who want to create their own GraphStorm Docker images because they want to add additional functions,
e.g. graph data building, you can use the provided scripts to build your own GraphStorm Docker images.

## Prerequisites
-----------------
You need to install Docker in your environment as the [Docker documentation](https://docs.docker.com/get-docker/)
suggests.

For example, in an AWS EC2 instance created with Deep Learning AMI GPU PyTorch 1.13.0, you can run
the following commands to install Docker.
```shell
sudo apt update
sudo apt install docker.io
```

## Build a Docker image from source
---------------

Once you have the GraphStorm repository cloned, please use the following command to build a Docker image from source:
```shell
cd /path-to-graphstorm/docker/

bash /path-to-graphstorm/docker/build_docker_oss4local.sh /path-to-graphstorm/ image-name image-tag device
```

The `build_docker_oss4local.sh` script takes four positional arguments:

1. **path-to-graphstorm** (required): the absolute path of the "graphstorm" folder where you
cloned the GraphStorm source code, for example "/code/graphstorm".
2. **image-name** (optional): the name of the Docker image to build. Default is
"graphstorm".
3. **image-tag** (optional): the tag of the Docker image to build. Default is
"local".
4. **device** (optional): the intended execution device for the image, one of `cpu` or `gpu`. Default is
`gpu`.

If Docker requires you to run it as a root user and you don't want to preface all docker commands with sudo, you can check the solution available [here](https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user).

Use the command below to check that the new image exists.
```shell
docker image ls
```
If the build succeeds, there should be a new Docker image, named `<image-name>:<image-tag>-<device>`, e.g., "graphstorm:local-gpu".

To push the image to ECR, you can use the `push_gsf_container.sh` script.
It takes four positional arguments: `image-name`, `image-tag-device`, `region`, and `account`.
For example, to push the local GPU image to the us-west-2 region on AWS account `1234567890`, use:

```bash
bash docker/push_gsf_container.sh graphstorm local-gpu us-west-2 1234567890
```

## Using a custom DGL codebase
---------------
To use a local DGL codebase, you'll need to modify the build script and Dockerfile.local.


You can add the following to `build_docker_oss4local.sh`:

```bash
mkdir -p code/dgl
rsync -qr "${GSF_HOME}/../dgl/" code/dgl/ --exclude .venv --exclude dist --exclude ".*/" \
--exclude "*__pycache__" --exclude "third_party"
```

and in `local/Dockerfile.local` replace the line `RUN cd /root; git clone --branch v${DGL_VERSION} https://github.com/dmlc/dgl.git`
with the following lines:

```Dockerfile
COPY code/dgl /root/dgl
ENV PYTHONPATH="/root/dgl/python/:${PYTHONPATH}"
ENV LD_LIBRARY_PATH="/opt/gs-venv/lib/python3.9/site-packages/dgl/:$LD_LIBRARY_PATH"
```
For instructions, refer to the
[GraphStorm documentation](https://graphstorm.readthedocs.io/en/latest/install/env-setup.html#setup-graphstorm-docker-environment).
11 changes: 9 additions & 2 deletions docker/build_docker_oss4local.sh
Expand Up @@ -33,6 +33,13 @@ else
DEVICE_TYPE="$4"
fi

# process argument 5: support for parmetis
if [ -z "$5" ]; then
USE_PARMETIS="false"
else
USE_PARMETIS="$5"
fi

# Copy scripts and tools codes to the docker folder
mkdir -p $GSF_HOME"/docker/code"
cp $SCRIPT_DIR"/local/fetch_and_run.sh" $GSF_HOME"/docker/code/"
Expand All @@ -42,7 +49,6 @@ cp -r $GSF_HOME"/inference_scripts" $GSF_HOME"/docker/code/inference_scripts"
cp -r $GSF_HOME"/tools" $GSF_HOME"/docker/code/tools"
cp -r $GSF_HOME"/training_scripts" $GSF_HOME"/docker/code/training_scripts"


# Build OSS docker for EC2 instances that can pull ECR docker images
DOCKER_FULLNAME="${IMAGE_NAME}:${TAG}-${DEVICE_TYPE}"

Expand All @@ -55,7 +61,7 @@ elif [[ $DEVICE_TYPE = "cpu" ]]; then
docker login --username AWS --password-stdin public.ecr.aws
SOURCE_IMAGE="public.ecr.aws/ubuntu/ubuntu:22.04_stable"
else
echo >&2 -e "Image type can only be \"gpu\" or \"cpu\", but got \""$DEVICE_TYPE"\""
echo >&2 -e "Image type can only be \"gpu\" or \"cpu\", but got '$DEVICE_TYPE'"
# remove the temporary code folder
rm -rf code
exit 1
Expand All @@ -65,6 +71,7 @@ fi
DOCKER_BUILDKIT=1 docker build \
--build-arg DEVICE=$DEVICE_TYPE \
--build-arg SOURCE=${SOURCE_IMAGE} \
--build-arg USE_PARMETIS=${USE_PARMETIS} \
-f "${GSF_HOME}/docker/local/Dockerfile.local" . -t $DOCKER_FULLNAME

# remove the temporary code folder
Expand Down
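The hunk above forwards the fifth positional argument unchanged as the `USE_PARMETIS` build argument, and `Dockerfile.local` only defines the stages `parmetis-true` and `parmetis-false`. Any other value would therefore not match either stage. The sketch below shows one way the argument could be validated before the build; the function name `normalize_use_parmetis` is hypothetical and is not part of this PR.

```shell
# Hypothetical helper (not part of the PR): normalize the optional fifth
# positional argument to exactly "true" or "false".
normalize_use_parmetis() {
    # An unset or empty argument falls back to "false", as in the script.
    value="${1:-false}"
    case "${value}" in
        true|false)
            printf '%s\n' "${value}"
            ;;
        *)
            # Reject anything that would not match a Dockerfile stage name.
            echo "USE_PARMETIS must be 'true' or 'false', but got '${value}'" >&2
            return 1
            ;;
    esac
}
```

Inside the build script this could be used as `USE_PARMETIS="$(normalize_use_parmetis "${5:-}")" || exit 1`, which fails early with a readable message instead of failing later during `docker build`.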
17 changes: 13 additions & 4 deletions docker/build_graphstorm_image.sh
Expand Up @@ -19,8 +19,9 @@ Available options:
-d, --device Device type, must be one of 'cpu' or 'gpu'. Default is 'gpu'.
-p, --path Path to graphstorm root directory, default is one level above this script's location.
-i, --image Docker image name, default is 'graphstorm'.
-s, --suffix Suffix for the image tag, can be used to push custom image tags. Default is "<environment>-<device>".
-s, --suffix Suffix for the image tag, can be used to push custom image tags. Default tag is "<environment>-<device>".
-b, --build Docker build directory prefix, default is '/tmp/graphstorm-build/docker'.
--use-parmetis When this flag is set we add the ParMETIS dependencies to the local image. ParMETIS partitioning is not available on SageMaker.

Example:

Expand Down Expand Up @@ -49,6 +50,7 @@ parse_params() {
IMAGE_NAME='graphstorm'
BUILD_DIR='/tmp/graphstorm-build/docker'
SUFFIX=""
USE_PARMETIS=false

while :; do
case "${1-}" in
Expand Down Expand Up @@ -78,6 +80,9 @@ parse_params() {
SUFFIX="${2-}"
shift
;;
--use-parmetis)
USE_PARMETIS=true
;;
-?*) die "Unknown option: $1" ;;
*) break ;;
esac
Expand Down Expand Up @@ -113,6 +118,7 @@ msg "- DEVICE_TYPE: ${DEVICE_TYPE}"
msg "- GSF_HOME: ${GSF_HOME}"
msg "- IMAGE_NAME: ${IMAGE_NAME}"
msg "- SUFFIX: ${SUFFIX}"
msg "- USE_PARMETIS: ${USE_PARMETIS}"

# Prepare Docker build directory
if [[ -d ${BUILD_DIR} ]]; then
Expand All @@ -121,13 +127,15 @@ fi
mkdir -p "${BUILD_DIR}"

# Authenticate to ECR to be able to pull source SageMaker or public.ecr.aws image
msg "Authenticating to public ECR registry"
if [[ ${EXEC_ENV} == "sagemaker" ]]; then
# Pulling SageMaker image, login to public SageMaker ECR registry
if [[ ${USE_PARMETIS} == true ]]; then
die "ParMETIS partitioning is not supported for SageMaker execution environment"
fi
msg "Authenticating to public SageMaker ECR registry"
aws ecr get-login-password --region us-east-1 |
docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com
else
# Pulling local image, login to Amazon ECR Public Gallery
msg "Authenticating to Amazon ECR Public Gallery"
aws ecr-public get-login-password --region us-east-1 |
docker login --username AWS --password-stdin public.ecr.aws
fi
Expand Down Expand Up @@ -179,4 +187,5 @@ echo "Building Docker image: ${DOCKER_FULLNAME}"
DOCKER_BUILDKIT=1 docker build \
--build-arg DEVICE="$DEVICE_TYPE" \
--build-arg SOURCE="${SOURCE_IMAGE}" \
--build-arg USE_PARMETIS="${USE_PARMETIS}" \
-f "$DOCKERFILE" "${BUILD_DIR}" -t "$DOCKER_FULLNAME"
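The changes to `build_graphstorm_image.sh` combine two pieces: a boolean `--use-parmetis` flag that takes no value, and a guard that rejects the flag for the SageMaker environment. The sketch below reproduces that logic in isolation so it can be tried without Docker. The function name `parse_parmetis_args` is hypothetical, and it handles only two of the script's options; the real `parse_params` accepts more.

```shell
# Hypothetical, simplified stand-in for the script's parse_params function.
# It sets the globals EXEC_ENV and USE_PARMETIS.
parse_parmetis_args() {
    EXEC_ENV=""
    USE_PARMETIS=false
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -e|--environment)
                # This option consumes a value, so one must be present.
                if [ "$#" -lt 2 ]; then
                    echo "Missing value for $1" >&2
                    return 1
                fi
                EXEC_ENV="$2"
                shift
                ;;
            --use-parmetis)
                # Boolean flag: its presence alone enables ParMETIS.
                USE_PARMETIS=true
                ;;
            *)
                echo "Unknown option: $1" >&2
                return 1
                ;;
        esac
        shift
    done

    # Same restriction as in the PR: ParMETIS is only for local images.
    if [ "${EXEC_ENV}" = "sagemaker" ] && [ "${USE_PARMETIS}" = true ]; then
        echo "ParMETIS partitioning is not supported for SageMaker execution environment" >&2
        return 1
    fi
}
```

Calling `parse_parmetis_args --environment local --use-parmetis` leaves `USE_PARMETIS=true`, while adding the flag to `-e sagemaker` returns a non-zero status before any registry login is attempted.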
68 changes: 59 additions & 9 deletions docker/local/Dockerfile.local
@@ -1,4 +1,5 @@
ARG DEVICE=gpu
ARG USE_PARMETIS=false
ARG SOURCE

FROM ${SOURCE} as base
Expand Down Expand Up @@ -46,9 +47,17 @@ RUN pip install \
ARG DGL_VERSION=2.3.0
ARG DGL_CUDA_VERSION=121
ARG OGB_VERSION=1.3.6
ARG TORCH_VERSION=2.3
ARG TORCH_VERSION=2.3.0
ARG TRANSFORMERS_VERSION=4.28.1

# Download dgl files
RUN cd /root && \
git clone --branch v${DGL_VERSION} --single-branch https://github.com/dmlc/dgl.git && \
rm -rf /root/dgl/.git
ENV DGL_HOME=/root/dgl
ENV DGLBACKEND=pytorch
ENV PYTHONPATH="/root/dgl/tools/:${PYTHONPATH}"

FROM base as base-cpu

# Install torch, DGL, and GSF deps that require torch
Expand Down Expand Up @@ -78,18 +87,53 @@ RUN TORCH_MAJOR_MINOR=$(echo $TORCH_VERSION | cut -c1-3) && \
transformers==${TRANSFORMERS_VERSION} \
&& rm -rf /root/.cache

FROM base-${DEVICE} as runtime
FROM base-${DEVICE} as parmetis-true

ENV PYTHONPATH="/root/dgl/tools/:${PYTHONPATH}"
# Install MPI and dependencies
RUN apt update && apt install -y --no-install-recommends \
build-essential \
cmake \
libopenmpi-dev \
openmpi-bin \
&& rm -rf /var/lib/apt/lists/*

# Download DGL source code
RUN cd /root; git clone --branch v${DGL_VERSION} https://github.com/dmlc/dgl.git
RUN pip install \
pyyaml \
&& rm -rf /root/.cache

# Copy GraphStorm source and add to PYTHONPATH
RUN mkdir -p /graphstorm
COPY code/python/graphstorm /graphstorm/python/graphstorm
ENV PYTHONPATH="/graphstorm/python/:${PYTHONPATH}"
# Install GKLib
RUN cd /root && \
git clone --single-branch --branch master https://github.com/KarypisLab/GKlib && \
cd GKlib && \
make && \
make install && \
rm -rf .git

# Install Metis
RUN cd /root && \
git clone --single-branch --branch master https://github.com/KarypisLab/METIS.git && \
cd METIS && \
make config shared=1 cc=gcc prefix=/root/local i64=1 && \
make install && \
rm -rf .git

# Install Parmetis
RUN cd /root && \
git clone --single-branch --branch main https://github.com/KarypisLab/PM4GNN.git && \
cd PM4GNN && \
make config cc=mpicc prefix=/root/local && \
make install && \
rm -rf .git

ENV PATH=$PATH:/root/local/bin
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/root/local/lib/
RUN cp /root/local/bin/pm_dglpart /root/local/bin/pm_dglpart3

FROM base-${DEVICE} as parmetis-false

# No additional dependencies when not supporting ParMETIS

FROM parmetis-${USE_PARMETIS} as runtime
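Because the `runtime` stage is selected by name through `parmetis-${USE_PARMETIS}`, a container built with `USE_PARMETIS=true` should have the MPI launcher and the `pm_dglpart` binaries on its `PATH`, while one built with `false` should not. A quick way to confirm which variant an image contains is to list the expected commands that are missing. The helper below is a minimal sketch; the name `missing_commands` is hypothetical, and `mpirun` is assumed to come from the `openmpi-bin` package installed above.

```shell
# Hypothetical helper: print each given command that cannot be found on PATH.
# Prints nothing when every command is available.
missing_commands() {
    for cmd in "$@"; do
        if ! command -v "${cmd}" >/dev/null 2>&1; then
            printf '%s\n' "${cmd}"
        fi
    done
}

# Inside a ParMETIS-enabled container one might run:
#   missing_commands mpirun pm_dglpart pm_dglpart3
```

Empty output would indicate that all listed tools are present; any printed name points at a dependency that the selected stage did not install.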

# Set up SSH access
ENV SSH_PORT=2222
Expand All @@ -104,6 +148,12 @@ RUN mkdir -p ${SSHDIR} \

EXPOSE ${SSH_PORT}

# Copy GraphStorm source and add to PYTHONPATH
RUN mkdir -p /graphstorm
COPY code/python/graphstorm /graphstorm/python/graphstorm
ENV PYTHONPATH="/graphstorm/python/:${PYTHONPATH}"


# Copy GraphStorm scripts and tools
COPY code/examples /graphstorm/examples
COPY code/inference_scripts /graphstorm/inference_scripts
Expand Down
16 changes: 13 additions & 3 deletions docs/source/install/env-setup.rst
Expand Up @@ -175,6 +175,7 @@ tag and other aspects of the build. We list the full argument list below:
* ``-i, --image`` Docker image name, default is 'graphstorm'.
* ``-s, --suffix`` Suffix for the image tag, can be used to push custom image tags. Default is "<environment>-<device>".
* ``-b, --build`` Docker build directory prefix, default is '/tmp/graphstorm-build/docker'.
* ``--use-parmetis`` When this flag is set we add the ParMETIS dependencies to the local image. ParMETIS partitioning is not available on SageMaker.

For example you can build an image to support CPU-only execution using:

Expand All @@ -183,6 +184,13 @@ For example you can build an image to support CPU-only execution using:
bash docker/build_graphstorm_image.sh --environment local --device cpu
# Will build an image named 'graphstorm:local-cpu'

Or to build and tag an image to run ParMETIS with EC2 instances:

.. code-block:: bash

bash docker/build_graphstorm_image.sh --environment local --device cpu --use-parmetis --suffix "-parmetis"
# Will build an image named 'graphstorm:local-cpu-parmetis'

See ``bash docker/build_graphstorm_image.sh --help``
for more information.
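The two examples above suggest that the final image name is the image name, the default tag `<environment>-<device>`, and the suffix appended directly to that tag. The sketch below spells out that composition as inferred from the examples; the function `graphstorm_image_name` is hypothetical and does not exist in the repository.

```shell
# Hypothetical helper that mirrors the naming shown in the examples:
#   <image>:<environment>-<device><suffix>
graphstorm_image_name() {
    image="${1:-graphstorm}"
    environment="${2:?environment is required}"
    device="${3:-gpu}"
    suffix="${4:-}"
    printf '%s:%s-%s%s\n' "${image}" "${environment}" "${device}" "${suffix}"
}
```

With these assumptions, `graphstorm_image_name graphstorm local cpu -parmetis` yields `graphstorm:local-cpu-parmetis`, matching the comment in the ParMETIS build example. Note that the suffix carries its own leading hyphen.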

Expand Down Expand Up @@ -210,12 +218,14 @@ In addition to ``-e/--environment``, the script supports several optional argume
* ``-s, --suffix`` Suffix for the image tag, can be used to push custom image tags. Default is "<environment>-<device>".


Example:
Examples:

.. code-block:: bash

bash docker/push_graphstorm_image.sh -e local -r "us-east-1" -a "123456789012"
# Will push an image to '123456789012.dkr.ecr.us-east-1.amazonaws.com/graphstorm:local-gpu'
# Push an image to '123456789012.dkr.ecr.us-east-1.amazonaws.com/graphstorm:local-cpu'
bash docker/push_graphstorm_image.sh -e local -r "us-east-1" -a "123456789012" --device cpu
# Push a ParMETIS-capable image to '123456789012.dkr.ecr.us-east-1.amazonaws.com/graphstorm:local-cpu-parmetis'
bash docker/push_graphstorm_image.sh -e local -r "us-east-1" -a "123456789012" --device cpu --suffix "-parmetis"
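The destination shown in the comments above follows the usual private ECR pattern of account, region, and repository with tag. As a sketch, the full URI can be assembled as follows; the function `ecr_image_uri` is hypothetical and only illustrates the pattern visible in the examples.

```shell
# Hypothetical helper: build the ECR URI for an already tagged image.
ecr_image_uri() {
    account="${1:?account is required}"
    region="${2:?region is required}"
    image_and_tag="${3:?image name and tag are required}"
    printf '%s.dkr.ecr.%s.amazonaws.com/%s\n' "${account}" "${region}" "${image_and_tag}"
}
```

For instance, `ecr_image_uri 123456789012 us-east-1 graphstorm:local-cpu-parmetis` reproduces the target of the ParMETIS push example.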


Create a GraphStorm Container
Expand Down