Add Arm64 builds to CI (#2093)
* Add Arm64 build and test stages to CI
* Refactor CI docker images as a multi-arch container
* Add Arm64 Conda package output
* Add Arm64 (`aarch64`) to `dependencies.yaml` matrix
* Skip DOCA builds for ARM (#2092).
* Skip tests for ARM conda builds; this avoids the need for an ARM GPU runner.
* The following packages are x86_64 specific: include-what-you-use (later versions support ARM), vale, milvus and pymilvus.
* The following packages have ARM builds on PyPI but not on conda-forge: pypdfium2, newspaper3k (the package is no-arch but some of its dependencies are x86_64 only). On ARM these are installed via pip; on x86_64 they continue to be installed via conda.
* PyTorch CUDA builds exist for ARM but lack the metadata that the x86_64 builds have. For now the CPU version of torch is installed for ARM (#2095).
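
The changes listed above deal with two naming schemes for the same architectures: the Debian/Docker style (`amd64`, `arm64`, as printed by `dpkg --print-architecture` and exposed by Docker as `TARGETARCH`) and the kernel style (`x86_64`, `aarch64`, as printed by `arch`). The following is a minimal sketch of how the two could be translated and how an architecture-specific requirements file name could be derived. The helper names `to_pkg_arch` and `requirements_file_for` are hypothetical and are not part of this commit.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; not part of this commit.

# Map a kernel-style architecture name (as printed by `arch`) to the
# Debian/Docker-style name (as printed by `dpkg --print-architecture`).
to_pkg_arch() {
    case "$1" in
        x86_64)  echo "amd64" ;;
        aarch64) echo "arm64" ;;
        *)       echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Build an architecture-specific requirements file name following the
# `<base>_arch-<arch>.txt` pattern used by the conda test scripts.
# The second argument defaults to the architecture of the current machine.
requirements_file_for() {
    local base="$1"
    local arch_name="${2:-$(uname -m)}"
    echo "${base}_arch-${arch_name}.txt"
}
```

For example, `to_pkg_arch "$(arch)"` would produce a value in the form that `PKG_ARCH` takes in `doca.sh`, while the CI scripts keep using the kernel-style value in `REAL_ARCH`.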

Related to nv-morpheus/utilities#90
Requires nv-morpheus/MRC#524 to be merged first
Closes [#2094](#2094)

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: #2093
dagardner-nv authored Jan 16, 2025
1 parent cba5022 commit ac97e6d
Showing 50 changed files with 777 additions and 118 deletions.
3 changes: 2 additions & 1 deletion .devcontainer/docker/optional_deps/doca.sh
@@ -20,6 +20,7 @@ MORPHEUS_SUPPORT_DOCA=${MORPHEUS_SUPPORT_DOCA:-OFF}
LINUX_DISTRO=${LINUX_DISTRO:-ubuntu}
LINUX_VER=${LINUX_VER:-22.04}
DOCA_VERSION=${DOCA_VERSION:-2.7.0}
PKG_ARCH=${PKG_ARCH:-$(dpkg --print-architecture)}

# Exit early if nothing to do
if [[ ${MORPHEUS_SUPPORT_DOCA} != @(TRUE|ON) ]]; then
@@ -35,7 +36,7 @@ DEB_DIR=${WORKING_DIR}/deb
mkdir -p ${DEB_DIR}

DOCA_OS_VERSION="ubuntu2204"
DOCA_PKG_LINK="https://www.mellanox.com/downloads/DOCA/DOCA_v${DOCA_VERSION}/host/doca-host_${DOCA_VERSION}-204000-24.04-${DOCA_OS_VERSION}_amd64.deb"
DOCA_PKG_LINK="https://www.mellanox.com/downloads/DOCA/DOCA_v${DOCA_VERSION}/host/doca-host_${DOCA_VERSION}-204000-24.04-${DOCA_OS_VERSION}_${PKG_ARCH}.deb"

# Upgrade the base packages (diff between image and Canonical upstream repo)
apt update -y
28 changes: 22 additions & 6 deletions .github/workflows/ci_pipe.yml
@@ -105,7 +105,7 @@ jobs:

build:
name: Build
runs-on: linux-amd64-cpu16
runs-on: linux-${{ matrix.arch }}-cpu16
timeout-minutes: 60
container:
credentials:
@@ -114,6 +114,8 @@ jobs:
image: ${{ inputs.container }}
strategy:
fail-fast: true
matrix:
arch: ["amd64", "arm64"]

steps:
- name: Checkout
@@ -130,13 +132,13 @@ jobs:
aws-region: ${{ vars.AWS_REGION }}
role-duration-seconds: 43200 # 12h

- name: Build:linux:x86_64:gcc
- name: Build:linux:${{ matrix.arch }}:gcc
shell: bash
run: ./morpheus/ci/scripts/github/build.sh

test:
name: Test
runs-on: linux-amd64-gpu-v100-latest-1
runs-on: ${{ matrix.runner }}
# Consider lowering this back down to 60 minutes per https://github.com/nv-morpheus/Morpheus/issues/1948
timeout-minutes: 90
container:
@@ -150,6 +152,13 @@ jobs:
PARALLEL_LEVEL: '10'
strategy:
fail-fast: true
matrix:
arch: ["amd64", "arm64"]
include:
- runner: linux-amd64-gpu-v100-latest-1
arch: "amd64"
- runner: linux-arm64-gpu-a100-latest-1
arch: "arm64"

steps:
- name: Checkout
@@ -166,7 +175,7 @@ jobs:
aws-region: ${{ vars.AWS_REGION }}
role-duration-seconds: 43200 # 12h

- name: Test:linux:x86_64:gcc
- name: Test:linux:${{ matrix.arch }}:gcc
shell: bash
run: ./morpheus/ci/scripts/github/test.sh

@@ -208,7 +217,7 @@ jobs:
name: Conda Package
if: ${{ inputs.conda_run_build }}
needs: [documentation, test]
runs-on: linux-amd64-gpu-v100-latest-1
runs-on: ${{ matrix.runner }}
timeout-minutes: 90
container:
image: ${{ inputs.base_container }}
@@ -218,6 +227,13 @@ jobs:
PARALLEL_LEVEL: '10'
strategy:
fail-fast: true
matrix:
arch: ["amd64", "arm64"]
include:
- runner: linux-amd64-gpu-v100-latest-1
arch: "amd64"
- runner: linux-arm64-cpu16
arch: "arm64"

steps:
- name: Checkout
@@ -235,7 +251,7 @@ jobs:
aws-region: ${{ vars.AWS_REGION }}
role-duration-seconds: 43200 # 12h

- name: Build morpheus-core conda package
- name: Build morpheus-core:${{ matrix.arch }} conda package
shell: bash
env:
CONDA_TOKEN: "${{ secrets.CONDA_TOKEN }}"
4 changes: 2 additions & 2 deletions .github/workflows/pr.yaml
@@ -96,8 +96,8 @@ jobs:
# Upload morpheus conda packages only for non PR branches. Use 'main' for main branch and 'dev' for all other branches
conda_upload_label: ${{ !fromJSON(needs.prepare.outputs.is_pr) && (fromJSON(needs.prepare.outputs.is_main_branch) && 'main' || 'dev') || '' }}
base_container: rapidsai/ci-conda:cuda12.5.1-ubuntu22.04-py3.10
container: nvcr.io/ea-nvidia-morpheus/morpheus:morpheus-ci-build-241024
test_container: nvcr.io/ea-nvidia-morpheus/morpheus:morpheus-ci-test-241024
container: nvcr.io/ea-nvidia-morpheus/morpheus:morpheus-ci-build-250102
test_container: nvcr.io/ea-nvidia-morpheus/morpheus:morpheus-ci-test-250102
secrets:
CONDA_TOKEN: ${{ secrets.CONDA_TOKEN }}
NGC_API_KEY: ${{ secrets.NGC_API_KEY }}
8 changes: 7 additions & 1 deletion ci/conda/recipes/morpheus-libs/morpheus_core_test.sh
@@ -13,10 +13,16 @@
# See the License for the specific language governing permissions and
# limitations under the License.

# For now conda tests are disabled on aarch64, largely due to difficulties with installing a cuda enabled version of
# pytorch on aarch64 from a requirements file.
if [[ $(arch) == "aarch64" ]]; then
exit 0
fi

python3 <<EOF
import importlib.resources
import subprocess
requirements_file = importlib.resources.path("morpheus", "requirements_morpheus_core.txt")
requirements_file = importlib.resources.path("morpheus", "requirements_morpheus_core_arch-$(arch).txt")
subprocess.call(f"pip install -r {requirements_file}".split())
EOF

8 changes: 7 additions & 1 deletion ci/conda/recipes/morpheus-libs/morpheus_dfp_test.sh
@@ -13,10 +13,16 @@
# See the License for the specific language governing permissions and
# limitations under the License.

# For now conda tests are disabled on aarch64, largely due to difficulties with installing a cuda enabled version of
# pytorch on aarch64 from a requirements file.
if [[ $(arch) == "aarch64" ]]; then
exit 0
fi

python3 <<EOF
import importlib.resources
import subprocess
requirements_file = importlib.resources.path("morpheus_dfp", "requirements_morpheus_dfp.txt")
requirements_file = importlib.resources.path("morpheus_dfp", "requirements_morpheus_dfp_arch-$(arch).txt")
subprocess.call(f"pip install -r {requirements_file}".split())
EOF

8 changes: 7 additions & 1 deletion ci/conda/recipes/morpheus-libs/morpheus_llm_test.sh
@@ -13,11 +13,17 @@
# See the License for the specific language governing permissions and
# limitations under the License.

# For now conda tests are disabled on aarch64, largely due to difficulties with installing a cuda enabled version of
# pytorch on aarch64 from a requirements file.
if [[ $(arch) == "aarch64" ]]; then
exit 0
fi

# Install requirements if they are included in the package
python3 <<EOF
import importlib.resources
import subprocess
requirements_file = importlib.resources.path("morpheus_llm", "requirements_morpheus_llm.txt")
requirements_file = importlib.resources.path("morpheus_llm", "requirements_morpheus_llm_arch-$(arch).txt")
subprocess.call(f"pip install -r {requirements_file}".split())
EOF

18 changes: 11 additions & 7 deletions ci/runner/Dockerfile
@@ -22,32 +22,36 @@ ARG LINUX_DISTRO=ubuntu
ARG LINUX_VER=22.04
ARG PROJ_NAME=morpheus
ARG PYTHON_VER=3.10
ARG ARCH=x86_64
ARG REAL_ARCH=x86_64


# Configure the base docker img
FROM ${FROM_IMAGE}:cuda${CUDA_VER}-${LINUX_DISTRO}${LINUX_VER}-py${PYTHON_VER} AS base
FROM --platform=$TARGETPLATFORM ${FROM_IMAGE}:cuda${CUDA_VER}-${LINUX_DISTRO}${LINUX_VER}-py${PYTHON_VER} AS base

ARG PROJ_NAME
ARG CUDA_SHORT_VER

SHELL ["/bin/bash", "-c"]

ENV REAL_ARCH=${REAL_ARCH}

# Create conda environment
COPY ./dependencies.yaml /tmp/conda/

# ============ build ==================
FROM base as build

# Add any build only dependencies here.ARG ARCH
# Add any build only dependencies here.
ARG CUDA_SHORT_VER
ARG PROJ_NAME
ARG PYTHON_VER
ARG REAL_ARCH

RUN rapids-dependency-file-generator \
--config /tmp/conda/dependencies.yaml \
--output conda \
--file-key build \
--matrix "cuda=${CUDA_SHORT_VER};arch=${ARCH};py=${PYTHON_VER}" > /tmp/conda/env.yaml && \
--matrix "cuda=${CUDA_SHORT_VER};arch=${REAL_ARCH};py=${PYTHON_VER}" > /tmp/conda/env.yaml && \
CONDA_ALWAYS_YES=true /opt/conda/bin/conda env create -n ${PROJ_NAME} -q --file /tmp/conda/env.yaml && \
sed -i "s/conda activate base/conda activate ${PROJ_NAME}/g" ~/.bashrc && \
conda clean -afy && \
@@ -64,15 +68,15 @@ RUN apt update && \
libtool \
automake && \
apt clean && \
/tmp/doca/doca.sh /tmp/doca && \
PKG_ARCH=${TARGETARCH} /tmp/doca/doca.sh /tmp/doca && \
rm -rf /tmp/doca

# ============ test ==================
FROM base as test

# Add any test only dependencies here.

ARG ARCH
ARG REAL_ARCH
ARG CUDA_SHORT_VER
ARG PROJ_NAME
ARG PYTHON_VER
@@ -88,7 +92,7 @@ RUN rapids-dependency-file-generator \
--config /tmp/conda/dependencies.yaml \
--output conda \
--file-key test \
--matrix "cuda=${CUDA_SHORT_VER};arch=${ARCH};py=${PYTHON_VER}" > /tmp/conda/env.yaml && \
--matrix "cuda=${CUDA_SHORT_VER};arch=${REAL_ARCH};py=${PYTHON_VER}" > /tmp/conda/env.yaml && \
CONDA_ALWAYS_YES=true /opt/conda/bin/conda env create -n ${PROJ_NAME} -q --file /tmp/conda/env.yaml && \
sed -i "s/conda activate base/conda activate ${PROJ_NAME}/g" ~/.bashrc && \
conda clean -afy && \
14 changes: 7 additions & 7 deletions ci/scripts/github/build.sh
@@ -23,7 +23,7 @@ source ${WORKSPACE}/ci/scripts/github/cmake_all.sh
rapids-dependency-file-generator \
--output conda \
--file-key build \
--matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION}" | tee "${WORKSPACE_TMP}/env.yaml"
--matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=${REAL_ARCH};py=${RAPIDS_PY_VERSION}" | tee "${WORKSPACE_TMP}/env.yaml"

update_conda_env "${WORKSPACE_TMP}/env.yaml"

@@ -43,22 +43,22 @@ cmake --build ${BUILD_DIR} --parallel ${PARALLEL_LEVEL}
log_sccache_stats

rapids-logger "Archiving results"
tar cfj "${WORKSPACE_TMP}/wheel.tar.bz" ${BUILD_DIR}/python/morpheus/dist ${BUILD_DIR}/python/morpheus_llm/dist ${BUILD_DIR}/python/morpheus_dfp/dist
tar cfj "${WORKSPACE_TMP}/wheel-${REAL_ARCH}.tar.bz" ${BUILD_DIR}/python/morpheus/dist ${BUILD_DIR}/python/morpheus_llm/dist ${BUILD_DIR}/python/morpheus_dfp/dist

MORPHEUS_LIBS=($(find ${MORPHEUS_ROOT}/${BUILD_DIR}/python/morpheus/morpheus/_lib -name "*.so" -exec realpath --relative-to ${MORPHEUS_ROOT} {} \;) \
$(find ${MORPHEUS_ROOT}/${BUILD_DIR}/python/morpheus_llm/morpheus_llm/_lib -name "*.so" -exec realpath --relative-to ${MORPHEUS_ROOT} {} \;) \
$(find ${MORPHEUS_ROOT}/examples -name "*.so" -exec realpath --relative-to ${MORPHEUS_ROOT} {} \;))
tar cfj "${WORKSPACE_TMP}/morhpeus_libs.tar.bz" "${MORPHEUS_LIBS[@]}"
tar cfj "${WORKSPACE_TMP}/morhpeus_libs-${REAL_ARCH}.tar.bz" "${MORPHEUS_LIBS[@]}"

CPP_TESTS=($(find ${MORPHEUS_ROOT}/${BUILD_DIR}/python/morpheus/morpheus/_lib/tests -name "*.x" -exec realpath --relative-to ${MORPHEUS_ROOT} {} \;) \
$(find ${MORPHEUS_ROOT}/${BUILD_DIR}/python/morpheus_llm/morpheus_llm/_lib/tests -name "*.x" -exec realpath --relative-to ${MORPHEUS_ROOT} {} \;))
tar cfj "${WORKSPACE_TMP}/cpp_tests.tar.bz" "${CPP_TESTS[@]}"
tar cfj "${WORKSPACE_TMP}/cpp_tests-${REAL_ARCH}.tar.bz" "${CPP_TESTS[@]}"

rapids-logger "Pushing results to ${DISPLAY_ARTIFACT_URL}"
set_job_summary_preamble
upload_artifact "${WORKSPACE_TMP}/wheel.tar.bz"
upload_artifact "${WORKSPACE_TMP}/morhpeus_libs.tar.bz"
upload_artifact "${WORKSPACE_TMP}/cpp_tests.tar.bz"
upload_artifact "${WORKSPACE_TMP}/wheel-${REAL_ARCH}.tar.bz"
upload_artifact "${WORKSPACE_TMP}/morhpeus_libs-${REAL_ARCH}.tar.bz"
upload_artifact "${WORKSPACE_TMP}/cpp_tests-${REAL_ARCH}.tar.bz"

rapids-logger "Success"
exit 0
2 changes: 1 addition & 1 deletion ci/scripts/github/checks.sh
@@ -23,7 +23,7 @@ source ${WORKSPACE}/ci/scripts/github/cmake_all.sh
rapids-dependency-file-generator \
--output conda \
--file-key build \
--matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION}" | tee "${WORKSPACE_TMP}/env.yaml"
--matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=${REAL_ARCH};py=${RAPIDS_PY_VERSION}" | tee "${WORKSPACE_TMP}/env.yaml"

update_conda_env "${WORKSPACE_TMP}/env.yaml"

8 changes: 7 additions & 1 deletion ci/scripts/github/cmake_all.sh
@@ -34,6 +34,12 @@ fi
export CMAKE_BUILD_ALL_FEATURES="${_FLAGS[@]}"
unset _FLAGS

if [[ ${MORPHEUS_SUPPORT_DOCA} == @(TRUE|ON) ]]; then
if [[ ${REAL_ARCH} == "aarch64" ]]; then
# Currently DOCA is failing to build on ARM
# https://github.com/nv-morpheus/Morpheus/issues/2092
export MORPHEUS_SUPPORT_DOCA=OFF
fi

if [[ ${MORPHEUS_SUPPORT_DOCA} == @(TRUE|ON) && ${REAL_ARCH} == "x86_64" ]]; then
export CMAKE_BUILD_ALL_FEATURES="${CMAKE_BUILD_ALL_FEATURES} -DMORPHEUS_SUPPORT_DOCA=ON"
fi
5 changes: 3 additions & 2 deletions ci/scripts/github/common.sh
@@ -32,6 +32,7 @@ cd ${MORPHEUS_ROOT}
# will be defined specifying the subset we are allowed to use.
NUM_CORES=$(nproc)
export PARALLEL_LEVEL=${PARALLEL_LEVEL:-${NUM_CORES}}
export REAL_ARCH=${REAL_ARCH:-$(arch)}
rapids-logger "Procs: ${NUM_CORES}"
/usr/bin/lscpu

@@ -143,8 +144,8 @@ function show_conda_info() {
function log_toolchain() {
rapids-logger "Check versions"
python3 --version
x86_64-conda-linux-gnu-cc --version
x86_64-conda-linux-gnu-c++ --version
${REAL_ARCH}-conda-linux-gnu-cc --version
${REAL_ARCH}-conda-linux-gnu-c++ --version
cmake --version
ninja --version
sccache --version
4 changes: 2 additions & 2 deletions ci/scripts/github/conda.sh
@@ -55,9 +55,9 @@ if [[ " ${CI_SCRIPT_ARGS} " =~ " upload " ]]; then
rapids-logger "Building Conda Package... Done"
else
# if we didn't receive the upload argument, we can still upload the artifact to S3
tar cfj "${WORKSPACE_TMP}/conda.tar.bz" "${RAPIDS_CONDA_BLD_OUTPUT_DIR}"
tar cfj "${WORKSPACE_TMP}/conda-${REAL_ARCH}.tar.bz" "${RAPIDS_CONDA_BLD_OUTPUT_DIR}"
ls -lh ${WORKSPACE_TMP}/

rapids-logger "Pushing results to ${DISPLAY_ARTIFACT_URL}/"
upload_artifact "${WORKSPACE_TMP}/conda.tar.bz"
upload_artifact "${WORKSPACE_TMP}/conda-${REAL_ARCH}.tar.bz"
fi
11 changes: 7 additions & 4 deletions ci/scripts/github/conda_libs.sh
@@ -44,8 +44,11 @@ rapids-logger "Listing LFS-known files"
git lfs ls-files
rapids-logger "Building Morpheus Libraries"

# Run nvidia-smi to check the test env
/usr/bin/nvidia-smi
# If we have access to a GPU run nvidia-smi to check the test env
if [ -f /usr/bin/nvidia-smi ]; then
/usr/bin/nvidia-smi
fi


# Run the conda build, and upload to conda forge if requested
export MORPHEUS_PYTHON_BUILD_STUBS=OFF
@@ -57,9 +60,9 @@ if [[ " ${CI_SCRIPT_ARGS} " =~ " upload " ]]; then
rapids-logger "Building Morpheus Libraries... Done"
else
# if we didn't receive the upload argument, we can still upload the artifact to S3
tar cfj "${WORKSPACE_TMP}/conda_libs.tar.bz" "${RAPIDS_CONDA_BLD_OUTPUT_DIR}"
tar cfj "${WORKSPACE_TMP}/conda_libs-${REAL_ARCH}.tar.bz" "${RAPIDS_CONDA_BLD_OUTPUT_DIR}"
ls -lh ${WORKSPACE_TMP}/

rapids-logger "Pushing results to ${DISPLAY_ARTIFACT_URL}/"
upload_artifact "${WORKSPACE_TMP}/conda_libs.tar.bz"
upload_artifact "${WORKSPACE_TMP}/conda_libs-${REAL_ARCH}.tar.bz"
fi
6 changes: 3 additions & 3 deletions ci/scripts/github/docs.sh
@@ -23,13 +23,13 @@ source ${WORKSPACE}/ci/scripts/github/cmake_all.sh
rapids-dependency-file-generator \
--output conda \
--file-key docs \
--matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION}" | tee "${WORKSPACE_TMP}/env.yaml"
--matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=${REAL_ARCH};py=${RAPIDS_PY_VERSION}" | tee "${WORKSPACE_TMP}/env.yaml"

update_conda_env "${WORKSPACE_TMP}/env.yaml"

download_artifact "wheel.tar.bz"
download_artifact "wheel-${REAL_ARCH}.tar.bz"

tar xf "${WORKSPACE_TMP}/wheel.tar.bz"
tar xf "${WORKSPACE_TMP}/wheel-${REAL_ARCH}.tar.bz"

pip install ${MORPHEUS_ROOT}/${BUILD_DIR}/python/morpheus/dist/*.whl
pip install ${MORPHEUS_ROOT}/${BUILD_DIR}/python/morpheus_llm/dist/*.whl