Commit

- initial FTS 2.6.0 dev commit
- added logic evaluating whether a given conditional experimental patch is required in the current execution context
- updated docker image to use PyTorch 2.6 nightly (20241121)
- removed conda builds/usage per PyTorch deprecation of conda env management (using venv instead)
- removed numpy<2.0 requirement
- made PyTorch 2.3 the minimum required PyTorch
- einsum patch no longer required for PyTorch >= 2.6
speediedan committed Nov 25, 2024
1 parent 6b1da2d commit d818241
Showing 28 changed files with 79 additions and 151 deletions.
4 changes: 2 additions & 2 deletions .azure-pipelines/gpu-tests.yml
@@ -11,7 +11,7 @@ trigger:
include:
- "main"
- "release/*"
- "model_parallel_exp_support" # temporarily add for new test infra enhancement validation
# - "model_parallel_exp_support" # temporarily add for specific feature branch validation
- "refs/tags/*"
paths:
include:
@@ -46,7 +46,7 @@ jobs:
strategy:
matrix:
'PyTorch | latest':
image: "speediedan/finetuning-scheduler:py3.12-pt2.5.1-pl2.5-azpl-init"
image: "speediedan/finetuning-scheduler:py3.12-pt2.6.0-pl2.6-azpl-init"
scope: ""
# how long to run the job before automatically cancelling
timeoutInMinutes: "100"
8 changes: 4 additions & 4 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -39,14 +39,14 @@ python collect_env_details.py
You can also fill out the list below manually.
-->

- Fine-Tuning Scheduler Version (e.g., 2.5.0):
- Lightning Version (e.g., 2.5.0):
- PyTorch Version (e.g., 2.5.1):
- Fine-Tuning Scheduler Version (e.g., 2.6.0):
- Lightning Version (e.g., 2.6.0):
- PyTorch Version (e.g., 2.6.0):
- Python version (e.g., 3.12):
- OS (e.g., Linux):
- CUDA/cuDNN version:
- GPU models and configuration:
- How you installed PyTorch (`conda`, `pip`, source):
- How you installed PyTorch (`pip`, source):
- If compiling from source, the output of `torch.__config__.show()`:
- Any other relevant information:

6 changes: 3 additions & 3 deletions .github/workflows/release-docker.yml
@@ -31,9 +31,9 @@ jobs:
matrix:
# initially building only the latest supported configuration
python_version: ["3.12"]
pytorch_version: ["2.5.1"]
cust_base: ["cu12.4.0-"]
pl_version: ["2.5"]
pytorch_version: ["2.6.0"]
cust_base: ["cu12.6.2-"]
pl_version: ["2.6"]
steps:
- name: Checkout
uses: actions/checkout@v4
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,17 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

## [2.6.0] - 2024-XX-XX

### Added

- Support for Lightning and PyTorch ``2.6.0``

### Deprecated

- removed support for PyTorch `2.2`
- removed use of conda builds (aligning with upstream PyTorch)

## [2.5.0] - 2024-XX-XX

### Added
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -6,7 +6,7 @@ date-released: 2022-02-04
authors:
- family-names: "Dale"
given-names: "Dan"
version: 2.5.0
version: 2.6.0
identifiers:
- description: "Fine-Tuning Scheduler (all versions)"
type: doi
2 changes: 1 addition & 1 deletion README.md
@@ -145,7 +145,7 @@ To ensure maximum stability, the latest Lightning patch release fully tested wit
<details>
<summary>Current build statuses for Fine-Tuning Scheduler </summary>

| System / (PyTorch/Python ver) | 2.2.2/3.9 | 2.5.1/3.9, 2.5.1/3.12 |
| System / (PyTorch/Python ver) | 2.3.1/3.9 | 2.6.0/3.9, 2.6.0/3.12 |
| :---------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| Linux \[GPUs\*\*\] | - | [![Build Status](https://dev.azure.com//speediedan/finetuning-scheduler/_apis/build/status/Multi-GPU%20&%20Example%20Tests?branchName=main)](https://dev.azure.com/speediedan/finetuning-scheduler/_build/latest?definitionId=1&branchName=main) |
| Linux (Ubuntu 22.04) | [![Test](https://github.com/speediedan/finetuning-scheduler/actions/workflows/ci_test-full.yml/badge.svg?branch=main&event=push)](https://github.com/speediedan/finetuning-scheduler/actions/workflows/ci_test-full.yml) | [![Test](https://github.com/speediedan/finetuning-scheduler/actions/workflows/ci_test-full.yml/badge.svg?branch=main&event=push)](https://github.com/speediedan/finetuning-scheduler/actions/workflows/ci_test-full.yml) |
10 changes: 5 additions & 5 deletions dockers/base-cuda/Dockerfile
@@ -11,13 +11,13 @@
# limitations under the License.
# initially based on https://bit.ly/3pdAf1G

ARG CUDA_VERSION=12.4.0
ARG CUDA_VERSION=12.6.2
ARG OS_VER=ubuntu22.04

FROM nvidia/cuda:${CUDA_VERSION}-devel-${OS_VER}

ARG PYTHON_VERSION=3.12
ARG PYTORCH_VERSION=2.5.1
ARG PYTORCH_VERSION=2.6.0
ARG CUST_BUILD=0
ARG MKL_THREADING_LAYER=GNU

@@ -85,13 +85,13 @@ RUN \
else \
# or target a specific cuda build, by specifying a particular index url w/...
# ... default channel
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124; \
#pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126; \
# ... pytorch patch version
# pip install torch==1.11.1+cu113 torchvision==0.11.3+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html; \
# ... pytorch nightly dev version
#pip install --pre torch==2.5.0.dev20240827 torchvision==0.20.0.dev20240827 --index-url https://download.pytorch.org/whl/nightly/cu124; \
pip install --pre torch==2.6.0.dev20241121 torchvision==0.20.0.dev20241121 --index-url https://download.pytorch.org/whl/nightly/cu126; \
# ... test channel
#pip install --pre torch==2.5.0 torchvision --index-url https://download.pytorch.org/whl/test/cu124; \
#pip install --pre torch==2.6.0 torchvision --index-url https://download.pytorch.org/whl/test/cu126; \
fi && \
# Install all requirements
pip install -r requirements/devel.txt --no-cache-dir && \
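The Dockerfile switches between the stable, nightly and test wheel channels purely by changing the index URL, whose last path segment encodes only the CUDA major and minor version (`cu124`, `cu126`). A minimal sketch of that mapping follows; `torch_index_url` and `pip_install_cmd` are hypothetical helpers that simply reproduce the command shapes shown in this hunk.

```python
from typing import List

WHL_BASE = "https://download.pytorch.org/whl"


def torch_index_url(cuda_version: str, channel: str = "stable") -> str:
    # "12.6.2" -> "cu126": only the CUDA major and minor versions appear in the URL
    major, minor = cuda_version.split(".")[:2]
    cu_tag = f"cu{major}{minor}"
    if channel == "stable":
        return f"{WHL_BASE}/{cu_tag}"
    if channel in ("nightly", "test"):
        return f"{WHL_BASE}/{channel}/{cu_tag}"
    raise ValueError(f"unknown channel: {channel!r}")


def pip_install_cmd(torch_spec: str, vision_spec: str, cuda_version: str,
                    channel: str = "stable") -> List[str]:
    cmd = ["pip", "install"]
    if channel != "stable":
        # the Dockerfile passes --pre for both the nightly and test channels
        cmd.append("--pre")
    cmd += [torch_spec, vision_spec, "--index-url", torch_index_url(cuda_version, channel)]
    return cmd


print(" ".join(pip_install_cmd("torch==2.6.0.dev20241121",
                               "torchvision==0.20.0.dev20241121", "12.6.2", "nightly")))
# prints: pip install --pre torch==2.6.0.dev20241121 torchvision==0.20.0.dev20241121 --index-url https://download.pytorch.org/whl/nightly/cu126
```

Returning an argument list rather than a string keeps the result usable with `subprocess.run` without shell quoting.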
6 changes: 6 additions & 0 deletions dockers/build_image_version.sh
@@ -19,3 +19,9 @@ build_version(){
--build-arg PYTORCH_VERSION=${iv_ref["pytorch"]} --no-cache . >> $docker_build_log
docker tag ${azpl_name} ${registry_name}:${azpl_name} >> $docker_build_log
}

maybe_deactivate(){
if [ -n "$VIRTUAL_ENV" ]; then
deactivate
fi
}
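`maybe_deactivate` replaces the old `conda deactivate` call by testing `VIRTUAL_ENV`, the variable a venv's `activate` script exports. When the same check is needed from Python, the standard library exposes two related signals; a minimal sketch with hypothetical helper names:

```python
import os
import sys
from typing import Optional


def running_in_venv() -> bool:
    # The stdlib `venv` module points sys.prefix at the environment directory,
    # while sys.base_prefix still names the base interpreter installation.
    return sys.prefix != sys.base_prefix


def activated_venv() -> Optional[str]:
    # `source <venv>/bin/activate` exports VIRTUAL_ENV; this is the variable the
    # shell helper tests with `[ -n "$VIRTUAL_ENV" ]`.
    return os.environ.get("VIRTUAL_ENV") or None


print(running_in_venv(), activated_venv())
```

The two can disagree: running a venv's interpreter directly, without sourcing `activate`, makes `running_in_venv()` true while `VIRTUAL_ENV` stays unset.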
7 changes: 3 additions & 4 deletions dockers/docker_images_main.sh
@@ -12,14 +12,12 @@ registry_name=$2
build_new="${3:-1}"
push_remote="${4:-1}"

eval "$(conda shell.bash hook)" # setup shell functions for conda, uses conda's .bashrc resident defined hook to execute conda init setup to enable subsequent conda command usage
conda deactivate
maybe_deactivate

d=`date +%Y%m%d%H%M%S`
tmp_docker_build_log_dir="/tmp"
docker_build_log="${tmp_docker_build_log_dir}/fts_update_docker_main_images_${d}.log"


maybe_push(){
if [[ $push_remote -ne 0 ]]; then
echo "Beginning upload of built images..." >> $docker_build_log
@@ -43,7 +41,8 @@ maybe_build(){

build_eval(){
# latest PyTorch image supported by release
declare -A iv=(["cuda"]="12.4.0" ["python"]="3.12" ["pytorch"]="2.5.1" ["lightning"]="2.5" ["cust_build"]="1")
# see CUDA_ARCHES_FULL_VERSION for the full version of the pytorch-provided toolkit
declare -A iv=(["cuda"]="12.6.2" ["python"]="3.12" ["pytorch"]="2.6.0" ["lightning"]="2.6" ["cust_build"]="1")
export latest_pt="base-cu${iv["cuda"]}-py${iv["python"]}-pt${iv["pytorch"]}-pl${iv["lightning"]}"
export latest_azpl="py${iv["python"]}-pt${iv["pytorch"]}-pl${iv["lightning"]}-azpl-init"
maybe_build iv "${latest_pt}" "${latest_azpl}"
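The tag strings built in `build_eval()` have to line up with the image the Azure pipeline pulls and with the `FROM` line the release Dockerfiles assemble from `CUST_BASE`. A small Python sketch reproducing both formats (the function names are hypothetical) makes that agreement easy to check:

```python
from typing import Dict, Tuple


def image_tags(iv: Dict[str, str]) -> Tuple[str, str]:
    # Same formats as `latest_pt` and `latest_azpl` in build_eval()
    base = f"base-cu{iv['cuda']}-py{iv['python']}-pt{iv['pytorch']}-pl{iv['lightning']}"
    azpl = f"py{iv['python']}-pt{iv['pytorch']}-pl{iv['lightning']}-azpl-init"
    return base, azpl


def dockerfile_base_tag(cust_base: str, python: str, pytorch: str, lightning: str) -> str:
    # Same format as the FROM line in dockers/release/Dockerfile:
    # base-${CUST_BASE}py${PYTHON_VERSION}-pt${PYTORCH_VERSION}-pl${LIGHTNING_VERSION}
    return f"base-{cust_base}py{python}-pt{pytorch}-pl{lightning}"


iv = {"cuda": "12.6.2", "python": "3.12", "pytorch": "2.6.0", "lightning": "2.6", "cust_build": "1"}
latest_pt, latest_azpl = image_tags(iv)
print(f"speediedan/finetuning-scheduler:{latest_azpl}")
# prints: speediedan/finetuning-scheduler:py3.12-pt2.6.0-pl2.6-azpl-init
```

With `CUST_BASE` set to `cu12.6.2-` (the value in the release workflow matrix), `dockerfile_base_tag` yields the same string as `latest_pt`, so the release images build on top of the base image this script produces.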
7 changes: 2 additions & 5 deletions dockers/docker_images_release.sh
@@ -12,10 +12,7 @@ registry_name=$2
build_new="${3:-1}"
push_remote="${4:-1}"

# setup shell functions for conda, uses conda's .bashrc resident defined hook to execute conda init setup to enable
# subsequent conda command usage
eval "$(conda shell.bash hook)"
conda deactivate
maybe_deactivate

d=`date +%Y%m%d%H%M%S`
tmp_docker_build_log_dir="/tmp"
@@ -44,7 +41,7 @@ maybe_build(){

build_eval(){
# latest PyTorch image supported by release
declare -A iv=(["cuda"]="12.4.0" ["python"]="3.12" ["pytorch"]="2.5.1" ["lightning"]="2.5" ["cust_build"]="0")
declare -A iv=(["cuda"]="12.6.2" ["python"]="3.12" ["pytorch"]="2.6.0" ["lightning"]="2.6" ["cust_build"]="0")
export latest_pt="base-cu${iv["cuda"]}-py${iv["python"]}-pt${iv["pytorch"]}-pl${iv["lightning"]}"
export latest_azpl="py${iv["python"]}-pt${iv["pytorch"]}-pl${iv["lightning"]}-azpl-init"
maybe_build iv "${latest_pt}" "${latest_azpl}"
4 changes: 2 additions & 2 deletions dockers/fts-az-base/Dockerfile
@@ -11,8 +11,8 @@
# limitations under the License.

ARG PYTHON_VERSION=3.12
ARG PYTORCH_VERSION=2.5.1
ARG LIGHTNING_VERSION=2.5
ARG PYTORCH_VERSION=2.6.0
ARG LIGHTNING_VERSION=2.6
ARG CUST_BASE

FROM speediedan/finetuning-scheduler:base-${CUST_BASE}py${PYTHON_VERSION}-pt${PYTORCH_VERSION}-pl${LIGHTNING_VERSION}
88 changes: 0 additions & 88 deletions dockers/release-conda/Dockerfile

This file was deleted.

5 changes: 0 additions & 5 deletions dockers/release-conda/conda_entrypoint.sh

This file was deleted.

4 changes: 2 additions & 2 deletions dockers/release/Dockerfile
@@ -11,8 +11,8 @@
# limitations under the License.

ARG PYTHON_VERSION=3.12
ARG PYTORCH_VERSION=2.5.1
ARG LIGHTNING_VERSION=2.5
ARG PYTORCH_VERSION=2.6.0
ARG LIGHTNING_VERSION=2.6
ARG CUST_BASE

FROM speediedan/finetuning-scheduler:base-${CUST_BASE}py${PYTHON_VERSION}-pt${PYTORCH_VERSION}-pl${LIGHTNING_VERSION}
4 changes: 2 additions & 2 deletions requirements/base.txt
@@ -1,4 +1,4 @@
#lightning>=2.5.0,<2.5.1
#lightning>=2.6.0,<2.6.1
# the below is uncommented when master is targeting a specific pl dev master commit
git+https://github.com/Lightning-AI/lightning.git@8ce52876ad6e5eb05e0965f72e034ffe46b327ba#egg=lightning
torch>=2.2.0
torch>=2.3.0
2 changes: 1 addition & 1 deletion requirements/examples.txt
@@ -6,4 +6,4 @@ sentencepiece
tensorboardX>=2.2
tabulate
psutil
numpy<2.0 # to avoid issues with oldest supported pytorch (2.2)
#numpy<2.0 # to avoid issues with oldest supported pytorch (2.3)
2 changes: 1 addition & 1 deletion requirements/pl_adjust_versions.py
@@ -5,7 +5,7 @@

# IMPORTANT: this list needs to be sorted in reverse
VERSIONS = [
dict(torch="2.6.0", torchvision="0.21.0"), # nightly
dict(torch="2.6.0", torchvision="0.20.1"), # nightly torchvision nightly not yet bumped as of 20241124
dict(torch="2.5.1", torchvision="0.20.1"), # stable
dict(torch="2.5.0", torchvision="0.20.0"),
dict(torch="2.4.0", torchvision="0.19.0"),
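The comment in `pl_adjust_versions.py` requires `VERSIONS` to stay sorted newest first. A minimal sketch of an ordering check and a lookup over that table follows, assuming entries are matched on the numeric release segment of the installed torch version (so dev and `+cuXXX` suffixes are ignored); `is_reverse_sorted` and `torchvision_for` are hypothetical helpers, not functions in that script, and the table is only the excerpt visible in this hunk.

```python
import re
from typing import Dict, List, Optional, Tuple

# Excerpt of the post-commit table; the real list continues past 2.4.0.
VERSIONS: List[Dict[str, str]] = [
    dict(torch="2.6.0", torchvision="0.20.1"),  # nightly
    dict(torch="2.5.1", torchvision="0.20.1"),  # stable
    dict(torch="2.5.0", torchvision="0.20.0"),
    dict(torch="2.4.0", torchvision="0.19.0"),
]


def _release(version: str) -> Tuple[int, ...]:
    # "2.6.0.dev20241121" -> (2, 6, 0); "2.5.1+cu124" -> (2, 5, 1)
    return tuple(int(p) for p in re.match(r"\d+(?:\.\d+)*", version).group(0).split("."))


def is_reverse_sorted(versions: List[Dict[str, str]]) -> bool:
    keys = [_release(entry["torch"]) for entry in versions]
    return keys == sorted(keys, reverse=True)


def torchvision_for(torch_version: str, versions: List[Dict[str, str]] = VERSIONS) -> Optional[str]:
    target = _release(torch_version)
    for entry in versions:  # newest first
        if _release(entry["torch"]) == target:
            return entry["torchvision"]
    return None


print(torchvision_for("2.6.0.dev20241121"))
```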
4 changes: 2 additions & 2 deletions requirements/standalone_base.txt
@@ -1,4 +1,4 @@
#pytorch-lightning>=2.5.0,<2.5.1
#pytorch-lightning>=2.6.0,<2.6.1
# the below is uncommented when master is targeting a specific pl dev master commit
git+https://github.com/Lightning-AI/pytorch-lightning.git@8ce52876ad6e5eb05e0965f72e034ffe46b327ba#egg=pytorch-lightning
torch>=2.2.0
torch>=2.3.0
2 changes: 1 addition & 1 deletion src/finetuning_scheduler/__about__.py
@@ -1,7 +1,7 @@
import time

_this_year = time.strftime("%Y")
__version__ = "2.5.0.dev0"
__version__ = "2.6.0.dev0"
__author__ = "Dan Dale"
__author_email__ = "danny.dale@gmail.com"
__license__ = "Apache-2.0"
10 changes: 2 additions & 8 deletions src/fts_examples/cli_experiment_utils.py
@@ -65,13 +65,9 @@ def instantiate_class(init: Dict[str, Any], args: Optional[Union[Any, Tuple[Any,
# override PyTorch default, extending it to capture additional salient packages for reproducability
# https://github.com/pytorch/pytorch/blob/7c2489bdae5a96dc122c3bb7b42c18528bcfdc86/torch/utils/collect_env.py#L271
def get_pip_packages(run_lambda):
"""Returns `pip list` output.
Note: will also find conda-installed pytorch
and numpy packages.
"""
"""Returns `pip list` output."""
# People generally have `pip` as `pip` or `pip3`
# But here it is incoved as `python -mpip`
# But here it is invoked as `python -mpip`
def run_with_pip(pip):
if collect_env.get_platform() == "win32":
system_root = os.environ.get("SYSTEMROOT", "C:\\Windows")
@@ -126,7 +122,6 @@ def get_env_info():
"miopen_runtime_version": miopen_runtime_version,
"pip_version": pip_version,
"pip_packages": pip_list_output,
"conda_packages": collect_env.get_conda_packages(run_lambda),
"os": collect_env.get_os(run_lambda),
"libc_version": collect_env.get_libc_version(),
"gcc_version": collect_env.get_gcc_version(run_lambda),
@@ -167,7 +162,6 @@ def collect_env_info() -> Dict:
"cudnn_version",
"pip_version", # 'pip' or 'pip3'
"pip_packages",
"conda_packages",
"hip_compiled_version",
"hip_runtime_version",
"miopen_runtime_version",
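With conda reporting dropped, the environment summary relies on `get_pip_packages`, which invokes pip through the running interpreter (`python -mpip`) and, like the upstream helper it overrides, keeps only the lines naming packages of interest. A minimal sketch of that pattern, assuming a `--format=freeze` listing and plain substring matching; the `KEYWORDS` tuple and both helper names are hypothetical and do not reproduce the package set FTS actually reports.

```python
import subprocess
import sys
from typing import Iterable

# Hypothetical keyword set, for illustration only.
KEYWORDS = ("torch", "numpy", "triton", "lightning", "finetuning-scheduler", "datasets")


def filter_pip_freeze(output: str, keywords: Iterable[str] = KEYWORDS) -> str:
    # Keep a line if any keyword occurs anywhere in it (substring match).
    kws = tuple(keywords)
    return "\n".join(line for line in output.splitlines() if any(k in line for k in kws))


def pip_freeze() -> str:
    # Run pip via the current interpreter so the listing reflects this environment
    # (venv or system install) rather than whichever `pip` is first on PATH.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "list", "--format=freeze"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

Substring matching is deliberately loose: `torch` also catches `torchvision` and `pytorch-triton`, which is usually what an environment report wants.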
5 changes: 3 additions & 2 deletions src/fts_examples/patching/dep_patch_shim.py
@@ -54,7 +54,7 @@ def _patch_triton():
sys.modules.get(target_mod).__dict__.get('JITFunction').__init__ = _new_init


# required for `torch==2.5.x`, TBD wrt subsequent versions
# remove once `torch==2.6.x` is minimum (only required for `torch==2.5.x`)
einsum_strategies_patch = DependencyPatch(
condition=(lwt_compare_version("torch", operator.le, "2.5.2"),
lwt_compare_version("torch", operator.ge, "2.5.0"),),
@@ -71,13 +71,14 @@ def _patch_triton():
patched_package='datasets',
description='Adjust `NumpyArrowExtractor` to properly use `numpy` 2.0 copy semantics')

# only required for `torch==2.4.x`
# TODO: remove once `torch==2.5.x` is minimum (only required for `torch==2.4.x`)
triton_codgen_patch = DependencyPatch(
condition=(lwt_compare_version("pytorch-triton", operator.eq, "3.0.0", "45fff310c8"),),
env_flag=OSEnvToggle("ENABLE_FTS_TRITON_CODEGEN_PATCH", default="1"),
function=_patch_triton, patched_package='pytorch-triton',
description='Address `triton` #3564 until PyTorch pins the upstream fix')


class ExpPatch(Enum):
EINSUM_STRATEGIES = einsum_strategies_patch
NUMPY_EXTRACTOR = datasets_numpy_extractor_patch
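The commit message describes logic that decides whether a conditional experimental patch is required in the current execution context. A minimal sketch of that decision, assuming a patch applies only when every version condition holds and its environment toggle is enabled, with `"1"` meaning enabled (the default shown for `OSEnvToggle` above). The helpers and the `EXAMPLE_ENABLE_EINSUM_PATCH` variable are hypothetical, and versions are compared on their numeric release segment only, a simplification of whatever `lwt_compare_version` does internally.

```python
import operator
import os
import re
from typing import Callable, Tuple

Version = Tuple[int, ...]


def release_tuple(version: str) -> Version:
    # Keep only the leading numeric release segment, padded to three parts:
    # "2.6.0.dev20241121" -> (2, 6, 0), "2.5.1+cu124" -> (2, 5, 1), "2.5" -> (2, 5, 0)
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def version_check(installed: str, op: Callable[[Version, Version], bool], target: str) -> bool:
    return op(release_tuple(installed), release_tuple(target))


def env_toggle(name: str, default: str = "1") -> bool:
    # Hypothetical semantics: the patch is enabled only when the variable equals "1".
    return os.environ.get(name, default) == "1"


def einsum_patch_required(installed_torch: str, env_var: str = "EXAMPLE_ENABLE_EINSUM_PATCH") -> bool:
    # Same bounds as the einsum condition above: 2.5.0 <= torch <= 2.5.2
    conditions = (
        version_check(installed_torch, operator.le, "2.5.2"),
        version_check(installed_torch, operator.ge, "2.5.0"),
    )
    return all(conditions) and env_toggle(env_var)


for v in ("2.4.1", "2.5.1+cu124", "2.6.0.dev20241121"):
    print(v, einsum_patch_required(v))
```

Because the upper bound stops at `2.5.2`, a `2.6.0` nightly falls outside the range, which matches the commit's note that the einsum patch is no longer required for PyTorch >= 2.6.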
2 changes: 1 addition & 1 deletion src/fts_examples/patching/patched_einsum_strategies.py
@@ -5,7 +5,7 @@
# ruff: noqa: F821
# pyright: reportUndefinedVariable=false

if lwt_compare_version("torch", operator.ge, "2.5.0"):
if lwt_compare_version("torch", operator.ge, "2.5.0") and lwt_compare_version("torch", operator.le, "2.5.2"):
globals().update(_prepare_module_ctx('torch.distributed.tensor._ops._einsum_strategy', globals()))

