Changelog

v20.4.1 (2022-12-06)

Bug Fixes and Other Changes

Update deprecated dependency package name from sklearn to scikit-learn

v20.4.0 (2022-07-08)

Features

Add heterogeneous cluster changes

v20.3.1 (2022-06-04)

Bug Fixes and Other Changes

Fix/ci

v20.3.0 (2020-12-17)

Features

use tensorflow 2.3.1 and add data parallel integ test

Bug Fixes and Other Changes

upgrade to sagemaker-training 3.7.1

Testing and Release Infrastructure

include granular buildspecs for dlc and generic cpu and gpu testing

v20.2.0 (2020-12-11)

Features

include sm-data-distributed and upgrade dependencies

v20.1.5 (2020-12-08)

Bug Fixes and Other Changes

workaround to print stderr when capture_error is True

v20.1.4 (2020-11-06)

Bug Fixes and Other Changes

propagate log level

v20.1.3 (2020-10-19)

Bug Fixes and Other Changes

add condition to avoid error when 'model_dir' is None

v20.1.2 (2020-08-27)

Bug Fixes and Other Changes

add a missing comma

v20.1.1 (2020-08-23)

Bug Fixes and Other Changes

call entry_point.run with capture_error=True

v20.1.0 (2020-08-06)

Features

tensorflow 2.3 support

v20.0.1.post2 (2020-07-02)

Testing and Release Infrastructure

add integration test for MPI env vars propagation

v20.0.1.post1 (2020-06-18)

Testing and Release Infrastructure

add single-instance, multi-process Horovod test for local GPU

v20.0.1.post0 (2020-06-15)

Documentation Changes

Update README.rst

Testing and Release Infrastructure

Make docker folder read only, remove unused tests.

v20.0.1 (2020-06-10)

Bug Fixes and Other Changes

fix. bump version of sagemaker-training for script entry point fix.

v20.0.0.post0 (2020-05-18)

Documentation Changes

update image-building instructions

v4.0.1 (2020-05-13)

Bug Fixes and Other Changes

Bump version of sagemaker-training for typing fix

v4.0.0 (2020-05-09)

Breaking Changes

Replace sagemaker-containers with sagemaker-training

Features

Python 3.7 support

Testing and Release Infrastructure

fix typo in release buildspec.

v3.2.3.post1 (2020-04-30)

Testing and Release Infrastructure

use tox in buildspecs

v3.2.3.post0 (2020-04-27)

Documentation Changes

remove extra newlines for consistency

v3.2.3 (2020-04-16)

Bug Fixes and Other Changes

version bump

v3.2.2 (2020-04-08)

Bug Fixes and Other Changes

Revert the version of SMdebug in TF2

v3.2.1 (2020-04-07)

Bug Fixes and Other Changes

version bump

v3.2.0 (2020-04-02)

Features

install sagemaker-tensorflow-toolkit from PyPI.

Bug Fixes and Other Changes

Upgrading the pyyaml version

v3.1.8 (2020-04-01)

Bug Fixes and Other Changes

Allowing arguments for deep_learning_container.py for tf2

v3.1.7.post1 (2020-03-31)

Testing and Release Infrastructure

refactor toolkit tests.

v3.1.7.post0 (2020-03-31)

Testing and Release Infrastructure

copy tests to test-toolkit folder.

v3.1.7 (2020-03-26)

Bug Fixes and Other Changes

Adding of deep_learning_container.py in Tf2

v3.1.6 (2020-03-16)

Bug Fixes and Other Changes

Added skip marker
smdebug 0.7.1
Revert "Revert "add py2 fixture for tf2.x test (#310)" (#319)"
Revert "add py2 fixture for tf2.x test (#310)"
Update sagemaker-python-sdk in test env
add missing fixtures

v3.1.5 (2020-03-12)

Bug Fixes and Other Changes

Add default timeout
install experiments with python 3
add py2 fixture for tf2.x test

v3.1.4 (2020-03-11)

Bug Fixes and Other Changes

Add smdebug to TF 2.x
remove python 3.6 specific f strings
Updating LD_LIBRARY_PATH for tf-2.0 Dockerfile.gpu

v3.1.3 (2020-03-10)

Bug Fixes and Other Changes

install SageMaker Python SDK into Python 3 images
Add sagemaker-experiments

v3.1.2 (2020-03-04)

Bug Fixes and Other Changes

update sagemaker-tensorflow version to 2.1

Testing and Release Infrastructure

add pipemode integ tests.

v3.1.1 (2020-02-17)

Bug Fixes and Other Changes

update: add r2.1 dockerfiles
add 2.0.1 dockerfiles

v3.1.0 (2020-02-14)

Features

Add release to PyPI. Change package name to sagemaker-tensorflow-training.

Bug Fixes and Other Changes

remove sagemaker_experiments from mnist script
Update package name in the dockerfile for tf-2.0.
Merge branch 'master' of https://github.com/aws/sagemaker-tensorflow-containers into tf-2
Merge branch 'master' of https://github.com/aws/sagemaker-tensorflow-containers into merge-master-into-tf-2
Revert "Merge 'master' branch into 'tf-2' branch. (#279)"
Merge 'master' branch into 'tf-2' branch.

Documentation Changes

update README.rst

Testing and Release Infrastructure

Add twine check during PR.
properly fail build if has-matching-changes fails

v0.1.0 (2019-05-22)

Bug fixes and other changes

v2.0.7 (2019-08-15)

Bug fixes and other changes

update no-p2 and no-p3 regions.

v2.0.6 (2019-08-01)

Bug fixes and other changes

fix horovod mnist script

v2.0.5 (2019-06-17)

Bug fixes and other changes

bump sagemaker-containers version to 2.4.10
add hyperparameter tuning test

v2.0.4 (2019-06-06)

Bug fixes and other changes

fix integ test errors when running with py2

v2.0.3 (2019-06-06)

Bug fixes and other changes

only run one test during deployment

v2.0.2 (2019-06-04)

Bug fixes and other changes

resolve pluggy version conflict

v2.0.1 (2019-06-03)

Bug fixes and other changes

remove non-ascii character in CHANGELOG
remove extra comma in buildspec-release.yml

v2.0.0 (2019-06-03)

Bug fixes and other changes

Parameterize processor and py_version for test runs
use unique name for integration job hyperparameter tuning job
fix flake8 errors and add flake8 run in buildspec.yml
skip gpu SageMaker test in regions with limited amount of p2/3 instances
skip setup on second remote run
add setup file back
add branch name to remote gpu test run command
remove setup file in release build gpu test
ignore coverage in release build tests
use tar file name as framework_support_installable in build_all.py
Add release build
Explicitly set lower-bound for botocore version
Pull request to test codebuild trigger on TensorFlow script mode
Update integ test for checking Python version
Upgrade to TensorFlow 1.13.1
Add mpi4py to pip installs
Add SageMaker integ test for hyperparameter tuning model_dir logic
Add Horovod benchmark
Fix model_dir adjustment for hyperparameter tuning jobs
change model_dir to training job name if it is for tuning.
Tune test_s3_plugin test
Skip the s3_plugin test before new binary released
Add model saving warning at end of training
Specify region when creating S3 resource in integ tests
Fix instance_type fixture setup for tests
Read framework version from Python SDK for integ test default
Fix SageMaker Session handling in Horovod test
Configure encoding to be utf-8
Use the test argement framework_version in all tests
Fix broken test test_distributed_mnist_no_ps
Add S3 plugin tests
Skip horovod local CPU test in GPU instances
Add Horovod tests
Skip horovod integration tests
TensorFlow 1.12 and Horovod support
Deprecate get_marker. Use get_closest_marker instead
Force parameter server to run on CPU
Add python-dev and build-essential to Dockerfiles
Update script_mode_train_any_tf_script_in_sage_maker.ipynb
Skip keras local mode test on gpu and use random port for serving in the test
Fix Keras test
Create parameter server in different thread
Add Keras support
Fix broken unit tests
Unset CUDA_VISIBLE_DEVICES for worker processes
Disable GPU for parameter process
Set parameter process waiting to False
Update sagemaker containers
GPU fix
Set S3 environment variables
Add CI configuration files
Add distributed training support
Edited the tf script mode notebook
Add benchmarking script
Add Script Mode example
Add integration tests to run training jobs with sagemaker
Add tox.ini and configure coverage and flake runs
Scriptmode single machine training implementation
Update region in s3 boto client in serve
Update readme with instructions for 1.9.0 and above
Fix deserialization of dicts for json predict requests
Add dockerfile and update test for tensorflow 1.10.0
Support tensorflow 1.9.0
Add integ tests to verify that tensorflow in gpu-image can access gpu-devices.
train on 3 epochs for pipe mode test
Change error classes used by _default_input_fn() and _default_output_fn()
Changing assertion to check only existence
Install sagemaker-tensorflow from pypi. Add MKL environment variables for TF 1.8
get most recent saved model to export
pip install tensorflow 1.8 in 1.8 cpu image
install tensorflow extensions
upgrade cpu binaries in docker build
Force upgrade of the framework binaries to make sure the right binaries are installed.
Add Pillow to pip install list
Increase train steps for cifar distributed test to mitigate race condition
Add TensorFlow 1.8 dockerfiles
Add TensorFlow 1.7 dockerfiles
Explain how to download tf binaries from PyPI
Allow training without S3
Fix hyperparameter name for detecting a tuning job
Checkout v1.4.1 tag instead of r1.4 branch
Move processing of requirements file in.
Generate checkpoint path using TRAINING_JOB_NAME environment variable if needed
Wrap user-provided model_fn to pass arguments positionally (maintains compatibility with existing behavior)
Add more unit tests for trainer, fix all and rename train.py to avoid import conflict
Use regional endpoint for S3 client
Update README.rst
Pass input_channels to eval_input_fn if defined
Fix setup.py to refer to renamed README
Add test and build instructions
Fix year in license headers
Add TensorFlow 1.6
Add test instructions in README
Add container support to install_requires
Add Apache license headers
Use wget to install tensorflow-model-server
Fix file path for integ test
Fix s3_prefix path in integ test
Fix typo in path for integ test
Add input_channels to train_input_fn interface.
Update logging and make serving_input_fn optional.
remove pip install in tensorflow training
Modify integration tests to run nvidia-docker for gpu
add h5py for keras models
Add local integ tests & resources
Restructure repo to use a directory per TF version for dockerfiles
Rename "feature_map" variables to "feature_dict" to avoid overloading it with the ML term "feature map"
Copying in changes from internal repo:
Add functional test
Fix FROM image names for final build dockerfiles
Add dockerfiles for building our production images (TF 1.4)
GPU Dockerfile and setup.py fixes
Add base image Dockerfiles for 1.4
Merge pull request #1 from aws/mvs-first-commit
first commit
Updating initial README.md from template
Creating initial file from template
Creating initial file from template
Creating initial file from template
Creating initial file from template
Creating initial file from template
Initial commit

v0.1.0 (2019-05-22)

Bug fixes and other changes

skip setup on second remote run
add setup file back
add branch name to remote gpu test run command
remove setup file in release build gpu test
ignore coverage in release build tests
use tar file name as framework_support_installable in build_all.py
Add release build
Explicitly set lower-bound for botocore version
Pull request to test codebuild trigger on TensorFlow script mode
Update integ test for checking Python version
Upgrade to TensorFlow 1.13.1
Add mpi4py to pip installs
Add SageMaker integ test for hyperparameter tuning model_dir logic
Add Horovod benchmark
Fix model_dir adjustment for hyperparameter tuning jobs
change model_dir to training job name if it is for tuning.
Tune test_s3_plugin test
Skip the s3_plugin test before new binary released
Add model saving warning at end of training
Specify region when creating S3 resource in integ tests
Fix instance_type fixture setup for tests
Read framework version from Python SDK for integ test default
Fix SageMaker Session handling in Horovod test
Configure encoding to be utf-8
Use the test argement framework_version in all tests
Fix broken test test_distributed_mnist_no_ps
Add S3 plugin tests
Skip horovod local CPU test in GPU instances
Add Horovod tests
Skip horovod integration tests
TensorFlow 1.12 and Horovod support
Deprecate get_marker. Use get_closest_marker instead
Force parameter server to run on CPU
Add python-dev and build-essential to Dockerfiles
Update script_mode_train_any_tf_script_in_sage_maker.ipynb
Skip keras local mode test on gpu and use random port for serving in the test
Fix Keras test
Create parameter server in different thread
Add Keras support
Fix broken unit tests
Unset CUDA_VISIBLE_DEVICES for worker processes
Disable GPU for parameter process
Set parameter process waiting to False
Update sagemaker containers
GPU fix
Set S3 environment variables
Add CI configuration files
Add distributed training support
Edited the tf script mode notebook
Add benchmarking script
Add Script Mode example
Add integration tests to run training jobs with sagemaker
Add tox.ini and configure coverage and flake runs
Scriptmode single machine training implementation
Update region in s3 boto client in serve
Update readme with instructions for 1.9.0 and above
Fix deserialization of dicts for json predict requests
Add dockerfile and update test for tensorflow 1.10.0
Support tensorflow 1.9.0
Add integ tests to verify that tensorflow in gpu-image can access gpu-devices.
train on 3 epochs for pipe mode test
Change error classes used by _default_input_fn() and _default_output_fn()
Changing assertion to check only existence
Install sagemaker-tensorflow from pypi. Add MKL environment variables for TF 1.8
get most recent saved model to export
pip install tensorflow 1.8 in 1.8 cpu image
install tensorflow extensions
upgrade cpu binaries in docker build
Force upgrade of the framework binaries to make sure the right binaries are installed.
Add Pillow to pip install list
Increase train steps for cifar distributed test to mitigate race condition
Add TensorFlow 1.8 dockerfiles
Add TensorFlow 1.7 dockerfiles
Explain how to download tf binaries from PyPI
Allow training without S3
Fix hyperparameter name for detecting a tuning job
Checkout v1.4.1 tag instead of r1.4 branch
Move processing of requirements file in.
Generate checkpoint path using TRAINING_JOB_NAME environment variable if needed
Wrap user-provided model_fn to pass arguments positionally (maintains compatibility with existing behavior)
Add more unit tests for trainer, fix all and rename train.py to avoid import conflict
Use regional endpoint for S3 client
Update README.rst
Pass input_channels to eval_input_fn if defined
Fix setup.py to refer to renamed README
Add test and build instructions
Fix year in license headers
Add TensorFlow 1.6
Add test instructions in README
Add container support to install_requires
Add Apache license headers
Use wget to install tensorflow-model-server
Fix file path for integ test
Fix s3_prefix path in integ test
Fix typo in path for integ test
Add input_channels to train_input_fn interface.
Update logging and make serving_input_fn optional.
remove pip install in tensorflow training
Modify integration tests to run nvidia-docker for gpu
add h5py for keras models
Add local integ tests & resources
Restructure repo to use a directory per TF version for dockerfiles
Rename "feature_map" variables to "feature_dict" to avoid overloading it with the ML term "feature map"
Copying in changes from internal repo:
Add functional test
Fix FROM image names for final build dockerfiles
Add dockerfiles for building our production images (TF 1.4)
GPU Dockerfile and setup.py fixes
Add base image Dockerfiles for 1.4
Merge pull request #1 from aws/mvs-first-commit
first commit
Updating initial README.md from template
Creating initial file from template
Creating initial file from template
Creating initial file from template
Creating initial file from template
Creating initial file from template
Initial commit

Files

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

Changelog

v20.4.1 (2022-12-06)

Bug Fixes and Other Changes

v20.4.0 (2022-07-08)

Features

v20.3.1 (2022-06-04)

Bug Fixes and Other Changes

v20.3.0 (2020-12-17)

Features

Bug Fixes and Other Changes

Testing and Release Infrastructure

v20.2.0 (2020-12-11)

Features

v20.1.5 (2020-12-08)

Bug Fixes and Other Changes

v20.1.4 (2020-11-06)

Bug Fixes and Other Changes

v20.1.3 (2020-10-19)

Bug Fixes and Other Changes

v20.1.2 (2020-08-27)

Bug Fixes and Other Changes

v20.1.1 (2020-08-23)

Bug Fixes and Other Changes

v20.1.0 (2020-08-06)

Features

v20.0.1.post2 (2020-07-02)

Testing and Release Infrastructure

v20.0.1.post1 (2020-06-18)

Testing and Release Infrastructure

v20.0.1.post0 (2020-06-15)

Documentation Changes

Testing and Release Infrastructure

v20.0.1 (2020-06-10)

Bug Fixes and Other Changes

v20.0.0.post0 (2020-05-18)

Documentation Changes

v4.0.1 (2020-05-13)

Bug Fixes and Other Changes

v4.0.0 (2020-05-09)

Breaking Changes

Features

Testing and Release Infrastructure

v3.2.3.post1 (2020-04-30)

Testing and Release Infrastructure

v3.2.3.post0 (2020-04-27)

Documentation Changes

v3.2.3 (2020-04-16)

Bug Fixes and Other Changes

v3.2.2 (2020-04-08)

Bug Fixes and Other Changes

v3.2.1 (2020-04-07)

Bug Fixes and Other Changes

v3.2.0 (2020-04-02)

Features

Bug Fixes and Other Changes

v3.1.8 (2020-04-01)

Bug Fixes and Other Changes

v3.1.7.post1 (2020-03-31)

Testing and Release Infrastructure

v3.1.7.post0 (2020-03-31)

Testing and Release Infrastructure

v3.1.7 (2020-03-26)

Bug Fixes and Other Changes

v3.1.6 (2020-03-16)

Bug Fixes and Other Changes

v3.1.5 (2020-03-12)

Bug Fixes and Other Changes

v3.1.4 (2020-03-11)

Bug Fixes and Other Changes

v3.1.3 (2020-03-10)

Bug Fixes and Other Changes

v3.1.2 (2020-03-04)

Bug Fixes and Other Changes

Testing and Release Infrastructure