Releases: mosaicml/composer
v0.8.2
🚀 Composer v0.8.2
Composer v0.8.2 is released! Install via pip:
pip install --upgrade mosaicml==0.8.2
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.8.2
🐛 Bug Fixes
- Fixed Notebook Progress Bars in Colab
Fixes a bug introduced by #1264 that caused Composer running in Colab notebooks to error out with: UnsupportedOperation: fileno.
Changelog
v0.8.1
🚀 Composer v0.8.1
Composer v0.8.1 is released! Install via pip:
pip install --upgrade mosaicml==0.8.1
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.8.1
🎁 New Features
- 🖼️ Image Visualizer
The ImageVisualizer callback periodically logs the training and validation images when using the WandB logger. This is great for validating your dataloader pipeline, especially if extensive data augmentations are used. Also, when training on a semantic segmentation task, the callback can log the target segmentation mask and the predicted segmentation mask by setting the argument mode='segmentation'. See PR #1266 for more details. Here is an example of using the ImageVisualizer callback:

from composer import Trainer
from composer.callbacks import ImageVisualizer

# Callback to log 8 training images after every 100 batches
image_visualizer = ImageVisualizer()

# Construct trainer
trainer = Trainer(
    ...,
    callbacks=image_visualizer,
)

# Train!
trainer.fit()
Here is an example visualization from the training set of ADE20k:
- 📶 TensorBoard Logging
You can now log metrics and losses from your Composer training runs with TensorBoard! See #1250 and #1283 for more details. All you have to do is create a TensorboardLogger object and add it to the list of loggers in your Trainer object, like so:

from composer import Trainer
from composer.loggers import TensorboardLogger

tb_logger = TensorboardLogger(log_dir="./my_tensorboard_logs")

trainer = Trainer(
    ...,
    # Add your Tensorboard Logger to the trainer here.
    loggers=[tb_logger],
)

trainer.fit()
For more information, see this tutorial.
- 🔙 Multiple Losses
Adds support for multiple losses. If a model returns a tuple of losses, they are summed before the loss.backward() call. See #1240 for more details; a sketch follows below.
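For illustration, here is a minimal sketch of a ComposerModel whose loss method returns two losses as a tuple; with this feature, Composer sums them before calling loss.backward(). The model, the auxiliary penalty, and its weight are hypothetical, not from the release notes.

import torch
import torch.nn.functional as F
from composer.models import ComposerModel

class TwoLossModel(ComposerModel):
    """Hypothetical model that returns a tuple of losses from loss()."""

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(16, 4)

    def forward(self, batch):
        inputs, _ = batch
        return self.net(inputs)

    def loss(self, outputs, batch):
        _, targets = batch
        ce = F.cross_entropy(outputs, targets)
        # Hypothetical auxiliary penalty on the output magnitude
        aux = 1e-4 * outputs.pow(2).mean()
        # Returning a tuple: Composer sums the entries before loss.backward()
        return ce, aux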
- 🌎️ Stream Datasets from HTTP URIs
You can now specify an HTTP URI for a Streaming Dataset remote. See #1258 for more details. For example:

from composer.datasets.streaming import StreamingDataset
from torch.utils.data import DataLoader

# Construct the Dataset
dataset = StreamingDataset(
    ...,
    remote="https://example.com/dataset/",
)

# Construct the DataLoader
train_dl = DataLoader(dataset)

# Construct the Trainer
trainer = Trainer(
    ...,
    train_dataloader=train_dl,
)

# Train!
trainer.fit()
For more information on streaming datasets, see this tutorial.
- 🏄️ GPU Devices default to TF32 Matmuls
Beginning with PyTorch 1.12, the default behavior for computing FP32 matrix multiplies on NVIDIA Ampere devices was switched from TF32 to FP32. See PyTorch documentation here.
Since Composer is designed specifically for ML training with a focus on efficiency, we choose to preserve the old default of using TF32 on Ampere devices. This leads to significantly higher throughput when training in single precision, without impacting training convergence. See PR #1275 for implementation details, and the sketch below for the underlying PyTorch switches.
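Composer configures this for you; for reference, the equivalent PyTorch flags look roughly like this if you ever need to flip the behavior yourself (the values shown are illustrative):

import torch

# Allow TF32 for matmuls and cuDNN convolutions on Ampere+ GPUs
# (this mirrors the pre-1.12 PyTorch default that Composer preserves)
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# To opt out and force full-FP32 matmuls instead:
# torch.backends.cuda.matmul.allow_tf32 = False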
- 👋 Set the Device ID for GPU Devices
Specify the device ID to train on within a DeviceGPU when instantiating a Trainer object, instead of using the local ID! For example:

from composer.trainer.devices.device_gpu import DeviceGPU

# Specify to use GPU 3 to train
device = DeviceGPU(device_id=3)

# Construct the Trainer
trainer = Trainer(
    ...,
    device=device,
)

# Train!
trainer.fit()
- BERT and C4 Updates
We make some minor adjustments to our bert-base-uncased.yaml training config. In particular, we make the global train and eval batch sizes a power of 2. This maintains divisibility when using many GPUs in multi-node training. We also adjust the max_duration so that it converts cleanly to 70,000 batches.
We also upgrade our StreamingDataset C4 conversion script (scripts/mds/c4.py) to use a multi-threaded reader. On a 64-core machine we are able to convert the 770GB train split to .mds format in ~1.5hr.
- 📂 Set a prefix when using an S3ObjectStore
When using S3ObjectStore for applications like checkpointing, it can be useful to provide path prefixes, mimicking folder/subfolder directories like on a local filesystem. When prefix is provided, any objects uploaded with S3ObjectStore will be stored at f's3://{self.bucket}/{self.prefix}{object_name}'. A minimal sketch is shown below.
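A minimal sketch, assuming the bucket and prefix names here are illustrative:

from composer.utils.object_store import S3ObjectStore

# Objects will be stored under s3://my-bucket/my-run/checkpoints/...
object_store = S3ObjectStore(
    bucket="my-bucket",
    prefix="my-run/checkpoints/",
)

# e.g., an object named "ep1.pt" would be stored at
# s3://my-bucket/my-run/checkpoints/ep1.pt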
- ⚖️ Scale the Warmup Period of Composer Schedulers
Added a new scale_warmup flag to schedulers that scales the warmup period when a scale schedule ratio is applied. The default is False to mirror the previous behavior. See #1268 for more details and the sketch below.
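A minimal sketch, assuming the cosine-with-warmup scheduler and the Trainer's scale_schedule_ratio argument shown here; if the schedule ratio halves the training length, scale_warmup=True also halves the warmup period:

from composer import Trainer
from composer.optim import CosineAnnealingWithWarmupScheduler

scheduler = CosineAnnealingWithWarmupScheduler(
    t_warmup="1000ba",
    scale_warmup=True,  # warmup shrinks/grows with the scale schedule ratio
)

trainer = Trainer(
    ...,
    schedulers=scheduler,
    scale_schedule_ratio=0.5,  # train for half the original duration
)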
- 🧊 Stochastic Depth on Residual Blocks
Residual blocks are detected automatically and replaced with stochastic versions. See #1253 for more details.
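A minimal sketch of enabling the algorithm on a ResNet-style model; the constructor arguments here (target_layer_name, drop_rate) are assumptions, so check the method card for the exact signature:

from composer import Trainer
from composer.algorithms import StochasticDepth

# Assumed arguments: replace ResNet bottleneck blocks with stochastic versions
stochastic_depth = StochasticDepth(
    target_layer_name="ResNetBottleneck",
    drop_rate=0.2,
)

trainer = Trainer(
    ...,
    algorithms=[stochastic_depth],
)

trainer.fit()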
🐛 Bug Fixes
- Fixed Progress Bars
Fixed a bug where the progress bars jumped around and did not stream properly when tailing the terminal over the network. Fixed in #1264, #1287, and #1289.
- Fixed S3ObjectStore in Multithreaded Environments
Fixed a bug where boto3 crashed when creating the default session in multiple threads simultaneously (see boto/boto3#1592). Fixed in #1260.
- Retry on ChannelException errors in the SFTPObjectStore
Catch transient SFTP ChannelException errors and retry. Fixed in #1245.
- Treating S3 Permission Denied Errors as Not Found Errors
We updated our handling of botocore 403 ClientErrors to interpret them as FileNotFoundErrors. We do this because when a user has no S3 credentials configured and tries to read from a bucket with public files, Amazon S3 raises 403 (Permission Denied) instead of 404 (Not Found) errors for privacy. As such, PR #1249 treats 403 ClientErrors as FileNotFoundErrors.
- Fixed Parsing of grad_accum in the TrainerHparams
Fixes an error where the command line override --grad_accum led to incorrect parsing. Fixed in #1256.
- Fixed Example YAML Files
Our recipe configurations (YAML) are updated to the latest version, and a test was added to enforce correctness moving forward. Fixed in #1235 and #1257.
Changelog
v0.8.0
🚀 Composer v0.8.0
Composer v0.8.0 is released! Install via pip:
pip install --upgrade mosaicml==0.8.0
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.8.0
New Features
- 🤗 HuggingFace ComposerModel
Train your HuggingFace models with Composer! We introduced a HuggingFaceModel that converts your existing 🤗 Transformers models into a ComposerModel. For example:

import transformers
from composer.models import HuggingFaceModel

# Define the model
hf_model = transformers.AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Convert it into a ComposerModel
model = HuggingFaceModel(hf_model)

# Construct the trainer
trainer = Trainer(
    ...,
    model,
)

# Train!
trainer.fit()
For more information, see the example on fine-tuning a pretrained BERT with Composer.
- 🫕 Fused Layer Norm
Fused LayerNorm replaces implementations of torch.nn.LayerNorm with apex.normalization.fused_layer_norm. The fused kernel provides increased GPU utilization. For example:

from composer.trainer import Trainer
from composer.algorithms import FusedLayerNorm

# Initialize the algorithm
alg = FusedLayerNorm()

# Construct the trainer
trainer = Trainer(
    algorithms=alg,
)

# Train!
trainer.fit()
See the method card for more information.
- 💾 Ignore Checkpoint Parameters
If you have a checkpoint and don't want to restore some elements of the checkpoint to the state, we added a load_ignore_keys parameter. Any specified (nested) keys will be ignored. Glob syntax is supported! For example, to restore a checkpoint without the seed:

from composer import Trainer

trainer = Trainer(
    ...,
    load_path="path/to/my/checkpoint.pt",
    load_ignore_keys=["state/rank_zero_seed", "rng"],
)
See the Trainer API Reference for more information.
- 🪣 Object Stores
Composer v0.8.0 introduces an abstract Object Store API to support multiple object store drivers, such as boto3 (for Amazon S3) and Paramiko (for SFTP), in addition to the existing libcloud implementation.
For example, if you are training on AWS where credentials are available in the environment, here's how to save checkpoints to an S3 object store via boto3:

from composer import Trainer
from composer.loggers import ObjectStoreLogger
from composer.utils.object_store import S3ObjectStore

logger = ObjectStoreLogger(
    object_store_cls=S3ObjectStore,
    object_store_kwargs={
        # These arguments will be passed into the S3ObjectStore -- e.g.:
        # object_store = S3ObjectStore(**object_store_kwargs)
        # Refer to the S3ObjectStore class for documentation
        'bucket': 'my-bucket',
    },
)

trainer = Trainer(
    ...,
    loggers=logger,
)

# Train!
trainer.fit()
See the Object Store API Reference for more information.
- 🪨 Artifact Metadata
Composer automatically logs the epoch, batch, sample, and token counts as metadata when storing artifacts in Weights & Biases. See the API Reference for more information.
API Changes
- ✂️ Gradient Clipping is now an Algorithm
To clean up the Trainer, we moved gradient clipping into an Algorithm. The grad_clip_norm argument in the Trainer is deprecated and will be removed in a future version of Composer. Instead, use the Gradient Clipping algorithm. For example:

from composer.algorithms import GradientClipping
from composer.trainer import Trainer

# Configure gradient clipping
gradient_clipping = GradientClipping()

# Configure the trainer
trainer = Trainer(
    ...,
    algorithms=gradient_clipping,
)

# Train!
trainer.fit()
See the method card for more information.
- 🕒️ Removed batch_num_samples and batch_num_tokens from the state
The State properties batch_num_samples and batch_num_tokens have been removed. Instead, use State.timestamp for token and sample tracking, as sketched below.
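As a minimal sketch (the callback here is hypothetical), the same counts are now read off the Timestamp object on the state:

from composer import Callback, Trainer

class SampleTokenCounter(Callback):
    """Hypothetical callback reading cumulative counts from the timestamp."""

    def batch_end(self, state, logger):
        # Replaces the removed state.batch_num_samples / state.batch_num_tokens
        print(f"Samples trained on so far: {state.timestamp.sample}")
        print(f"Tokens trained on so far: {state.timestamp.token}")

trainer = Trainer(
    ...,
    callbacks=SampleTokenCounter(),
)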
- 🧑🤝🧑 DDP Sync Strategy
We changed the default DDP sync strategy to MULTI_AUTO_SYNC, as FORCED_SYNC doesn't work with all algorithms.
- 🏃 Moved the run_name into the State
The run_name has been added to the State object, so it is persisted with checkpoints. It has been removed from the Logger.
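For example, a callback can now read the run name directly from the state (a minimal sketch; the callback itself is hypothetical):

from composer import Callback

class PrintRunName(Callback):
    """Hypothetical callback that reads the run name from the State."""

    def fit_start(self, state, logger):
        # run_name now lives on the State, so it survives checkpointing
        print(f"Starting run: {state.run_name}")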
Bug Fixes
- In the Object Store Logger, added retries for credential validation, and credentials are now validated only on global rank zero. (#1144)
- Fixed a bug in the speed monitor where it returned negative wall clock times. (#1123)
- Fixed a bug where block-wise Stochastic Depth could freeze the trainer. (#1087)
- Fixed a bug in the MLPerfCallback where sample counts were incorrect on per-sharded datasets. (#1156)
Changelog
v0.7.1
🚀 Composer v0.7.1
Composer v0.7.1 is released! Install via pip:
pip install --upgrade mosaicml==0.7.1
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.7.1
Bug Fixes
- Upgraded wandb>=0.12.17 to fix an incompatibility with protobuf >= 4 (wandb/wandb#3709)
Changelog
v0.7.0
🚀 Composer v0.7.0
Composer v0.7.0 is released! Install via pip:
pip install --upgrade mosaicml==0.7.0
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.7.0
New Features
- 🏎️ FFCV Integration
Composer supports FFCV, a fast dataloader for image datasets. We've found FFCV can speed up ResNet-56 training by 16%, in addition to existing speed-ups already supported by Composer! It's easy to use FFCV with any existing image dataset:

import ffcv
from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
from torchvision.datasets import ImageFolder

from composer import Trainer
from composer.datasets.ffcv_utils import write_ffcv_dataset, ffcv_monkey_patches

# Convert the dataset to FFCV format
# This step needs to be done only once per dataset
dataset = ImageFolder(...)
ffcv_dataset_path = "my_ffcv_dataset.ffcv"
write_ffcv_dataset(dataset=dataset, write_path=ffcv_dataset_path)

# In FFCV v0.0.3, len(dataloader) is expensive. Fix that via a monkeypatch
ffcv_monkey_patches()

# Construct the train dataloader
train_dl = ffcv.Loader(
    ffcv_dataset_path,
    ...
)

# Construct the trainer
trainer = Trainer(
    train_dataloader=train_dl,
)

# Train using FFCV!
trainer.fit()
See our notebook on training with FFCV for a full example.
- ✅ Autoresume from Checkpoints
When setting autoresume=True, Composer can automatically resume from an existing checkpoint before starting a new training run. Specifically, the trainer will look in the save_folder (and any loggers that save artifacts) for the latest checkpoint; if none is found, it'll start from the beginning.
This feature does not require a different entrypoint to distinguish between starting a new training run or automatically resuming from an existing one, making it easy to use Composer on spot preemptible cloud instances. Simply set autoresume=True, point the instance to your training script, and Composer will handle the rest!

from composer import Trainer

# When using `autoresume`, it is required to specify the
# `run_name`, so Composer will know which training run to
# resume
run_name = "my_autoresume_training_run"

trainer = Trainer(
    ...,
    run_name=run_name,
    # specify where to save checkpoints
    save_folder="./my_autoresume_training_run",
    autoresume=True,
)

# Train! Composer will handle loading an existing
# checkpoint or starting a new training run
trainer.fit()
See the Trainer API Reference for more information.
- ♻️ Reuse the Trainer
Want to train on multiple dataloaders sequentially? Each trainer object now supports multiple calls to Trainer.fit(), so you can continue training an existing model on a new dataloader, with new schedulers, all while using the same model and trainer object. For example:

from torch.utils.data import DataLoader
from composer import Trainer

train_dl_1 = DataLoader(...)

trainer = Trainer(
    model=model,
    max_duration='5ep',
    train_dataloader=train_dl_1,
)

# Train once!
trainer.fit()

# Train again with a new dataloader for another 5 epochs
train_dl_2 = DataLoader(...)
trainer.fit(
    train_dataloader=train_dl_2,
    duration='5ep',
)
See the Trainer API Reference for more information.
- ⚖️ Eval or Predict Only? No Problem
You can evaluate or predict on an existing model, without having to supply a train dataloader or training duration argument -- they're now optional.

import torchmetrics
from torch.utils.data import DataLoader
from composer import Trainer

# Construct the trainer
trainer = Trainer(model=model)

# Evaluate!
eval_dl = DataLoader(...)
trainer.eval(
    dataloader=eval_dl,
    metrics=torchmetrics.Accuracy(),
)

# Examine evaluation metrics
print("Eval metrics", trainer.state.metrics['eval'])

# Or, predict!
predict_dl = DataLoader(...)
trainer.predict(dataloader=predict_dl)
See the Trainer API Reference for more information.
- 🛑 Early Stopper and Threshold Stopper Callbacks
The Early Stopper and Threshold Stopper callbacks end training early when the target metrics are met:

from composer.callbacks.early_stopper import EarlyStopper
from torchmetrics.classification.accuracy import Accuracy

# Construct the callback
early_stopper = EarlyStopper(
    monitor="Accuracy",
    dataloader_label="eval",
    patience=2,
)

# Construct the trainer
trainer = Trainer(
    ...,
    callbacks=early_stopper,
    max_duration="100ep",
)

# Train!
# Training will end early if the accuracy does not improve
# over two epochs
trainer.fit()
- 🪵 Load Checkpoints from Loggers
It's now possible to restore checkpoints from loggers that support file artifacts (such as the Weights & Biases Logger). No need to download your checkpoints manually anymore.

from composer import Trainer
from composer.loggers import WandBLogger

# Configure the W&B Logger
wandb_logger = WandBLogger(
    # set to True to capture artifacts, like checkpoints
    log_artifacts=True,
    init_params={
        'project': 'my-wandb-project-name',
    },
)

# Then, to train and save checkpoints to W&B:
trainer = Trainer(
    ...,
    loggers=wandb_logger,
    save_folder="/tmp/checkpoints",
    save_interval="1ep",
    save_artifact_name="epoch{epoch}.pt",
)

# Finally, to load checkpoints from W&B:
trainer = Trainer(
    ...,
    load_object_store=wandb_logger,
    load_path="epoch1.pt:latest",
)
- ⌛ Wall Clock, Evaluation, and Prediction Time Tracking
The timestamp object measures wall clock time via three new fields: total_wct, epoch_wct, and batch_wct. These fields track the total elapsed training time, the elapsed training time of the current epoch, and the time to train the last batch. Read the wall clock time via a callback:

from composer import Callback, Trainer

class MyCallback(Callback):
    def batch_end(self, state, event):
        print(f"Total wct: {state.timestamp.total_wct}")
        print(f"Epoch wct: {state.timestamp.epoch_wct}")
        print(f"Batch wct: {state.timestamp.batch_wct}")

# Construct the trainer with this callback
trainer = Trainer(
    ...,
    callbacks=MyCallback(),
)

# Train!
trainer.fit()
In addition, the training state object has two new fields for tracking time during evaluation and prediction: eval_timestamp and predict_timestamp. These fields, just like any others on the state object, are accessible to algorithms, callbacks, and loggers.
- Training DeepLabv3+ on the ADE20k Dataset
DeepLabv3+ is a common baseline model for semantic segmentation tasks. We provide a ComposerModel implementation for DeepLabv3+ built using torchvision and mmsegmentation for the backbone and head, respectively.
We found the DeepLabv3+ baseline can be significantly improved using the new PyTorch pre-trained weights. Additional gains are made through a hyperparameter sweep.
We benchmark our DeepLabv3+ model on a single 8xA100 machine using ADE20k, a popular semantic segmentation dataset. The final results on ADE20k are:
Model | mIoU | Time-to-Train
Unoptimized DeepLabv3+ | 44.17 +/-...
v0.6.1
🚀 Composer v0.6.1
Composer v0.6.1 is released!
Go ahead and upgrade; it's fully backwards compatible with Composer v0.6.0.
Install via pip:
pip install --upgrade mosaicml==0.6.1
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.6.1
What's New?
- 📎 Adaptive Gradient Clipping (AGC)
Adaptive Gradient Clipping (AGC) clips gradients based on the ratio of their norms with weights' norms. This technique helps stabilize training with large batch sizes, especially for models without batchnorm layers.
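The core rule is easy to sketch in plain PyTorch (an illustrative, per-tensor simplification, not Composer's implementation; use the algorithm from composer.algorithms for the real thing):

import torch

def adaptive_grad_clip_(parameters, clipping_threshold=0.01, eps=1e-3):
    """Scale down each gradient whose norm is too large relative to its weight norm."""
    for p in parameters:
        if p.grad is None:
            continue
        w_norm = p.detach().norm().clamp_min(eps)
        g_norm = p.grad.detach().norm()
        max_norm = clipping_threshold * w_norm
        if g_norm > max_norm:
            # Rescale the gradient so its norm equals clipping_threshold * ||w||
            p.grad.detach().mul_(max_norm / g_norm)

# Usage: call between loss.backward() and optimizer.step()
# adaptive_grad_clip_(model.parameters())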
- 🚚 Exponential Moving Average (EMA)
Exponential Moving Average (EMA) is a model averaging technique that maintains an exponentially weighted moving average of the model parameters during training. The averaged parameters are used for model evaluation. EMA typically results in less noisy validation metrics over the course of training, and sometimes increased generalization.
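The underlying update is a simple exponential average of the parameters (an illustrative sketch in plain PyTorch, not Composer's EMA algorithm):

import copy
import torch

@torch.no_grad()
def update_ema(model, ema_model, smoothing=0.99):
    """ema_param <- smoothing * ema_param + (1 - smoothing) * param"""
    for p, ema_p in zip(model.parameters(), ema_model.parameters()):
        ema_p.mul_(smoothing).add_(p, alpha=1 - smoothing)

# Usage sketch:
# ema_model = copy.deepcopy(model)          # evaluate with ema_model
# after each optimizer step: update_ema(model, ema_model)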
- 🪵 Logger is available in the ComposerModel
The Logger is bound to the ComposerModel via the self.logger attribute. It is available during training in all methods (other than __init__). For example, to log hidden activations:

class Net(ComposerModel):

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        if self.logger:
            self.logger.data_batch({
                "hidden_activation_norm": x.norm(2).item(),
            })
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x)
- 🐛 Environment Collection Script
Composer v0.6.1 includes an environment collection script which generates a printout of your system configuration and Python environment. If you run into a bug, the results from this script will help us debug the issue and fix Composer.
To collect your environment information:

$ pip install mosaicml  # if composer is not already installed
$ composer_collect_env
Then, include the output in your GitHub Issue.
What's Improved?
- 📜 TorchScriptable Algorithms
BlurPool, Ghost BatchNorm, and Stochastic Depth are now TorchScript-compatible. Try exporting your models with these algorithms enabled!
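For example, scripting a model trained with one of these algorithms enabled is now expected to work (a minimal sketch; pulling the module off the trainer state is an assumption about your setup):

import torch

# After trainer.fit() with BlurPool / Ghost BatchNorm / Stochastic Depth enabled:
scripted = torch.jit.script(trainer.state.model)
scripted.save("scripted_model.pt")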
- 🏛️ ColOut on Segmentation
ColOut now supports segmentation-style models.
What's Fixed?
- 🚑️ Loggers capture the Traceback
We fixed a bug so the loggers, such as the Weights & Biases Logger and the File Logger, will capture the traceback of any exception that crashes the training process.
- 🏋️ Weights & Biases Logger Config
We fixed a bug where the Weights & Biases Logger was not properly recording the configuration.
Full Changelog
v0.6.0
🚀 Composer v0.6.0
Composer v0.6.0 is released! Install via pip:
pip install --upgrade mosaicml==0.6.0
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.6.0
Major Changes
- 🗃️ Automatic Gradient Accumulation
Composer v0.6.0 can automatically pick an appropriate value for gradient accumulation. The trainer will automatically catch OutOfMemory exceptions and handle them gracefully. No need to manually tune this parameter for each model, batch size, and hardware combination!
To use automatic gradient accumulation, set grad_accum='auto'. For example:

trainer = Trainer(
    ...,
    grad_accum='auto',
)
- 💾 Artifact Logging
Training on spot instances? Composer v0.6.0 introduces artifact logging, making it possible to store checkpoints and other artifacts directly to cloud storage. See the Object Store Logger and the Checkpointing Guide for more information.
Artifact Logging has replaced the run directory and the run directory uploader, which have been removed.
- 📊 Metric Values on the State
Composer v0.6.0 binds the computed metric values on the State. Go ahead and read these values from your own callbacks! We'll be releasing an early stopping callback in an upcoming Composer release.
- ⚠️ NoEffectWarning and NotIntendedUseWarning for Algorithms
Some algorithms, such as BlurPool, now emit a NoEffectWarning or a NotIntendedUseWarning when they're not being used appropriately.
Minor Improvements
- 🏃♀️ Training Run Names
We introduced a run_name parameter in the Trainer to help organize training runs.

trainer = Trainer(
    ...,
    run_name='awesome-training-run',
)
We'll automatically pick one if the run name is not specified.
- 💈 Automatic Progress Bars
The ProgressBarLogger, formerly called the TQDMLogger, is automatically enabled for all training runs.
To disable the progress bar, set progress_bar=False. For example:

trainer = Trainer(
    ...,
    progress_bar=False,
)
- 🪵 Logged Data in the Console
To print Logger calls to the console, set the log_to_console and console_log_level arguments.

trainer = Trainer(
    ...,
    log_to_console=True,
    console_log_level="epoch",
)

By default, the console logger will only be enabled when progress_bar=False. The default console log level is epoch.
- 📃 Capturing stdout and stderr in Log Files
The FileLogger captures stdout and stderr by default now. Tracebacks will now be captured amongst other logging statements.
- ⬆️ PyTorch 1.11 Support
We've tested Composer on PyTorch 1.11. Go ahead and upgrade your dependencies!
- ✅ Checkpointing
We changed the checkpoint format to store the underlying model, not the DistributedDataParallel wrapped model. If you're using Composer to read checkpoints, there's nothing to change. But if you're reading Composer checkpoints manually, note that the module checkpoints will be formatted differently.
In addition, we changed the checkpointing argument names for the trainer.
- The new parameters save_artifact_name and save_latest_artifact_name allow checkpoints to be saved directly to artifact stores.
- The new parameter save_num_checkpoints_to_keep helps preserve local disk storage by automatically removing old checkpoints.
- load_path replaces load_path_format.
- save_name replaces save_path_format.
- save_latest_filename replaces save_latest_format.
- 🏎️ Profiling
We added support for custom scheduling functions and re-designed how the profiler saves traces. Each profiling cycle will now have its own trace file. Trace merging happens automatically throughout the training process. Long-running profiling is now possible without the long wait at the end of training for the trace merge.
As part of this refactor, the profiler arguments have changed:
- prof_trace_handlers replaces prof_event_handlers.
- prof_schedule replaces prof_skip_first, prof_wait, prof_warmup, prof_active, and prof_repeat. See the cyclic schedule function.
- torch_prof_folder replaces torch_profiler_trace_dir.
- The new arguments torch_prof_filename, torch_prof_artifact_name, torch_prof_overwrite, and torch_prof_num_traces_to_keep allow for customization of how PyTorch Profiler traces are saved.
- 🏗️ TorchVision Model Architectures
We switched our vision models to use the TorchVision model architecture implementations where possible.
Bug Fixes
- Fixed a bug with MixUp and gradient accumulation
- Fixed numerous issues with the Composer launch script for distributed training. Composer v0.6.0 includes environment variable support, better defaults and warnings, and proper handling of crashed processes.
Changelog
- Update Migrating_from_PTL.ipynb by @moinnadeem in #730
- CodeQL Analysis by @Averylamp in #723
- Installing pyright via npm by @ravi-mosaicml in #735
- Polish intro docs by @dblalock in #721
- Numerics docs page by @bandish-shah in #725
- Testing Niklas GH Docs Star w/ Dark Mode by @moinnadeem in #742
- [Artifact Logging PR1] Logger Refactoring by @ravi-mosaicml in #698
- Update README.md by @moinnadeem in #731
- Updated the Method Cards by @hanlint in #647
- Using existing clone in conda meta.yaml by @ravi-mosaicml in #751
- [Artifact Logging PR2] Logger Destination Cleanup by @ravi-mosaicml in #699
- Shorten to minimal code snippets by @hanlint in #752
- Sample-wise Stochastic Depth Method Card by @Landanjs in #749
- Update algorithm yamls by @coryMosaicML in #747
- [Artifact Logging PR3] Add the run_name as a property of the Logger by @ravi-mosaicml in #700
- [Artifact Logging PR4] Added log_file_artifact base method by @ravi-mosaicml in #701
- Fix README.md by @ravi-mosaicml in #753
- Less CodeQL by @Averylamp in #762
- Increase the timeout for test trainer equivalence by @ravi-mosaicml in #766
- Port squeze excite method card to new format by @dblalock in #764
- Small fixes by @hanlint in #765
- Adding defaults to blurpool by @moinnadeem in #756
- Added maximum versions to dependencies by @ravi-mosaicml in #768
- Update sequence length warmup documentation by @moinnadeem in #770
- Additional README fixes by @hanlint in #769
- Fix setup.py by @Averylamp in #761
- Increased the timeout for test_trainer.py by @ravi-mosaicml in #775
- Remove plural types and aliases for native pytorch types by @Landanjs in #677
- [Artifact Logging PR5] Added the object store logger by @ravi-mosaicml in #706
- [Artifact Logging PR6] Rename the TQDMLogger as the ProgressBarLogger; remove terminal logging from the file logger by @ravi-mosaicml in #708
- [Artifact Logging PR7] Add stdout and stderr capture to the FileLogger by @ravi-mosaicml in #710
- Update README.md by @vahidfazelrezai in #781
- URGENT: Fixing an incorrect number by @jfrankle in https:/...
Release version v0.5.0
We are excited to share Composer v0.5, a library of speed-up methods for efficient neural network training. This release features:
- Revamped checkpointing API based on community feedback
- New baselines: ResNet34-SSD, GPT-3, and Vision Transformers
- Additional improvements to our documentation
- Support for bfloat16
- Streaming dataset support
- Unified functional API for our algorithms
Highlights
Checkpointing API
Checkpoint saving is now implemented as a Callback, so that users can easily write and add their own callbacks. The callback is automatically appended if a save_folder is provided to the Trainer.
trainer = Trainer(
    model=model,
    algorithms=algorithms,
    save_folder="checkpoints",
    save_interval="1ep",
)
Alternatively, CheckpointSaver can be directly added as a callback:
trainer = Trainer(..., callbacks=[
    CheckpointSaver(
        save_folder='checkpoints',
        name_format="ep{epoch}-ba{batch}/rank_{rank}",
        save_latest_format="latest/rank_{rank}",
        save_interval="1ep",
        weights_only=False,
    )
])
Subclass from CheckpointSaver to add your own logic for saving the best model, or saving at specific intervals. Thanks to @mansheej, @siriuslee, and other users for their feedback.
bfloat16
We've added experimental support for bfloat16, which can be provided via the precision argument to the Trainer:
trainer = Trainer(
    ...,
    precision="bfloat16",
)
Streaming datasets
We've added support for fast streaming datasets. For NLP-based datasets such as C4, we use the HuggingFace datasets backend, and add dataset-specific shuffling, tokenization, and grouping on-the-fly. To support data parallel training, we added specific sharding logic for efficiency. See C4Datasets for more details.
Vision streaming datasets are supported via a patched version of the webdataset package, with added support for data sharding by workers for fast augmentations. See composer.datasets.webdataset for more details.
Baseline GPT-3, ResNet34-SSD, and Vision Transformer benchmarks
Configurations for GPT-3-like models ranging from 125m to 760m parameters are now released, and use DeepSpeed Zero Stage 0 for memory-efficient training.
We've also added the Single Shot Detection (SSD) model (Liu et al., 2016) with a ResNet34 backbone, based on the MLPerf reference implementation.
Our first Vision Transformer benchmark is the ViT-S/16 model from Touvron et al., 2021, based on the vit-pytorch package.
See below for the full details:
What's Changed
- Export Transforms in composer.algorithms by @ajaysaini725 in #603
- Make batchnorm default for UNet by @dskhudia in #535
- Fix no_op_model algorithm by @dskhudia in #614
- Pin pre-1.0 packages by @bandish-shah in #595
- Updated dark mode composer logo, and graph by @nqn in #617
- Jenkins + Docker Improvements by @ravi-mosaicml in #621
- update README links by @hanlint in #628
- Remove all old timing calls by @ravi-mosaicml in #594
- Remove state shorthand by @mvpatel2000 in #629
- add bfloat16 support by @nikhilsardana in #433
- v0.4.0 Hotfix: Docker documentation updates by @bandish-shah in #631
- Fix wrong icons in the method cards by @hanlint in #636
- fix autocast for pytorch < 1.10 by @nikhilsardana in #639
- Add tutorial notebooks to the README by @moinnadeem in #630
- Converted Stateless Schedulers to Classes by @ravi-mosaicml in #632
- Jenkinsfile Fixes Part 2 by @ravi-mosaicml in #627
- Add C4 Streaming dataset by @abhi-mosaic in #489
- CONTRIBUTING.md additions by @kobindra in #648
- Hide showing object as a base class; fix skipping documentation of forward; fixed docutils dependency. by @ravi-mosaicml in #643
- Matthew/functional docstrings update by @growlix in #622
- docstrings improvements for core modules by @dskhudia in #598
- ssd-resnet34 on COCO map 0.23 by @florescl in #646
- Fix broken "best practices" link by @growlix in #649
- Update progressive resizing to work for semantic segmentation by @coryMosaicML in #604
- Let C4 Dataset overwrite num_workers if set incorrectly by @abhi-mosaic in #655
- Lazy imports for pycocotools by @abhi-mosaic in #656
- W&B excludes final eval metrics when plotted as a fxn of epoch or trainer/global_step by @growlix in #633
- Update GPT3-yamls for default 8xA100-40GB by @abhi-mosaic in #663
- Set WandB default to log rank zero only by @abhi-mosaic in #461
- Update schedulers guide by @hanlint in #661
- [XS] Fix a TQDM deserialization bug by @jbloxham in #665
- Add defaults to the docstrings for algorithms by @hanlint in #662
- Fix ZeRO config by @jbloxham in #667
- [XS] fix formatting for colout by @hanlint in #666
- Composer.core docstring touch-up by @ravi-mosaicml in #657
- Add Uniform bounding box sampling option for CutOut and CutMix by @coryMosaicML in #634
- Update README.md by @ravi-mosaicml in #678
- Fix bug in trainer test by @hanlint in #651
- InMemoryLogger has get_timeseries() method by @growlix in #644
- Batchwise resolution for SWA by @growlix in #654
- Fixed the conda build script so it runs on jenkins by @ravi-mosaicml in #676
- Yahp version update to 0.1.0 by @Averylamp in #674
- Streaming vision datasets by @knighton in #284
- Fix DeepSpeed checkpointing by @jbloxham in #686
- Vit by @A-Jacobson in #243
- [S] cleanup tldr; standardize __all__ by @hanlint in #688
- Unify algorithms part 2: mixup, cutmix, label smoothing by @dblalock in #658
- composer.optim docstrings by @jbloxham in #653
- Fix DatasetHparams, WebDatasetHparams docstring by @growlix in #697
- Models docstrings by @A-Jacobson in #469
- docstrings improvements for composer.datasets by @dskhudia in #694
- Updated contributing.md and the style guide by @ravi-mosaicml in #670
- Ability to retry ADE20k crop transform by @Landanjs in #702
- Add mmsegmentation DeepLabv3(+) by @Landanjs in #684
- Unify functional API part 3 by @dblalock in #715
- Update example notebooks by @coryMosaicML in #707
- [Checkpointing - PR1] Store the rank_zero_seed on state by @ravi-mosaicml in #680
- [Checkpointing - PR2] Added in new Checkpointing Events by @ravi-mosaicml in #690
- [Checkpointing - PR3] Clean up RNG and State serialization by @ravi-mosaicml in #692
- [Checkpointing - PR4] Refactored the CheckpointLoader into a load_checkpoint function by @ravi-mosaicml in #693
- Update {blurpool,factorize,ghostbn} method cards by @dblalock in #711
- [Checkpointing - PR 5] Move the CheckpointSaver to a callback. by @ravi-mosaicml in #687
- Update datasets docstrings by @growlix in #709
- add notebooks and functional api by @hanlint in #714
- Migrating from PTL notebook by @florescl in #436
- Docs 0.4.1: Profiler section and tutorials by @bandish-shah in https://github.com/mos...
Release Version 0.4.0
What's Changed
- Release/0.3.0 by @ravi-mosaicml in #102
- Create dataloader on trainer init() by @ravi-mosaicml in #92
- label smoothing will not work without alpha set by @A-Jacobson in #100
- Warmup and cosine annealing warm restarts combine sequentially by @jacobfulano in #99
- Moved device.prepare() to init by @ravi-mosaicml in #111
- run_event for callbacks, removed deferred logging by @ravi-mosaicml in #85
- Remove composer.trainer.ddp; replace with composer.utils.ddp by @ravi-mosaicml in #105
- Running callbacks before algorithms for the INIT event in the engine by @ravi-mosaicml in #113
- Replaced atexit with cleanup methods by @ravi-mosaicml in #112
- Fix loss reporting by @jbloxham in #130
- Run Directory Uploader by @ravi-mosaicml in #101
- Dataloader Upgrades by @ravi-mosaicml in #114
- Synthetic Datasets and Subset Sampling by @ravi-mosaicml in #110
- Remove argparse from setup.py by @ravi-mosaicml in #131
- Fixed pickling of torch.memory_format objects by @ravi-mosaicml in #132
- Fixed issue #135; rename total_batch_size to train_batch_size by @ravi-mosaicml in #137
- Implement MosaicMLLoggerBackend by @ajaysaini725 in #81
- Add a linear learning rate decay by @moinnadeem in #142
- Apply channels last on init by @ravi-mosaicml in #147
- Update Trainer checkpointing documentation by @moinnadeem in #150
- Address crashes with DDP + Checkpointing by @moinnadeem in #151
- Sudo in the dockerimage by @ravi-mosaicml in #152
- Remove curriculum learning by @ravi-mosaicml in #164
- Remove broken symlinks by @ravi-mosaicml in #163
- Removed dataclass from state by @ravi-mosaicml in #153
- Guard artifact uploading in wandb with ddp barriers by @ravi-mosaicml in #162
- add CODE_OF_CONDUCT.md by @kobindra in #160
- [XS] Fix wandb logger by @jbloxham in #172
- Print help on run_mosaic_trainer.py, cleaned up verbosity. by @ravi-mosaicml in #170
- DDP Seeding Across Processes by @ajaysaini725 in #173
- Fixed the run directory uploader test by @ravi-mosaicml in #177
- Fix broken gpu tests by @ravi-mosaicml in #181
- Conditionally skip tests when installed with mosaicml[dev] by @ravi-mosaicml in #185
- A yapf update broke some formatting...re-running the linter by @ravi-mosaicml in #188
- Timer PR parts 1 and 2 from #146 by @ravi-mosaicml in #174
- Fixed pyright issues by @ravi-mosaicml in #198
- Additional Tests by @ravi-mosaicml in #191
- Propagate processes that were sigkilled by @ravi-mosaicml in #184
- Add the ability to load a checkpoint without restoring state by @moinnadeem in #169
- Add ResNet-9 for CIFAR-10 by @dblalock in #193
- Added helper methods for torch.distributed.broadcast by @ravi-mosaicml in #189
- Checkpointing & DeepSpeed by @jbloxham in #199
- Distinguish between dist and DDP by @jbloxham in #201
- Fix deterministic mode (and use it for tests); simplify checkpointing tests by @ravi-mosaicml in #203
- Load checkpoints from cloud storage by @ravirahman in #200
- Updated the DataSpec for the timing abstraction (#146) parts 3 and 4 by @ravi-mosaicml in #178
- Add BERT Base to Composer by @moinnadeem in #195
- Integrate the timer into the training loop by @ravi-mosaicml in #210
- Dockerfile enhancements by @ravi-mosaicml in #182
- Adding checkpointing at the end of training by @moinnadeem in #219
- Adding conditional branching on data_collator by @moinnadeem in #220
- Fixes apt sources bug fix by @Averylamp in #231
- Remove old timing calls from layer freezing by @ravi-mosaicml in #216
- Require pip install -e be pip install --user -e when running as root by @ravi-mosaicml in #232
- DeepLabv3 + ADE20k benchmark by @Landanjs in #107
- Remove old timing calls from selective backprop by @ravi-mosaicml in #221
- Clean up the tests to make them work on jenkins by @ravi-mosaicml in #233
- Make the run directory rank-local; fix checkpoints saving and restoring by @ravi-mosaicml in #215
- Cleaned Up State by @ravi-mosaicml in #223
- Fix the speed monitor by @ravi-mosaicml in #238
- Fixed loggers and callbacks by @ravi-mosaicml in #240
- Fix ade20k padding fill calculation by @Landanjs in #250
- Adding fix for NLP learning rates by @moinnadeem in #235
- Training Loop Profiler by @ravi-mosaicml in #97
- WIP: Composer Jenkinsfile by @ravi-mosaicml in #82
- Fix broken tests by @ravi-mosaicml in #257
- Fix bug with AFTER_DATALOADER event; remove microbatches from state by @ravi-mosaicml in #258
- Remove the DDP DataLoader by @ravi-mosaicml in #245
- Fix Jenkins to work on PRs from Forks by @ravi-mosaicml in #267
- add ability to specify custom run name, with rank auto-appended by @dblalock in #264
- Remove secrets from the yaml by @ravi-mosaicml in #261
- Checkpoint logging and doc fixes by @ajaysaini725 in #270
- Remove custom W&B config changes by @siriuslee in #236
- Dramatically increase default dist_timeout by @jbloxham in #272
- Add factorization by @dblalock in #53
- Allow str and dict in Trainer init signature by @hanlint in #277
- Add kwargs back to the closure by @jbloxham in #292
- Default to num_classes=10 for CIFAR10_ResNet56 by @hanlint in #293
- Use tqdm.auto for notebooks by @hanlint in #298
- Added ResNet20 by @growlix in #289
- Optimizer Surgery by @ravi-mosaicml in #249
- Don't init dist when world_size is 1 by @jbloxham in #311
- Scheduler defaults to step-wise instead of epoch-wise by @hanlint in #312
- Added the version to composer.init by @ravi-mosaicml in #315
- Rename checkpoint API by @hanlint in #281
- Update setup.py by @Averylamp in #321
- Timm support by @A-Jacobson in #262
- [XS] use correct package name in error messages by @jbloxham in #331
- Multiple Evaluator Datasets by @anisehsani in #120
- Fixed all uses of textwrap.dedent by @ravi-mosaicml in #332
- Remove explicit YAHP constructs from algorithms by @jbloxham in https://github.com/mosaicml/composer/pu...
Release Version 0.3.1
Hotfix
Hotfix to fix installation of the composer package.