Releases: mosaicml/composer
v0.8.2
🚀 Composer v0.8.2
Composer v0.8.2 is released! Install via pip:
pip install --upgrade mosaicml==0.8.2
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.8.2
🐛 Bug Fixes
- Fixed Notebook Progress Bars in Colab
Fixes a bug introduced by #1264 that caused Composer running in Colab notebooks to error out with: UnsupportedOperation: fileno.
Changelog
v0.8.1
🚀 Composer v0.8.1
Composer v0.8.1 is released! Install via pip:
pip install --upgrade mosaicml==0.8.1
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.8.1
🎁 New Features
- 🖼️ Image Visualizer
The ImageVisualizer callback periodically logs the training and validation images when using the WandB logger. This is great for validating your dataloader pipeline, especially if extensive data augmentations are used. Also, when training on a semantic segmentation task, the callback can log the target segmentation mask and the predicted segmentation mask by setting the argument mode='segmentation'. See PR #1266 for more details. Here is an example of using the ImageVisualizer callback:

from composer import Trainer
from composer.callbacks import ImageVisualizer

# Callback to log 8 training images after every 100 batches
image_visualizer = ImageVisualizer()

# Construct trainer
trainer = Trainer(
    ...,
    callbacks=image_visualizer,
)

# Train!
trainer.fit()
Here is an example visualization from the training set of ADE20k:
- 📶 TensorBoard Logging
You can now log metrics and losses from your Composer training runs with TensorBoard! See #1250 and #1283 for more details. All you have to do is create a TensorboardLogger object and add it to the list of loggers in your Trainer object, like so:

from composer import Trainer
from composer.loggers import TensorboardLogger

tb_logger = TensorboardLogger(log_dir="./my_tensorboard_logs")

trainer = Trainer(
    ...,
    # Add your Tensorboard Logger to the trainer here.
    loggers=[tb_logger],
)

trainer.fit()
For more information, see this tutorial.
- 🔙 Multiple Losses
Adds support for multiple losses. If a model returns a tuple of losses, they are summed before the loss.backward() call. See #1240 for more details; a sketch follows below.
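For illustration, here is a minimal sketch of a ComposerModel whose loss method returns two losses as a tuple; with this feature, Composer sums them before calling loss.backward(). The model, the auxiliary penalty, and its weight are hypothetical, not from the release notes.

import torch
import torch.nn.functional as F
from composer.models import ComposerModel

class TwoLossModel(ComposerModel):
    """Hypothetical model that returns a tuple of losses from loss()."""

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(16, 4)

    def forward(self, batch):
        inputs, _ = batch
        return self.net(inputs)

    def loss(self, outputs, batch):
        _, targets = batch
        ce = F.cross_entropy(outputs, targets)
        # Hypothetical auxiliary penalty on the output magnitude
        aux = 1e-4 * outputs.pow(2).mean()
        # Returning a tuple: Composer sums the entries before loss.backward()
        return ce, aux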
- 🌎️ Stream Datasets from HTTP URIs
You can now specify an HTTP URI for a Streaming Dataset remote. See #1258 for more details. For example:

from composer.datasets.streaming import StreamingDataset
from torch.utils.data import DataLoader

# Construct the Dataset
dataset = StreamingDataset(
    ...,
    remote="https://example.com/dataset/",
)

# Construct the DataLoader
train_dl = DataLoader(dataset)

# Construct the Trainer
trainer = Trainer(
    ...,
    train_dataloader=train_dl,
)

# Train!
trainer.fit()
For more information on streaming datasets, see this tutorial.
- 🏄️ GPU Devices default to TF32 Matmuls
Beginning with PyTorch 1.12, the default behavior for computing FP32 matrix multiplies on NVIDIA Ampere devices was switched from TF32 to FP32. See PyTorch documentation here.
Since Composer is designed specifically for ML training with a focus on efficiency, we choose to preserve the old default of using TF32 on Ampere devices. This leads to significantly higher throughput when training in single precision, without impacting training convergence. See PR #1275 for implementation details, and the sketch below for the underlying PyTorch switches.
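Composer configures this for you; for reference, the equivalent PyTorch flags look roughly like this if you ever need to flip the behavior yourself (the values shown are illustrative):

import torch

# Allow TF32 for matmuls and cuDNN convolutions on Ampere+ GPUs
# (this mirrors the pre-1.12 PyTorch default that Composer preserves)
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# To opt out and force full-FP32 matmuls instead:
# torch.backends.cuda.matmul.allow_tf32 = False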
- 👋 Set the Device ID for GPU Devices
Specify the device ID to train on within a DeviceGPU when instantiating a Trainer object, instead of using the local ID! For example:

from composer.trainer.devices.device_gpu import DeviceGPU

# Specify to use GPU 3 to train
device = DeviceGPU(device_id=3)

# Construct the Trainer
trainer = Trainer(
    ...,
    device=device,
)

# Train!
trainer.fit()
- BERT and C4 Updates
We make some minor adjustments to our bert-base-uncased.yaml training config. In particular, we make the global train and eval batch sizes a power of 2. This maintains divisibility when using many GPUs in multi-node training. We also adjust the max_duration so that it converts cleanly to 70,000 batches.
We also upgrade our StreamingDataset C4 conversion script (scripts/mds/c4.py) to use a multi-threaded reader. On a 64-core machine we are able to convert the 770GB train split to .mds format in ~1.5hr.
- 📂 Set a prefix when using an S3ObjectStore
When using S3ObjectStore for applications like checkpointing, it can be useful to provide path prefixes, mimicking folder/subfolder directories like on a local filesystem. When prefix is provided, any objects uploaded with S3ObjectStore will be stored at f's3://{self.bucket}/{self.prefix}{object_name}'. A minimal sketch is shown below.
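A minimal sketch, assuming the bucket and prefix names here are illustrative:

from composer.utils.object_store import S3ObjectStore

# Objects will be stored under s3://my-bucket/my-run/checkpoints/...
object_store = S3ObjectStore(
    bucket="my-bucket",
    prefix="my-run/checkpoints/",
)

# e.g., an object named "ep1.pt" would be stored at
# s3://my-bucket/my-run/checkpoints/ep1.pt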
- ⚖️ Scale the Warmup Period of Composer Schedulers
Added a new scale_warmup flag to schedulers that scales the warmup period when a scale schedule ratio is applied. The default is False to mirror the previous behavior. See #1268 for more details and the sketch below.
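A minimal sketch, assuming the cosine-with-warmup scheduler and the Trainer's scale_schedule_ratio argument shown here; if the schedule ratio halves the training length, scale_warmup=True also halves the warmup period:

from composer import Trainer
from composer.optim import CosineAnnealingWithWarmupScheduler

scheduler = CosineAnnealingWithWarmupScheduler(
    t_warmup="1000ba",
    scale_warmup=True,  # warmup shrinks/grows with the scale schedule ratio
)

trainer = Trainer(
    ...,
    schedulers=scheduler,
    scale_schedule_ratio=0.5,  # train for half the original duration
)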
- 🧊 Stochastic Depth on Residual Blocks
Residual blocks are detected automatically and replaced with stochastic versions. See #1253 for more details.
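A minimal sketch of enabling the algorithm on a ResNet-style model; the constructor arguments here (target_layer_name, drop_rate) are assumptions, so check the method card for the exact signature:

from composer import Trainer
from composer.algorithms import StochasticDepth

# Assumed arguments: replace ResNet bottleneck blocks with stochastic versions
stochastic_depth = StochasticDepth(
    target_layer_name="ResNetBottleneck",
    drop_rate=0.2,
)

trainer = Trainer(
    ...,
    algorithms=[stochastic_depth],
)

trainer.fit()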
🐛 Bug Fixes
- Fixed Progress Bars
Fixed a bug where the progress bars jumped around and did not stream properly when tailing the terminal over the network. Fixed in #1264, #1287, and #1289.
- Fixed S3ObjectStore in Multithreaded Environments
Fixed a bug where boto3 crashed when creating the default session in multiple threads simultaneously (see boto/boto3#1592). Fixed in #1260.
- Retry on ChannelException errors in the SFTPObjectStore
Catch transient SFTP ChannelException errors and retry. Fixed in #1245.
- Treating S3 Permission Denied Errors as Not Found Errors
We updated our handling of botocore 403 ClientErrors to interpret them as FileNotFoundErrors. We do this because when a user has no S3 credentials configured and tries to read from a bucket with public files, Amazon S3 raises 403 (Permission Denied) instead of 404 (Not Found) errors for privacy. As such, PR #1249 treats 403 ClientErrors as FileNotFoundErrors.
- Fixed Parsing of grad_accum in the TrainerHparams
Fixes an error where the command line override --grad_accum led to incorrect parsing. Fixed in #1256.
- Fixed Example YAML Files
Our recipe configurations (YAML) are updated to the latest version, and a test was added to enforce correctness moving forward. Fixed in #1235 and #1257.
Changelog
v0.8.0
🚀 Composer v0.8.0
Composer v0.8.0 is released! Install via pip:
pip install --upgrade mosaicml==0.8.0
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.8.0
New Features
- 🤗 HuggingFace ComposerModel
Train your HuggingFace models with Composer! We introduced a HuggingFaceModel that converts your existing 🤗 Transformers models into a ComposerModel. For example:

import transformers
from composer.models import HuggingFaceModel

# Define the model
hf_model = transformers.AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Convert it into a ComposerModel
model = HuggingFaceModel(hf_model)

# Construct the trainer
trainer = Trainer(
    ...,
    model,
)

# Train!
trainer.fit()
For more information, see the example on fine-tuning a pretrained BERT with Composer.
- 🫕 Fused Layer Norm
Fused LayerNorm replaces implementations of torch.nn.LayerNorm with apex.normalization.fused_layer_norm. The fused kernel provides increased GPU utilization. For example:

from composer.trainer import Trainer
from composer.algorithms import FusedLayerNorm

# Initialize the algorithm
alg = FusedLayerNorm()

# Construct the trainer
trainer = Trainer(
    algorithms=alg,
)

# Train!
trainer.fit()
See the method card for more information.
- 💾 Ignore Checkpoint Parameters
If you have a checkpoint and don't want to restore some elements of the checkpoint to the state, we added a load_ignore_keys parameter. Any specified (nested) keys will be ignored. Glob syntax is supported! For example, to restore a checkpoint without the seed:

from composer import Trainer

trainer = Trainer(
    ...,
    load_path="path/to/my/checkpoint.pt",
    load_ignore_keys=["state/rank_zero_seed", "rng"],
)
See the Trainer API Reference for more information.
- 🪣 Object Stores
Composer v0.8.0 introduces an abstract Object Store API to support multiple object store drivers, such as boto3 (for Amazon S3) and Paramiko (for SFTP), in addition to the existing libcloud implementation.
For example, if you are training on AWS where credentials are available in the environment, here's how to save checkpoints to an S3 object store via boto3:

from composer import Trainer
from composer.loggers import ObjectStoreLogger
from composer.utils.object_store import S3ObjectStore

logger = ObjectStoreLogger(
    object_store_cls=S3ObjectStore,
    object_store_kwargs={
        # These arguments will be passed into the S3ObjectStore -- e.g.:
        # object_store = S3ObjectStore(**object_store_kwargs)
        # Refer to the S3ObjectStore class for documentation
        'bucket': 'my-bucket',
    },
)

trainer = Trainer(
    ...,
    loggers=logger,
)

# Train!
trainer.fit()
See the Object Store API Reference for more information.
- 🪨 Artifact Metadata
Composer automatically logs the epoch, batch, sample, and token counts as metadata when storing artifacts in Weights & Biases. See the API Reference for more information.
API Changes
- ✂️ Gradient Clipping is now an Algorithm
To clean up the Trainer, we moved gradient clipping into an Algorithm. The grad_clip_norm argument in the Trainer is deprecated and will be removed in a future version of Composer. Instead, use the Gradient Clipping algorithm. For example:

from composer.algorithms import GradientClipping
from composer.trainer import Trainer

# Configure gradient clipping
gradient_clipping = GradientClipping()

# Configure the trainer
trainer = Trainer(
    ...,
    algorithms=gradient_clipping,
)

# Train!
trainer.fit()
See the method card for more information.
- 🕒️ Removed batch_num_samples and batch_num_tokens from the state
The State properties batch_num_samples and batch_num_tokens have been removed. Instead, use State.timestamp for token and sample tracking, as sketched below.
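As a minimal sketch (the callback here is hypothetical), the same counts are now read off the Timestamp object on the state:

from composer import Callback, Trainer

class SampleTokenCounter(Callback):
    """Hypothetical callback reading cumulative counts from the timestamp."""

    def batch_end(self, state, logger):
        # Replaces the removed state.batch_num_samples / state.batch_num_tokens
        print(f"Samples trained on so far: {state.timestamp.sample}")
        print(f"Tokens trained on so far: {state.timestamp.token}")

trainer = Trainer(
    ...,
    callbacks=SampleTokenCounter(),
)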
- 🧑🤝🧑 DDP Sync Strategy
We changed the default DDP sync strategy to MULTI_AUTO_SYNC, as FORCED_SYNC doesn't work with all algorithms.
- 🏃 Moved the run_name into the State
The run_name has been added to the State object, so it is persisted with checkpoints. It has been removed from the Logger.
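For example, a callback can now read the run name directly from the state (a minimal sketch; the callback itself is hypothetical):

from composer import Callback

class PrintRunName(Callback):
    """Hypothetical callback that reads the run name from the State."""

    def fit_start(self, state, logger):
        # run_name now lives on the State, so it survives checkpointing
        print(f"Starting run: {state.run_name}")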
Bug Fixes
- In the Object Store Logger, added retries for credential validation, and credentials are now validated only on global rank zero. (#1144)
- Fixed a bug in the speed monitor where it returned negative wall clock times. (#1123)
- Fixed a bug where block-wise Stochastic Depth could freeze the trainer. (#1087)
- Fixed a bug in the MLPerfCallback where sample counts were incorrect on per-sharded datasets. (#1156)
Changelog
v0.7.1
🚀 Composer v0.7.1
Composer v0.7.1 is released! Install via pip:
pip install --upgrade mosaicml==0.7.1
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.7.1
Bug Fixes
- Upgraded wandb>=0.12.17 to fix an incompatibility with protobuf >= 4 (wandb/wandb#3709)
Changelog
v0.7.0
🚀 Composer v0.7.0
Composer v0.7.0 is released! Install via pip:
pip install --upgrade mosaicml==0.7.0
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.7.0
New Features
- 🏎️ FFCV Integration
Composer supports FFCV, a fast dataloader for image datasets. We've found FFCV can speed up ResNet-56 training by 16%, in addition to existing speed-ups already supported by Composer! It's easy to use FFCV with any existing image dataset:

import ffcv
from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
from torchvision.datasets import ImageFolder

from composer import Trainer
from composer.datasets.ffcv_utils import write_ffcv_dataset, ffcv_monkey_patches

# Convert the dataset to FFCV format
# This step needs to be done only once per dataset
dataset = ImageFolder(...)
ffcv_dataset_path = "my_ffcv_dataset.ffcv"
write_ffcv_dataset(dataset=dataset, write_path=ffcv_dataset_path)

# In FFCV v0.0.3, len(dataloader) is expensive. Fix that via a monkeypatch
ffcv_monkey_patches()

# Construct the train dataloader
train_dl = ffcv.Loader(
    ffcv_dataset_path,
    ...
)

# Construct the trainer
trainer = Trainer(
    train_dataloader=train_dl,
)

# Train using FFCV!
trainer.fit()
See our notebook on training with FFCV for a full example.
- ✅ Autoresume from Checkpoints
When setting autoresume=True, Composer can automatically resume from an existing checkpoint before starting a new training run. Specifically, the trainer will look in the save_folder (and any loggers that save artifacts) for the latest checkpoint; if none is found, it'll start from the beginning.
This feature does not require a different entrypoint to distinguish between starting a new training run or automatically resuming from an existing one, making it easy to use Composer on spot preemptible cloud instances. Simply set autoresume=True, point the instance to your training script, and Composer will handle the rest!

from composer import Trainer

# When using `autoresume`, it is required to specify the
# `run_name`, so Composer will know which training run to
# resume
run_name = "my_autoresume_training_run"

trainer = Trainer(
    ...,
    run_name=run_name,
    # specify where to save checkpoints
    save_folder="./my_autoresume_training_run",
    autoresume=True,
)

# Train! Composer will handle loading an existing
# checkpoint or starting a new training run
trainer.fit()
See the Trainer API Reference for more information.
- ♻️ Reuse the Trainer
Want to train on multiple dataloaders sequentially? Each trainer object now supports multiple calls to Trainer.fit(), so you can continue training an existing model on a new dataloader, with new schedulers, all while using the same model and trainer object. For example:

from torch.utils.data import DataLoader
from composer import Trainer

train_dl_1 = DataLoader(...)

trainer = Trainer(
    model=model,
    max_duration='5ep',
    train_dataloader=train_dl_1,
)

# Train once!
trainer.fit()

# Train again with a new dataloader for another 5 epochs
train_dl_2 = DataLoader(...)
trainer.fit(
    train_dataloader=train_dl_2,
    duration='5ep',
)
See the Trainer API Reference for more information.
- ⚖️ Eval or Predict Only? No Problem
You can evaluate or predict on an existing model, without having to supply a train dataloader or training duration argument -- they're now optional.

import torchmetrics
from torch.utils.data import DataLoader
from composer import Trainer

# Construct the trainer
trainer = Trainer(model=model)

# Evaluate!
eval_dl = DataLoader(...)
trainer.eval(
    dataloader=eval_dl,
    metrics=torchmetrics.Accuracy(),
)

# Examine evaluation metrics
print("Eval metrics", trainer.state.metrics['eval'])

# Or, predict!
predict_dl = DataLoader(...)
trainer.predict(dataloader=predict_dl)
See the Trainer API Reference for more information.
- 🛑 Early Stopper and Threshold Stopper Callbacks
The Early Stopper and Threshold Stopper callbacks end training early when the target metrics are met:

from composer.callbacks.early_stopper import EarlyStopper
from torchmetrics.classification.accuracy import Accuracy

# Construct the callback
early_stopper = EarlyStopper(
    monitor="Accuracy",
    dataloader_label="eval",
    patience=2,
)

# Construct the trainer
trainer = Trainer(
    ...,
    callbacks=early_stopper,
    max_duration="100ep",
)

# Train!
# Training will end early if the accuracy does not improve
# over two epochs
trainer.fit()
- 🪵 Load Checkpoints from Loggers
It's now possible to restore checkpoints from loggers that support file artifacts (such as the Weights & Biases Logger). No need to download your checkpoints manually anymore.

from composer import Trainer
from composer.loggers import WandBLogger

# Configure the W&B Logger
wandb_logger = WandBLogger(
    # set to True to capture artifacts, like checkpoints
    log_artifacts=True,
    init_params={
        'project': 'my-wandb-project-name',
    },
)

# Then, to train and save checkpoints to W&B:
trainer = Trainer(
    ...,
    loggers=wandb_logger,
    save_folder="/tmp/checkpoints",
    save_interval="1ep",
    save_artifact_name="epoch{epoch}.pt",
)

# Finally, to load checkpoints from W&B:
trainer = Trainer(
    ...,
    load_object_store=wandb_logger,
    load_path="epoch1.pt:latest",
)
- ⌛ Wall Clock, Evaluation, and Prediction Time Tracking
The timestamp object measures wall clock time via three new fields: total_wct, epoch_wct, and batch_wct. These fields track the total elapsed training time, the elapsed training time of the current epoch, and the time to train the last batch. Read the wall clock time via a callback:

from composer import Callback, Trainer

class MyCallback(Callback):
    def batch_end(self, state, event):
        print(f"Total wct: {state.timestamp.total_wct}")
        print(f"Epoch wct: {state.timestamp.epoch_wct}")
        print(f"Batch wct: {state.timestamp.batch_wct}")

# Construct the trainer with this callback
trainer = Trainer(
    ...,
    callbacks=MyCallback(),
)

# Train!
trainer.fit()
In addition, the training state object has two new fields for tracking time during evaluation and prediction: eval_timestamp and predict_timestamp. These fields, just like any others on the state object, are accessible to algorithms, callbacks, and loggers.
- Training DeepLabv3+ on the ADE20k Dataset
DeepLabv3+ is a common baseline model for semantic segmentation tasks. We provide a ComposerModel implementation for DeepLabv3+ built using torchvision and mmsegmentation for the backbone and head, respectively.
We found the DeepLabv3+ baseline can be significantly improved using the new PyTorch pre-trained weights. Additional gains are made through a hyperparameter sweep.
We benchmark our DeepLabv3+ model on a single 8xA100 machine using ADE20k, a popular semantic segmentation dataset. The final results on ADE20k are:
Model | mIoU | Time-to-Train
Unoptimized DeepLabv3+ | 44.17 +/-...
v0.6.1
🚀 Composer v0.6.1
Composer v0.6.1 is released!
Go ahead and upgrade; it's fully backwards compatible with Composer v0.6.0.
Install via pip:
pip install --upgrade mosaicml==0.6.1
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.6.1
What's New?
- 📎 Adaptive Gradient Clipping (AGC)
Adaptive Gradient Clipping (AGC) clips gradients based on the ratio of their norms with weights' norms. This technique helps stabilize training with large batch sizes, especially for models without batchnorm layers.
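The core rule is easy to sketch in plain PyTorch (an illustrative, per-tensor simplification, not Composer's implementation; use the algorithm from composer.algorithms for the real thing):

import torch

def adaptive_grad_clip_(parameters, clipping_threshold=0.01, eps=1e-3):
    """Scale down each gradient whose norm is too large relative to its weight norm."""
    for p in parameters:
        if p.grad is None:
            continue
        w_norm = p.detach().norm().clamp_min(eps)
        g_norm = p.grad.detach().norm()
        max_norm = clipping_threshold * w_norm
        if g_norm > max_norm:
            # Rescale the gradient so its norm equals clipping_threshold * ||w||
            p.grad.detach().mul_(max_norm / g_norm)

# Usage: call between loss.backward() and optimizer.step()
# adaptive_grad_clip_(model.parameters())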
- 🚚 Exponential Moving Average (EMA)
Exponential Moving Average (EMA) is a model averaging technique that maintains an exponentially weighted moving average of the model parameters during training. The averaged parameters are used for model evaluation. EMA typically results in less noisy validation metrics over the course of training, and sometimes increased generalization.
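The underlying update is a simple exponential average of the parameters (an illustrative sketch in plain PyTorch, not Composer's EMA algorithm):

import copy
import torch

@torch.no_grad()
def update_ema(model, ema_model, smoothing=0.99):
    """ema_param <- smoothing * ema_param + (1 - smoothing) * param"""
    for p, ema_p in zip(model.parameters(), ema_model.parameters()):
        ema_p.mul_(smoothing).add_(p, alpha=1 - smoothing)

# Usage sketch:
# ema_model = copy.deepcopy(model)          # evaluate with ema_model
# after each optimizer step: update_ema(model, ema_model)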
- 🪵 Logger is available in the ComposerModel
The Logger is bound to the ComposerModel via the self.logger attribute. It is available during training in all methods (other than __init__). For example, to log hidden activations:

class Net(ComposerModel):

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        if self.logger:
            self.logger.data_batch({
                "hidden_activation_norm": x.norm(2).item(),
            })
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x)
- 🐛 Environment Collection Script
Composer v0.6.1 includes an environment collection script which generates a printout of your system configuration and Python environment. If you run into a bug, the results from this script will help us debug the issue and fix Composer.
To collect your environment information:

$ pip install mosaicml  # if composer is not already installed
$ composer_collect_env
Then, include the output in your GitHub Issue.
What's Improved?
- 📜 TorchScriptable Algorithms
BlurPool, Ghost BatchNorm, and Stochastic Depth are now TorchScript-compatible. Try exporting your models with these algorithms enabled!
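For example, scripting a model trained with one of these algorithms enabled is now expected to work (a minimal sketch; pulling the module off the trainer state is an assumption about your setup):

import torch

# After trainer.fit() with BlurPool / Ghost BatchNorm / Stochastic Depth enabled:
scripted = torch.jit.script(trainer.state.model)
scripted.save("scripted_model.pt")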
- 🏛️ ColOut on Segmentation
ColOut now supports segmentation-style models.
What's Fixed?
- 🚑️ Loggers capture the Traceback
We fixed a bug so the loggers, such as the Weights & Biases Logger and the File Logger, will capture the traceback of any exception that crashes the training process.
- 🏋️ Weights & Biases Logger Config
We fixed a bug where the Weights & Biases Logger was not properly recording the configuration.
Full Changelog
v0.6.0
🚀 Composer v0.6.0
Composer v0.6.0 is released! Install via pip:
pip install --upgrade mosaicml==0.6.0
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.6.0
Major Changes
- 🗃️ Automatic Gradient Accumulation
Composer v0.6.0 can automatically pick an appropriate value for gradient accumulation. The trainer will automatically catch OutOfMemory exceptions and handle them gracefully. No need to manually tune this parameter for each model, batch size, and hardware combination!
To use automatic gradient accumulation, set grad_accum='auto'. For example:

trainer = Trainer(
    ...,
    grad_accum='auto',
)
- 💾 Artifact Logging
Training on spot instances? Composer v0.6.0 introduces artifact logging, making it possible to store checkpoints and other artifacts directly to cloud storage. See the Object Store Logger and the Checkpointing Guide for more information.
Artifact Logging has replaced the run directory and the run directory uploader, which have been removed.
- 📊 Metric Values on the State
Composer v0.6.0 binds the computed metric values on the State. Go ahead and read these values from your own callbacks! We'll be releasing an early stopping callback in an upcoming Composer release.
- ⚠️ NoEffectWarning and NotIntendedUseWarning for Algorithms
Some algorithms, such as BlurPool, now emit a NoEffectWarning or a NotIntendedUseWarning when they're not being used appropriately.
Minor Improvements
- 🏃♀️ Training Run Names
We introduced a run_name parameter in the Trainer to help organize training runs.

trainer = Trainer(
    ...,
    run_name='awesome-training-run',
)
We'll automatically pick one if the run name is not specified.
- 💈 Automatic Progress Bars
The ProgressBarLogger, formerly called the TQDMLogger, is automatically enabled for all training runs.
To disable the progress bar, set progress_bar=False. For example:

trainer = Trainer(
    ...,
    progress_bar=False,
)
- 🪵 Logged Data in the Console
To print Logger calls to the console, set the log_to_console and console_log_level arguments.

trainer = Trainer(
    ...,
    log_to_console=True,
    console_log_level="epoch",
)

By default, the console logger will only be enabled when progress_bar=False. The default console log level is epoch.
- 📃 Capturing stdout and stderr in Log Files
The FileLogger captures stdout and stderr by default now. Tracebacks will now be captured amongst other logging statements.
- ⬆️ PyTorch 1.11 Support
We've tested Composer on PyTorch 1.11. Go ahead and upgrade your dependencies!
- ✅ Checkpointing
We changed the checkpoint format to store the underlying model, not the DistributedDataParallel wrapped model. If you're using Composer to read checkpoints, there's nothing to change. But if you're reading Composer checkpoints manually, note that the module checkpoints will be formatted differently.
In addition, we changed the checkpointing argument names for the trainer.
- The new parameters save_artifact_name and save_latest_artifact_name allow checkpoints to be saved directly to artifact stores.
- The new parameter save_num_checkpoints_to_keep helps preserve local disk storage by automatically removing old checkpoints.
- load_path replaces load_path_format.
- save_name replaces save_path_format.
- save_latest_filename replaces save_latest_format.
- 🏎️ Profiling
We added support for custom scheduling functions and re-designed how the profiler saves traces. Each profiling cycle will now have its own trace file. Trace merging happens automatically throughout the training process. Long-running profiling is now possible without the long wait at the end of training for the trace merge.
As part of this refactor, the profiler arguments have changed:
- prof_trace_handlers replaces prof_event_handlers.
- prof_schedule replaces prof_skip_first, prof_wait, prof_warmup, prof_active, and prof_repeat. See the cyclic schedule function.
- torch_prof_folder replaces torch_profiler_trace_dir.
- The new arguments torch_prof_filename, torch_prof_artifact_name, torch_prof_overwrite, and torch_prof_num_traces_to_keep allow for customization of how PyTorch Profiler traces are saved.
- 🏗️ TorchVision Model Architectures
We switched our vision models to use the TorchVision model architecture implementations where possible.
Bug Fixes
- Fixed a bug with MixUp and gradient accumulation
- Fixed numerous issues with the Composer launch script for distributed training. Composer v0.6.0 includes environment variable support, better defaults and warnings, and proper handling of crashed processes.
Changelog
- Update Migrating_from_PTL.ipynb by @moinnadeem in #730
- CodeQL Analysis by @Averylamp in #723
- Installing pyright via npm by @ravi-mosaicml in #735
- Polish intro docs by @dblalock in #721
- Numerics docs page by @bandish-shah in #725
- Testing Niklas GH Docs Star w/ Dark Mode by @moinnadeem in #742
- [Artifact Logging PR1] Logger Refactoring by @ravi-mosaicml in #698
- Update README.md by @moinnadeem in #731
- Updated the Method Cards by @hanlint in #647
- Using existing clone in conda meta.yaml by @ravi-mosaicml in #751
- [Artifact Logging PR2] Logger Destination Cleanup by @ravi-mosaicml in #699
- Shorten to minimal code snippets by @hanlint in #752
- Sample-wise Stochastic Depth Method Card by @Landanjs in #749
- Update algorithm yamls by @coryMosaicML in #747
- [Artifact Logging PR3] Add the run_name as a property of the Logger by @ravi-mosaicml in #700
- [Artifact Logging PR4] Added log_file_artifact base method by @ravi-mosaicml in #701
- Fix README.md by @ravi-mosaicml in #753
- Less CodeQL by @Averylamp in #762
- Increase the timeout for test trainer equivalence by @ravi-mosaicml in #766
- Port squeze excite method card to new format by @dblalock in #764
- Small fixes by @hanlint in #765
- Adding defaults to blurpool by @moinnadeem in #756
- Added maximum versions to dependencies by @ravi-mosaicml in #768
- Update sequence length warmup documentation by @moinnadeem in #770
- Additional README fixes by @hanlint in #769
- Fix setup.py by @Averylamp in #761
- Increased the timeout for test_trainer.py by @ravi-mosaicml in #775
- Remove plural types and aliases for native pytorch types by @Landanjs in #677
- [Artifact Logging PR5] Added the object store logger by @ravi-mosaicml in #706
- [Artifact Logging PR6] Rename the TQDMLogger as the ProgressBarLogger; remove terminal logging from the file logger by @ravi-mosaicml in #708
- [Artifact Logging PR7] Add stdout and stderr capture to the FileLogger by @ravi-mosaicml in #710
- Update README.md by @vahidfazelrezai in #781
- URGENT: Fixing an incorrect number by @jfrankle in https:/...
Release version v0.5.0
We are excited to share Composer v0.5, a library of speed-up methods for efficient neural network training. This release features:
- Revamped checkpointing API based on community feedback
- New baselines: ResNet34-SSD, GPT-3, and Vision Transformers
- Additional improvements to our documentation
- Support for bfloat16
- Streaming dataset support
- Unified functional API for our algorithms
Highlights
Checkpointing API
Checkpoint saving is now implemented as a Callback, so that users can easily write and add their own callbacks. The callback is automatically appended if a save_folder is provided to the Trainer.
trainer = Trainer(
    model=model,
    algorithms=algorithms,
    save_folder="checkpoints",
    save_interval="1ep",
)
Alternatively, CheckpointSaver can be directly added as a callback:
trainer = Trainer(..., callbacks=[
    CheckpointSaver(
        save_folder='checkpoints',
        name_format="ep{epoch}-ba{batch}/rank_{rank}",
        save_latest_format="latest/rank_{rank}",
        save_interval="1ep",
        weights_only=False,
    )
])
Subclass from CheckpointSaver to add your own logic for saving the best model, or saving at specific intervals. Thanks to @mansheej, @siriuslee, and other users for their feedback.
bfloat16
We've added experimental support for bfloat16, which can be provided via the precision argument to the Trainer:
trainer = Trainer(
    ...,
    precision="bfloat16",
)
Streaming datasets
We've added support for fast streaming datasets. For NLP-based datasets such as C4, we use the HuggingFace datasets backend, and add dataset-specific shuffling, tokenization, and grouping on-the-fly. To support data parallel training, we added specific sharding logic for efficiency. See C4Datasets for more details.
Vision streaming datasets are supported via a patched version of the webdataset package, with added support for data sharding by workers for fast augmentations. See composer.datasets.webdataset for more details.
Baseline GPT-3, ResNet34-SSD, and Vision Transformer benchmarks
Configurations for GPT-3-like models ranging from 125m to 760m parameters are now released, and use DeepSpeed Zero Stage 0 for memory-efficient training.
We've also added the Single Shot Detection (SSD) model (Liu et al., 2016) with a ResNet34 backbone, based on the MLPerf reference implementation.
Our first Vision Transformer benchmark is the ViT-S/16 model from Touvron et al., 2021, based on the vit-pytorch package.
See below for the full details:
What's Changed
- Export Transforms in composer.algorithms by @ajaysaini725 in #603
- Make batchnorm default for UNet by @dskhudia in #535
- Fix no_op_model algorithm by @dskhudia in #614
- Pin pre-1.0 packages by @bandish-shah in #595
- Updated dark mode composer logo, and graph by @nqn in #617
- Jenkins + Docker Improvements by @ravi-mosaicml in #621
- update README links by @hanlint in #628
- Remove all old timing calls by @ravi-mosaicml in #594
- Remove state shorthand by @mvpatel2000 in #629
- add bfloat16 support by @nikhilsardana in #433
- v0.4.0 Hotfix: Docker documentation updates by @bandish-shah in #631
- Fix wrong icons in the method cards by @hanlint in #636
- fix autocast for pytorch < 1.10 by @nikhilsardana in #639
- Add tutorial notebooks to the README by @moinnadeem in #630
- Converted Stateless Schedulers to Classes by @ravi-mosaicml in #632
- Jenkinsfile Fixes Part 2 by @ravi-mosaicml in #627
- Add C4 Streaming dataset by @abhi-mosaic in #489
- CONTRIBUTING.md additions by @kobindra in #648
- Hide showing object as a base class; fix skipping documentation of forward; fixed docutils dependency. by @ravi-mosaicml in #643
- Matthew/functional docstrings update by @growlix in #622
- docstrings improvements for core modules by @dskhudia in #598
- ssd-resnet34 on COCO map 0.23 by @florescl in #646
- Fix broken "best practices" link by @growlix in #649
- Update progressive resizing to work for semantic segmentation by @coryMosaicML in #604
- Let C4 Dataset overwrite num_workers if set incorrectly by @abhi-mosaic in #655
- Lazy imports for pycocotools by @abhi-mosaic in #656
- W&B excludes final eval metrics when plotted as a fxn of epoch or trainer/global_step by @growlix in #633
- Update GPT3-yamls for default 8xA100-40GB by @abhi-mosaic in #663
- Set WandB default to log rank zero only by @abhi-mosaic in #461
- Update schedulers guide by @hanlint in #661
- [XS] Fix a TQDM deserialization bug by @jbloxham in #665
- Add defaults to the docstrings for algorithms by @hanlint in #662
- Fix ZeRO config by @jbloxham in #667
- [XS] fix formatting for colout by @hanlint in #666
- Composer.core docstring touch-up by @ravi-mosaicml in #657
- Add Uniform bounding box sampling option for CutOut and CutMix by @coryMosaicML in #634
- Update README.md by @ravi-mosaicml in #678
- Fix bug in trainer test by @hanlint in #651
- InMemoryLogger has get_timeseries() method by @growlix in #644
- Batchwise resolution for SWA by @growlix in #654
- Fixed the conda build script so it runs on jenkins by @ravi-mosaicml in #676
- Yahp version update to 0.1.0 by @Averylamp in #674
- Streaming vision datasets by @knighton in #284
- Fix DeepSpeed checkpointing by @jbloxham in #686
- Vit by @A-Jacobson in #243
- [S] cleanup tldr; standardize __all__ by @hanlint in #688
- Unify algorithms part 2: mixup, cutmix, label smoothing by @dblalock in #658
- composer.optim docstrings by @jbloxham in #653
- Fix DatasetHparams, WebDatasetHparams docstring by @growlix in #697
- Models docstrings by @A-Jacobson in #469
- docstrings improvements for composer.datasets by @dskhudia in #694
- Updated contributing.md and the style guide by @ravi-mosaicml in #670
- Ability to retry ADE20k crop transform by @Landanjs in #702
- Add mmsegmentation DeepLabv3(+) by @Landanjs in #684
- Unify functional API part 3 by @dblalock in #715
- Update example notebooks by @coryMosaicML in #707
- [Checkpointing - PR1] Store the rank_zero_seed on state by @ravi-mosaicml in #680
- [Checkpointing - PR2] Added in new Checkpointing Events by @ravi-mosaicml in #690
- [Checkpointing - PR3] Clean up RNG and State serialization by @ravi-mosaicml in #692
- [Checkpointing - PR4] Refactored the CheckpointLoader into a load_checkpoint function by @ravi-mosaicml in #693
- Update {blurpool,factorize,ghostbn} method cards by @dblalock in #711
- [Checkpointing - PR 5] Move the CheckpointSaver to a callback. by @ravi-mosaicml in #687
- Update datasets docstrings by @growlix in #709
- add notebooks and functional api by @hanlint in #714
- Migrating from PTL notebook by @florescl in #436
- Docs 0.4.1: Profiler section and tutorials by @bandish-shah in https://github.com/mos...
Release Version 0.4.0
What's Changed
- Release/0.3.0 by @ravi-mosaicml in #102
- Create dataloader on trainer init() by @ravi-mosaicml in #92
- label smoothing will not work without alpha set by @A-Jacobson in #100
- Warmup and cosine annealing warm restarts combine sequentially by @jacobfulano in #99
- Moved device.prepare() to init by @ravi-mosaicml in #111
- run_event for callbacks, removed deferred logging by @ravi-mosaicml in #85
- Remove composer.trainer.ddp; replace with composer.utils.ddp by @ravi-mosaicml in #105
- Running callbacks before algorithms for the INIT event in the engine by @ravi-mosaicml in #113
- Replaced atexit with cleanup methods by @ravi-mosaicml in #112
- Fix loss reporting by @jbloxham in #130
- Run Directory Uploader by @ravi-mosaicml in #101
- Dataloader Upgrades by @ravi-mosaicml in #114
- Synthetic Datasets and Subset Sampling by @ravi-mosaicml in #110
- Remove argparse from setup.py by @ravi-mosaicml in #131
- Fixed pickling of torch.memory_format objects by @ravi-mosaicml in #132
- Fixed issue #135; rename total_batch_size to train_batch_size by @ravi-mosaicml in #137
- Implement MosaicMLLoggerBackend by @ajaysaini725 in #81
- Add a linear learning rate decay by @moinnadeem in #142
- Apply channels last on init by @ravi-mosaicml in #147
- Update Trainer checkpointing documentation by @moinnadeem in #150
- Address crashes with DDP + Checkpointing by @moinnadeem in #151
- Sudo in the dockerimage by @ravi-mosaicml in #152
- Remove curriculum learning by @ravi-mosaicml in #164
- Remove broken symlinks by @ravi-mosaicml in #163
- Removed dataclass from state by @ravi-mosaicml in #153
- Guard artifact uploading in wandb with ddp barriers by @ravi-mosaicml in #162
- add CODE_OF_CONDUCT.md by @kobindra in #160
- [XS] Fix wandb logger by @jbloxham in #172
- Print help on run_mosaic_trainer.py, cleaned up verbosity. by @ravi-mosaicml in #170
- DDP Seeding Across Processes by @ajaysaini725 in #173
- Fixed the run directory uploader test by @ravi-mosaicml in #177
- Fix broken gpu tests by @ravi-mosaicml in #181
- Conditionally skip tests when installed with mosaicml[dev] by @ravi-mosaicml in #185
- A yapf update broke some formatting...re-running the linter by @ravi-mosaicml in #188
- Timer PR parts 1 and 2 from #146 by @ravi-mosaicml in #174
- Fixed pyright issues by @ravi-mosaicml in #198
- Additional Tests by @ravi-mosaicml in #191
- Propagate processes that were sigkilled by @ravi-mosaicml in #184
- Add the ability to load a checkpoint without restoring state by @moinnadeem in #169
- Add ResNet-9 for CIFAR-10 by @dblalock in #193
- Added helper methods for torch.distributed.broadcast by @ravi-mosaicml in #189
- Checkpointing & DeepSpeed by @jbloxham in #199
- Distinguish between dist and DDP by @jbloxham in #201
- Fix deterministic mode (and use it for tests); simplify checkpointing tests by @ravi-mosaicml in #203
- Load checkpoints from cloud storage by @ravirahman in #200
- Updated the DataSpec for the timing abstraction (#146) parts 3 and 4 by @ravi-mosaicml in #178
- Add BERT Base to Composer by @moinnadeem in #195
- Integrate the timer into the training loop by @ravi-mosaicml in #210
- Dockerfile enhancements by @ravi-mosaicml in #182
- Adding checkpointing at the end of training by @moinnadeem in #219
- Adding conditional branching on data_collator by @moinnadeem in #220
- Fixes apt sources bug fix by @Averylamp in #231
- Remove old timing calls from layer freezing by @ravi-mosaicml in #216
- Require pip install -e be pip install --user -e when running as root by @ravi-mosaicml in #232
- DeepLabv3 + ADE20k benchmark by @Landanjs in #107
- Remove old timing calls from selective backprop by @ravi-mosaicml in #221
- Clean up the tests to make them work on jenkins by @ravi-mosaicml in #233
- Make the run directory rank-local; fix checkpoints saving and restoring by @ravi-mosaicml in #215
- Cleaned Up State by @ravi-mosaicml in #223
- Fix the speed monitor by @ravi-mosaicml in #238
- Fixed loggers and callbacks by @ravi-mosaicml in #240
- Fix ade20k padding fill calculation by @Landanjs in #250
- Adding fix for NLP learning rates by @moinnadeem in #235
- Training Loop Profiler by @ravi-mosaicml in #97
- WIP: Composer Jenkinsfile by @ravi-mosaicml in #82
- Fix broken tests by @ravi-mosaicml in #257
- Fix bug with AFTER_DATALOADER event; remove microbatches from state by @ravi-mosaicml in #258
- Remove the DDP DataLoader by @ravi-mosaicml in #245
- Fix Jenkins to work on PRs from Forks by @ravi-mosaicml in #267
- add ability to specify custom run name, with rank auto-appended by @dblalock in #264
- Remove secrets from the yaml by @ravi-mosaicml in #261
- Checkpoint logging and doc fixes by @ajaysaini725 in #270
- Remove custom W&B config changes by @siriuslee in #236
- Dramatically increase default dist_timeout by @jbloxham in #272
- Add factorization by @dblalock in #53
- Allow str and dict in Trainer init signature by @hanlint in #277
- Add kwargs back to the closure by @jbloxham in #292
- Default to num_classes=10 for CIFAR10_ResNet56 by @hanlint in #293
- Use tqdm.auto for notebooks by @hanlint in #298
- Added ResNet20 by @growlix in #289
- Optimizer Surgery by @ravi-mosaicml in #249
- Don't init dist when world_size is 1 by @jbloxham in #311
- Scheduler defaults to step-wise instead of epoch-wise by @hanlint in #312
- Added the version to composer.init by @ravi-mosaicml in #315
- Rename checkpoint API by @hanlint in #281
- Update setup.py by @Averylamp in #321
- Timm support by @A-Jacobson in #262
- [XS] use correct package name in error messages by @jbloxham in #331
- Multiple Evaluator Datasets by @anisehsani in #120
- Fixed all uses of textwrap.dedent by @ravi-mosaicml in #332
- Remove explicit YAHP constructs from algorithms by @jbloxham in https://github.com/mosaicml/composer/pu...
Release Version 0.3.1
Hotfix
Hotfix to fix installation of the composer package.