🚀 Composer v0.9.0
Excited to share the release of Composer v0.9.0, which comes with an Inference Export API, beta support for Apple Silicon and TPU training, as well as expanded usability of NLP-related speed-up methods. This release includes 175 commits from 34 contributors, including 10 new contributors 🙌 !
```bash
pip install --upgrade mosaicml==0.9.0
```

Alternatively, install Composer with Conda:

```bash
conda install -c mosaicml mosaicml=0.9.0
```
New Features
📦 Export for inference APIs
Train with Composer and deploy anywhere! We have added a dedicated export API as well as an export training callback so you can export Composer-trained models for inference, supporting popular formats such as TorchScript and ONNX.
For example, here's how to export a model in TorchScript format:
```python
from composer.utils import export_for_inference

# Invoking export with a trained model
export_for_inference(
    model=model,
    save_format='torchscript',
    save_path=model_save_path,
)
```
Here’s an example of using the training callback, which automatically exports the model at the end of training to ONNX format:
```python
from composer.callbacks import ExportForInferenceCallback

# Initializing Trainer with the export callback
callback = ExportForInferenceCallback(save_format='onnx', save_path=model_save_path)
trainer = Trainer(
    model=model,
    callbacks=callback,
    train_dataloader=dataloader,
    max_duration='10ep',
)

# Model will be exported at the end of training
trainer.fit()
```
Please see our Exporting for Inference notebook for more information.
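The exported TorchScript artifact is plain PyTorch, so a quick sanity check is to load it back and run a forward pass. A minimal sketch, assuming `model_save_path` from the example above and a `sample_input` batch shaped like your training data:

```python
import torch

# Load the exported TorchScript model from disk
scripted_model = torch.jit.load(model_save_path)
scripted_model.eval()

# Run a forward pass to confirm the export is usable for inference
with torch.no_grad():
    output = scripted_model(sample_input)
```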
📈 ALiBi support for BERT training
You can now use ALiBi (Attention with Linear Biases; Press et al., 2021) when training BERT models with Composer, delivering faster training and higher accuracy by leveraging shorter sequence lengths.
ALiBi improves the quality of BERT pre-training, especially when pre-training uses shorter sequence lengths than the downstream (fine-tuning) task. This allows models with ALiBi to reach higher downstream accuracy with less pre-training time.
Example of using ALiBi as an algorithm with the Composer Trainer:
```python
import composer.algorithms
import composer.models
import composer.trainer

# Create an instance of a BERT masked language model
model = composer.models.create_bert_mlm()

# Apply ALiBi (when training is initialized)
alibi = composer.algorithms.Alibi(max_sequence_length=1024)

# Train with ALiBi
trainer = composer.trainer.Trainer(
    model=model,
    train_dataloader=train_dataloader,
    algorithms=[alibi],
)
trainer.fit()
```
Example using the Composer Functional API:
```python
import composer.functional as cf
import composer.models

# Create an instance of a BERT masked language model
model = composer.models.create_bert_mlm()

# Apply ALiBi and expand the model's maximum sequence length to 1024
cf.apply_alibi(model=model, max_sequence_length=1024)
```
ALiBi can also now be extended to work with custom models by registering your attention and embedding layers. Please see our ALiBi method card for more information.
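As a rough illustration, registering a custom layer might look like the sketch below. Note this is hypothetical: the `policy_registry` import path, decorator form, and callback signature are assumptions inferred from the method card, and `MyAttention` stands in for your own layer class.

```python
import torch
from composer.algorithms.alibi.attention_surgery_functions import policy_registry

class MyAttention(torch.nn.Module):
    """Placeholder for a custom attention layer."""
    ...

# Hypothetical registration: tells ALiBi how to rewrite MyAttention modules
@policy_registry.register(MyAttention)
def apply_alibi_to_my_attention(module: torch.nn.Module, module_index: int,
                                max_sequence_length: int) -> torch.nn.Module:
    # Attach ALiBi's linear positional biases to the module here (details omitted)
    module.max_sequence_length = max_sequence_length
    return module
```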
🧐 Entry point for GLUE tasks pre-training and fine-tuning
You can now easily pre-train and fine-tune NLP models across all GLUE (General Language Understanding Evaluation) tasks through one simple entry point! The entry point handles model saving and loading, spawns GLUE tasks in parallel across all available GPUs, and delivers a highly efficient evaluation of model performance.
Example of launching the entrypoint:
```bash
# This runs pre-training followed by fine-tuning.
# --training_scheme can take either pretrain, finetune, or all depending on the task!
python run_glue_trainer.py -f glue_example.yaml --training_scheme all
```
Please see our GLUE entrypoint notebook for more information.
🤖 TPU support (in beta)
You can now use Composer to train your models on TPUs! Support is in beta and currently limited to single-core TPU training. Try it out, explore optimizations, and share your feedback and feature requests with us so we can make it better for you and for the community.
To use TPUs with Composer, simply specify a `tpu` device:

```python
# Set device to `tpu`
trainer = composer.trainer.Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration=train_epochs,
    device='tpu',
)

# Run fit
trainer.fit()
```
Please see our Training with TPUs notebook for more information.
🍎 Apple Silicon support (beta)
Leverage Apple Silicon chips to train your models with Composer by providing the `device='mps'` argument:

```python
trainer = Trainer(
    ...,
    device='mps',
)
```
We use the latest PyTorch MPS backend to execute the training. This requires torch version ≥1.12 and macOS 12.3+.
For more information on training with Apple M chips, see the PyTorch 1.12 blog and our API Reference for Composer specific details.
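PyTorch 1.12 ships an availability check for the MPS backend, so you can confirm your environment qualifies before requesting the device. A minimal sketch (the CPU fallback is illustrative):

```python
import torch

# 'mps' requires a torch >= 1.12 build with MPS support, on Apple Silicon with macOS 12.3+
if torch.backends.mps.is_available():
    device = 'mps'
else:
    device = 'cpu'  # fall back to CPU when MPS is unavailable
```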
🚧 Contrib repository
Got a new method idea, or published a paper and want those methods to be easily accessible? We've created the `mcontrib` repository, with a lightweight process for contributing new algorithms. We're happy to work directly with you to benchmark these methods and eventually "promote" them to Composer for use by end customers. Please check out the README for details on how to contribute a new algorithm. For more details on how to write speed-up methods, see our notebook on custom speed-up methods.
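For orientation, a Composer speed-up method is a class with `match` and `apply` hooks that the trainer's engine invokes at training events. A minimal sketch of a custom algorithm (the class and its loss-scaling behavior are illustrative, not taken from `mcontrib`):

```python
from composer.core import Algorithm, Event

class ScaleLoss(Algorithm):
    """Illustrative custom method: scales the training loss by a constant factor."""

    def __init__(self, factor: float = 0.5):
        self.factor = factor

    def match(self, event, state):
        # Run immediately after the loss is computed on each step
        return event == Event.AFTER_LOSS

    def apply(self, event, state, logger):
        state.loss = state.loss * self.factor
```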
Additional API Changes
🔢 Passes Module
The order in which algorithms are run matters significantly during composition. With this release, we refactored algorithm passes into their own `passes` module. Users can now register custom passes (for custom algorithms) with the Engine. Please see #1377 for more information.
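As a sketch of what a custom pass could look like (hypothetical: the `register_pass` method name and the `(algorithms, event)` pass signature are assumptions; see #1377 for the actual interface):

```python
# Hypothetical pass: given the algorithms scheduled for an event,
# return them in the order they should run (here, MyAlgorithm goes last)
def run_my_algorithm_last(algorithms, event):
    return sorted(algorithms, key=lambda alg: type(alg).__name__ == 'MyAlgorithm')

# Assumed registration hook on the trainer's engine
trainer.engine.register_pass(run_my_algorithm_last)
```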
🗄️ Default Checkpoint Extension
The CheckpointSaver now defaults to using the `*.pt` extension for checkpoint filenames. Please see #1370 for more information.
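In practice, this only changes the default filename template; you can still set it explicitly. A minimal sketch (the folder and template values are illustrative):

```python
from composer.callbacks import CheckpointSaver

# Explicit template shown for clarity; omitting `filename` now yields *.pt files by default
checkpoint_saver = CheckpointSaver(
    folder='checkpoints',
    filename='ep{epoch}-ba{batch}-rank{rank}.pt',
)
```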
👁️ Models Refactor
Most vision models (ResNet, MNIST, ViT, EfficientNet) have been refactored from classes to factory functions. For example, `ComposerResNet` -> `composer_resnet`.

```python
# before
from composer.models import ComposerResNet
model = ComposerResNet(..)

# after
from composer.models import composer_resnet
model = composer_resnet(..)
```
The same refactor has been done for NLP as well, e.g. `BERTModel` -> `create_bert_mlm` and `create_bert_classification`.
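Mirroring the vision example above (constructor arguments elided):

```python
# before
from composer.models import BERTModel
model = BERTModel(..)

# after
from composer.models import create_bert_mlm
model = create_bert_mlm()
```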
➕ Misc API Changes
- `BreakEpochException` has been removed.
- `state.is_model_deepspeed` has been moved to `composer.utils.is_model_deepspeed`.
- Helper function `monitored_barrier` has been added to composer distributed.
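For example, the moved helper is now imported from `composer.utils`. A sketch; placing `monitored_barrier` under `composer.utils.dist` is an assumption based on where Composer keeps its distributed helpers:

```python
from composer.utils import dist, is_model_deepspeed

# Moved: check whether a model has been wrapped by DeepSpeed
print(is_model_deepspeed(model))

# Assumed location of the new helper: a barrier that surfaces unresponsive ranks
dist.monitored_barrier()
```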
Bug Fixes
- Add informative error for infer batch size issues (#1401)
- Fix ImagenetDatasetHparams bug (#1392), resolves #1111
- Fix hparams error condition checking (#1394)
- Fix AMP resumption with grad scaler (#1376)
- Auto Grad Accum Cache Clearing (#1380), fixes issue reported in #1331
- Fix default precision (#1369)
- Fix the profiler on multi-node training (#1358), resolves #1270
- Retry SFTP on Size Mismatch (#1300)
- Fix scheduler edge cases (#1350), resolves #1077
- Fix a race condition in the object store logger (#1328)
- Fix WandB load from checkpoint (#1326)
- Fix Notebook Progress Bars (#1313)
What's Changed
- Fix DeepSpeed typo in docstring by @abhi-mosaic in #1188
- Move grad_accum logging to every step by @coryMosaicML in #1187
- Update STYLE_GUIDE with details on Documentation by @bandish-shah in #1183
- ProgressBar Units by @hanlint in #1190
- Added Xavier Normal initializer by @vladd-i in #1196
- Updated cost figure by @nqn in #1180
- Remove algorithm yamls by @hanlint in #1193
- Fix the Composer Launch Script for the Composer Dockerimage; Default `nproc = torch.cuda.device_count()` if not specified via env by @ravi-mosaicml in #1195
- Bert model card by @A-Jacobson in #1198
- Add Notes on Early Stopping by @anisehsani in #1182
- Stochastic depth that preserves weights by @Landanjs in #1085
- Adding Gated Linear Units as an algorithm by @moinnadeem in #1192
- A utility to fuse parallel linear layers in FX-traced models by @dskhudia in #1189
- Build+push Composer dockerimages to `mosaicml/composer_staging` by @ravi-mosaicml in #1197
- Fix the SFTP Object Store by @ravi-mosaicml in #1202
- Bert emoji by @A-Jacobson in #1205
- Adding a constant warmup scheduler by @linden-li in #1203
- Fix multi-GPU conflicts when downloading `torchvision` datasets by @abhi-mosaic in #1201
- Add caveats about automatic gradient accumulation by @hanlint in #1207
- Remove the `composer_train` entrypoint; put it back in `examples` by @ravi-mosaicml in #1211
- Fix Composer staging dockerimages by @ravi-mosaicml in #1210
- Set SFTP Object Store Private Key Filepath from an Environ by @ravi-mosaicml in #1212
- [xs] Fix progress bars in `get_file` by @ravi-mosaicml in #1216
- Cleanup SFTP url parsing for StreamingDataset by @abhi-mosaic in #1217
- Fix Symlinks on Non-Libcloud Object Stores by @ravi-mosaicml in #1209
- Fix the ObjectStoreLogger with Overwrite=True by @ravi-mosaicml in #1208
- Throughput metrics by @linden-li in #1215
- Fix module surgery for training resumptions with optimizers that save state by @dskhudia in #1200
- Update bert-base.yaml by @moinnadeem in #1219
- StreamingDataset: make remote optional, attempt to prettify docstrings. by @knighton in #1220
- Update vision-style `StreamingDataset`s to subclass `VisionDataset` by @ravi-mosaicml in #1223
- Improve docstrings. by @knighton in #1222
- shardwise zip streaming datasets by @milocress in #1177
- updated mosaic logos to composer logos in docs by @ejyuen in #1221
- Add `COMPOSER_KNOWN_HOSTS_FILENAME` for setting the sftp known hosts file environ by @ravi-mosaicml in #1224
- StreamingDataset: correctly handle exceptions in child download thread. by @knighton in #1228
- hot fix compression 404 by @milocress in #1229
- Treat any dropped SSH/SFTP connection as a transient error by @ravi-mosaicml in #1225
- refactor bert and gpt by @A-Jacobson in #1130
- Hotfix for S3 `FileNotFoundError` by @abhi-mosaic in #1233
- Fix StreamingDataset compression with multi-rank by @milocress in #1231
- Refactor vision models by @Landanjs in #1227
- Update resnet50_medium.yaml by @lupesko in #1235
- Increase default timeout for `StreamingC4` to 120s by @abhi-mosaic in #1234
- Add Debug Log Statements; Fix Pyright by @hanlint in #1218
- Hotfix deeplabv3 by @Landanjs in #1238
- Add Tensorboard Logger by @eracah in #1194
- Move the model and optimizers to the device before `Event.INIT` by @ravi-mosaicml in #1084
- Fix bug in streaming iteration/downloading, refactor by @knighton in #1239
- Support sequence of losses in backwards pass by @Landanjs in #1240
- Add device_id param to DeviceGPU by @ishanashastri in #1244
- Update CutMix to work with segmentation style labels by @coryMosaicML in #1230
- Catching ChannelErrors on SFTP Failures by @moinnadeem in #1245
- Make `StreamingDataset` compression file easier to write/read by @abhi-mosaic in #1246
- [XS] Updating console progress_bar logger to use max_duration units by @moinnadeem in #1243
- Catch botocore ClientError 403 by @abhi-mosaic in #1249
- Tensorboard Notebook + Tutorial by @eracah in #1250
- Fix repeated words in event.py by @isaac0804 in #1254
- Make progressive resizing quieter by @coryMosaicML in #1255
- fix typo in example by @xloem in #1259
- Create a new `boto3.Session()` per `S3ObjectStore` instance by @ravi-mosaicml in #1260
- Fix recipe yamls for `v0.8`, add testing by @hanlint in #1257
- Automatic Stochastic depth on residual blocks by @dskhudia in #1253
- Sequence length warmup update and tests by @alextrott16 in #1199
- ProgressBarLogger UX Enhancements by @ravi-mosaicml in #1264
- Update to latest pytorch by @mvpatel2000 in #1262
- Add packaging to `meta.yaml`; add `py-cpuinfo` max version by @ravi-mosaicml in #1271
- Fix Flaky Tests by @ravi-mosaicml in #1272
- Add callback for visualizing image inputs and outputs by @coryMosaicML in #1266
- Add `scale_warmup` argument to schedulers by @hanlint in #1268
- Switch Jenkins to r1z3 by @ravi-mosaicml in #1277
- BERT and C4 updates by @abhi-mosaic in #1252
- Default to `allow_tf32=True` for GPU Devices by @ravi-mosaicml in #1275
- Fix grad accum parsing in hparams by @hanlint in #1256
- Fix issue with doctest format in some docstring examples by @Landanjs in #1269
- Adds S3ObjectStore import to util init.py by @codestar12 in #1274
- Add tutorial on exporting for inference by @hanlint in #1276
- HTTPS downloads for streaming datasets by @ravi-mosaicml in #1258
- object stores for streaming datasets by @milocress in #1248
- Allow object name prefix for S3ObjectStore by @abhi-mosaic in #1278
- Hotfix CO-658 by @milocress in #1273
- Fix S3 remote paths for StreamingDataset download by @abhi-mosaic in #1280
- Add combo loss to DeepLabv3+ by @Landanjs in #1265
- Checkpoint backwards compatibility for ProgressBar by @hanlint in #1287
- Add missing callbacks by @hanlint in #1286
- Fix S3 prefix upload/download by @abhi-mosaic in #1288
- Fix device inference in module surgery by @hanlint in #1290
- Actual fix to backwards compatibility by @hanlint in #1289
- Bugs in getting_started.ipynb by @rahulvigneswaran in #1285
- Add pytorch 1.12.0 docker image by @linden-li in #1247
- Fix TB Logger + ObjectStore quadratic complexity issue by doing 1 file per flush by @eracah in #1283
- Enable README Doctests with GPUs by @mvpatel2000 in #1279
- Fix logging of hparams to object stores by @ravi-mosaicml in #1297
- [xs] Reformat the Composer Version String by @ravi-mosaicml in #1301
- Add monitored barrier for autograd accum by @mvpatel2000 in #1295
- [xs] Notebook Fixes by @ravi-mosaicml in #1299
- [xs] Store the Composer version in one place. by @ravi-mosaicml in #1302
- model export for inference. Functional API by @dskhudia in #1294
- Add a `return_outputs` flag to `predict()` by @ravi-mosaicml in #1307
- Integration Testing by @ravi-mosaicml in #1305
- Fix `get_file_artifact` in the WandBLogger to work on all ranks by @ravi-mosaicml in #1304
- Add documentation about `run_name` to Composer by @eracah in #1298
- Enforce FusedLayerNorm is ordered last by @alextrott16 in #1309
- Revert monitored barrier by @mvpatel2000 in #1311
- [xs] Build the Composer Docker Image only on `dev` branch merges by @ravi-mosaicml in #1308
- Fix Notebook Progress Bars by @ravi-mosaicml in #1313
- Remove `pytest-timeout` by @ravi-mosaicml in #1317
- [Minor] Inference API parameter name change by @dskhudia in #1315
- Matthew/swa readme by @growlix in #1292
- Enable gloo backend by @mvpatel2000 in #1321
- [xs] Fix pytest test filtering; Bump the minimum pytorch version to 1.10 by @ravi-mosaicml in #1320
- revert gloo by @mvpatel2000 in #1324
- Fix WandB load from checkpoint by @abhi-mosaic in #1326
- ALiBi for BERT and ALiBi testing by @alextrott16 in #1267
- Update HF example with read of model eval accuracy by @lupesko in #1332
- Cleanup API Reference Titles by @ravi-mosaicml in #1336
- Fix a race condition in the object store logger by @ravi-mosaicml in #1328
- Auto Grad Accum Change to Warning by @mvpatel2000 in #1338
- Add export for inference callback by @nik-mosaic in #1323
- Add save fine-tune model to HuggingFace example by @lupesko in #1333
- Update DWD optimizers by @abhi-mosaic in #1339
- Cap Numpy Version by @mvpatel2000 in #1345
- Update slack link by @hanlint in #1344
- Fix scheduler edge cases by @abhi-mosaic in #1350
- Integration Tests for Object Stores and Loggers by @ravi-mosaicml in #1322
- Retry SFTP on Size Mismatch by @ravi-mosaicml in #1300
- [xs] Restore the dataloader and training properties in `predict()` by @ravi-mosaicml in #1352
- Add Precision Contexts by @mvpatel2000 in #1347
- Update GLU logging strings by @moinnadeem in #1348
- Add domain-specific codeowners by @ravi-mosaicml in #1354
- fix marker by @mvpatel2000 in #1359
- Fix the profiler on multi-node training by @ravi-mosaicml in #1358
- Glue Entrypoint by @ishanashastri in #1263
- Yahp v0.1.3 by @mvpatel2000 in #1346
- Move metrics to context by @mvpatel2000 in #1361
- Refactor multiple losses to support dictionaries and fix discrepancies by @Landanjs in #1349
- Fix Coverage Reports on Jenkins by @ravi-mosaicml in #1114
- JSON Schemas by @mvpatel2000 in #1371
- add filename extension by @mvpatel2000 in #1370
- JSON Schemas pt 2 by @mvpatel2000 in #1373
- Update Export for Inference methods by @nik-mosaic in #1355
- Fix default precision by @A-Jacobson in #1369
- Clean up unused exception by @mvpatel2000 in #1368
- Revert "Clean up unused exception" by @ravi-mosaicml in #1378
- Remove Unused Exception by @mvpatel2000 in #1379
- Auto Grad Accum Cache Clearing by @mvpatel2000 in #1380
- Add ability to register algorithm passes by @hanlint in #1377
- Fix AMP resumption with grad scaler by @hanlint in #1376
- Update CUDA and remove NCCL downgrade from Dockerfile by @abhi-mosaic in #1362
- Add Notes on Artifact Logging by @ravi-mosaicml in #1381
- Print the microbatch size when using Adaptive Gradient Accumulation by @hanlint in #1387
- Cleaner API reference part 1: references with minimal import paths by @dblalock in #1385
- Add Event.BEFORE_DATALOADER by @mvpatel2000 in #1388
- remove private s3 paths by @A-Jacobson in #1389
- Tutorial on training without Local Storage by @ravi-mosaicml in #1351
- [inference] Update export_for_inference notebook with new APIs by @dskhudia in #1360
- Fix resnet warnings criteria by @mvpatel2000 in #1395
- Fix hparams error by @mvpatel2000 in #1394
- Add knighton to codeowners for datasets by @knighton in #1397
- Fix ImagenetDatasetHparams bug by @nik-mosaic in #1392
- Decouple GLUE entry point saving and loading logic by @ishanashastri in #1390
- Glue example notebook by @ishanashastri in #1383
- Add informative error for infer batch size issues by @hanlint in #1401
- Only sync batchnorm statistics within a node for deeplab by @Landanjs in #1391
- Update DeepLabv3 pretrained weight interface to work with PyTorch 1.12 by @Landanjs in #1399
- tpu single core by @florescl in #1400
- Add support for Apple M chips by @hanlint in #1405
- [xs] Add `mps` and `tpu` device to Trainer docstrings by @hanlint in #1410
Full Changelog: v0.8.2...v0.9.0
New Contributors
- @vladd-i made their first contribution in #1196
- @linden-li made their first contribution in #1203
- @ejyuen made their first contribution in #1221
- @lupesko made their first contribution in #1235
- @isaac0804 made their first contribution in #1254
- @xloem made their first contribution in #1259
- @alextrott16 made their first contribution in #1199
- @codestar12 made their first contribution in #1274
- @rahulvigneswaran made their first contribution in #1285
- @nik-mosaic made their first contribution in #1323