🚀 Composer v0.9.0
Excited to share the release of Composer v0.9.0, which comes with an Inference Export API, beta support for Apple Silicon and TPU training, as well as expanded usability of NLP-related speed-up methods. This release includes 175 commits from 34 contributors, including 10 new contributors 🙌 !
```bash
pip install --upgrade mosaicml==0.9.0
```

Alternatively, install Composer with Conda:

```bash
conda install -c mosaicml mosaicml=0.9.0
```
New Features
📦 Export for inference APIs
Train with Composer and deploy anywhere! We have added a dedicated export API as well as an export training callback so you can export Composer-trained models for inference, supporting popular formats such as TorchScript and ONNX.
For example, here's how to export a model in TorchScript format:
```python
from composer.utils import export_for_inference

# Invoking export with a trained model
export_for_inference(
    model=model,
    save_format='torchscript',
    save_path=model_save_path,
)
```
Here’s an example of using the training callback, which automatically exports the model at the end of training to ONNX format:
```python
from composer.callbacks import ExportForInferenceCallback

# Initializing Trainer with the export callback
callback = ExportForInferenceCallback(save_format='onnx', save_path=model_save_path)
trainer = Trainer(
    model=model,
    callbacks=callback,
    train_dataloader=dataloader,
    max_duration='10ep',
)

# Model will be exported at the end of training
trainer.fit()
```
Please see our Exporting for Inference notebook for more information.
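The exported TorchScript artifact is plain PyTorch, so a quick sanity check is to load it back and run a forward pass. A minimal sketch, assuming `model_save_path` from the example above and a `sample_input` batch shaped like your training data:

```python
import torch

# Load the exported TorchScript model from disk
scripted_model = torch.jit.load(model_save_path)
scripted_model.eval()

# Run a forward pass to confirm the export is usable for inference
with torch.no_grad():
    output = scripted_model(sample_input)
```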
📈 ALiBi support for BERT training
You can now use ALiBi (Attention with Linear Biases; Press et al., 2021) when training BERT models with Composer, delivering faster training and higher accuracy by leveraging shorter sequence lengths.
ALiBi improves the quality of BERT pre-training, especially when pre-training uses shorter sequence lengths than the downstream (fine-tuning) task. This allows models with ALiBi to reach higher downstream accuracy with less pre-training time.
Example of using ALiBi as an algorithm with the Composer Trainer:
```python
import composer.algorithms
import composer.models
import composer.trainer

# Create an instance of a BERT masked language model
model = composer.models.create_bert_mlm()

# Apply ALiBi (when training is initialized)
alibi = composer.algorithms.Alibi(max_sequence_length=1024)

# Train with ALiBi
trainer = composer.trainer.Trainer(
    model=model,
    train_dataloader=train_dataloader,
    algorithms=[alibi],
)
trainer.fit()
```
Example using the Composer Functional API:
```python
import composer.functional as cf
import composer.models

# Create an instance of a BERT masked language model
model = composer.models.create_bert_mlm()

# Apply ALiBi and expand the model's maximum sequence length to 1024
cf.apply_alibi(model=model, max_sequence_length=1024)
```
ALiBi can also now be extended to work with custom models by registering your attention and embedding layers. Please see our ALiBi method card for more information.
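As a rough illustration, registering a custom layer might look like the sketch below. Note this is hypothetical: the `policy_registry` import path, decorator form, and callback signature are assumptions inferred from the method card, and `MyAttention` stands in for your own layer class.

```python
import torch
from composer.algorithms.alibi.attention_surgery_functions import policy_registry

class MyAttention(torch.nn.Module):
    """Placeholder for a custom attention layer."""
    ...

# Hypothetical registration: tells ALiBi how to rewrite MyAttention modules
@policy_registry.register(MyAttention)
def apply_alibi_to_my_attention(module: torch.nn.Module, module_index: int,
                                max_sequence_length: int) -> torch.nn.Module:
    # Attach ALiBi's linear positional biases to the module here (details omitted)
    module.max_sequence_length = max_sequence_length
    return module
```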
🧐 Entry point for GLUE tasks pre-training and fine-tuning
You can now easily pre-train and fine-tune NLP models across all GLUE (General Language Understanding Evaluation) tasks through one simple entry point! The entry point handles model saving and loading, spawns GLUE tasks in parallel across all available GPUs, and delivers a highly efficient evaluation of model performance.
Example of launching the entrypoint:
```bash
# This runs pre-training followed by fine-tuning.
# --training_scheme can take either pretrain, finetune, or all depending on the task!
python run_glue_trainer.py -f glue_example.yaml --training_scheme all
```
Please see our GLUE entrypoint notebook for more information.
🤖 TPU support (in beta)
You can now use Composer to train your models on TPUs! Support is in beta and currently limited to single-core TPU training. Try it out, explore optimizations, and share your feedback and feature requests with us so we can make it better for you and for the community.
To use TPUs with Composer, simply specify a `tpu` device:

```python
# Set device to `tpu`
trainer = composer.trainer.Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration=train_epochs,
    device='tpu',
)

# Run fit
trainer.fit()
```
Please see our Training with TPUs notebook for more information.
🍎 Apple Silicon support (beta)
Leverage Apple Silicon chips to train your models with Composer by providing the `device='mps'` argument:

```python
trainer = Trainer(
    ...,
    device='mps',
)
```
We use the latest PyTorch MPS backend to execute the training. This requires torch version ≥1.12 and macOS 12.3+.
For more information on training with Apple M chips, see the PyTorch 1.12 blog and our API Reference for Composer specific details.
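PyTorch 1.12 ships an availability check for the MPS backend, so you can confirm your environment qualifies before requesting the device. A minimal sketch (the CPU fallback is illustrative):

```python
import torch

# 'mps' requires a torch >= 1.12 build with MPS support, on Apple Silicon with macOS 12.3+
if torch.backends.mps.is_available():
    device = 'mps'
else:
    device = 'cpu'  # fall back to CPU when MPS is unavailable
```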
🚧 Contrib repository
Got a new method idea, or published a paper and want those methods to be easily accessible? We've created the `mcontrib` repository, with a lightweight process for contributing new algorithms. We're happy to work directly with you to benchmark these methods and eventually "promote" them to Composer for use by end customers. Please check out the README for details on how to contribute a new algorithm. For more details on how to write speed-up methods, see our notebook on custom speed-up methods.
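For orientation, a Composer speed-up method is a class with `match` and `apply` hooks that the trainer's engine invokes at training events. A minimal sketch of a custom algorithm (the class and its loss-scaling behavior are illustrative, not taken from `mcontrib`):

```python
from composer.core import Algorithm, Event

class ScaleLoss(Algorithm):
    """Illustrative custom method: scales the training loss by a constant factor."""

    def __init__(self, factor: float = 0.5):
        self.factor = factor

    def match(self, event, state):
        # Run immediately after the loss is computed on each step
        return event == Event.AFTER_LOSS

    def apply(self, event, state, logger):
        state.loss = state.loss * self.factor
```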
Additional API Changes
🔢 Passes Module
The order in which algorithms are run matters significantly during composition. With this release, we refactored algorithm passes into their own `passes` module. Users can now register custom passes (for custom algorithms) with the Engine. Please see #1377 for more information.
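As a sketch of what a custom pass could look like (hypothetical: the `register_pass` method name and the `(algorithms, event)` pass signature are assumptions; see #1377 for the actual interface):

```python
# Hypothetical pass: given the algorithms scheduled for an event,
# return them in the order they should run (here, MyAlgorithm goes last)
def run_my_algorithm_last(algorithms, event):
    return sorted(algorithms, key=lambda alg: type(alg).__name__ == 'MyAlgorithm')

# Assumed registration hook on the trainer's engine
trainer.engine.register_pass(run_my_algorithm_last)
```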
🗄️ Default Checkpoint Extension
The CheckpointSaver now defaults to using the `*.pt` extension for checkpoint filenames. Please see #1370 for more information.
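In practice, this only changes the default filename template; you can still set it explicitly. A minimal sketch (the folder and template values are illustrative):

```python
from composer.callbacks import CheckpointSaver

# Explicit template shown for clarity; omitting `filename` now yields *.pt files by default
checkpoint_saver = CheckpointSaver(
    folder='checkpoints',
    filename='ep{epoch}-ba{batch}-rank{rank}.pt',
)
```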
👁️ Models Refactor
Most vision models (ResNet, MNIST, ViT, EfficientNet) have been refactored from classes to factory functions. For example, `ComposerResNet` -> `composer_resnet`.

```python
# before
from composer.models import ComposerResNet
model = ComposerResNet(..)

# after
from composer.models import composer_resnet
model = composer_resnet(..)
```
The same refactor has been done for NLP as well, e.g. `BERTModel` -> `create_bert_mlm` and `create_bert_classification`.
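Mirroring the vision example above (constructor arguments elided):

```python
# before
from composer.models import BERTModel
model = BERTModel(..)

# after
from composer.models import create_bert_mlm
model = create_bert_mlm()
```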
➕ Misc API Changes
- `BreakEpochException` has been removed.
- `state.is_model_deepspeed` has been moved to `composer.utils.is_model_deepspeed`.
- Helper function `monitored_barrier` has been added to composer distributed.
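For example, the moved helper is now imported from `composer.utils`. A sketch; placing `monitored_barrier` under `composer.utils.dist` is an assumption based on where Composer keeps its distributed helpers:

```python
from composer.utils import dist, is_model_deepspeed

# Moved: check whether a model has been wrapped by DeepSpeed
print(is_model_deepspeed(model))

# Assumed location of the new helper: a barrier that surfaces unresponsive ranks
dist.monitored_barrier()
```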
Bug Fixes
- Add informative error for infer batch size issues (#1401)
- Fix ImagenetDatasetHparams bug (#1392), resolves #1111
- Fix hparams error condition checking (#1394)
- Fix AMP resumption with grad scaler (#1376)
- Auto Grad Accum Cache Clearing (#1380), fixes issue reported in #1331
- Fix default precision (#1369)
- Fix the profiler on multi-node training (#1358), resolves #1270
- Retry SFTP on Size Mismatch (#1300)
- Fix scheduler edge cases (#1350), resolves #1077
- Fix a race condition in the object store logger (#1328)
- Fix WandB load from checkpoint (#1326)
- Fix Notebook Progress Bars (#1313)
What's Changed
- Fix DeepSpeed typo in docstring by @abhi-mosaic in #1188
- Move grad_accum logging to every step by @coryMosaicML in #1187
- Update STYLE_GUIDE with details on Documentation by @bandish-shah in #1183
- ProgressBar Units by @hanlint in #1190
- Added Xavier Normal initializer by @vladd-i in #1196
- Updated cost figure by @nqn in #1180
- Remove algorithm yamls by @hanlint in #1193
- Fix the Composer Launch Script for the Composer Dockerimage; Default `nproc = torch.cuda.device_count()` if not specified via env by @ravi-mosaicml in #1195
- Bert model card by @A-Jacobson in #1198
- Add Notes on Early Stopping by @anisehsani in #1182
- Stochastic depth that preserves weights by @Landanjs in #1085
- Adding Gated Linear Units as an algorithm by @moinnadeem in #1192
- A utility to fuse parallel linear layers in FX-traced models by @dskhudia in #1189
- Build+push Composer dockerimages to `mosaicml/composer_staging` by @ravi-mosaicml in #1197
- Fix the SFTP Object Store by @ravi-mosaicml in #1202
- Bert emoji by @A-Jacobson in #1205
- Adding a constant warmup scheduler by @linden-li in #1203
- Fix multi-GPU conflicts when downloading `torchvision` datasets by @abhi-mosaic in #1201
- Add caveats about automatic gradient accumulation by @hanlint in #1207
- Remove the `composer_train` entrypoint; put it back in `examples` by @ravi-mosaicml in #1211
- Fix Composer staging dockerimages by @ravi-mosaicml in #1210
- Set SFTP Object Store Private Key Filepath from an Environ by @ravi-mosaicml in #1212
- [xs] Fix progress bars in `get_file` by @ravi-mosaicml in #1216
- Cleanup SFTP url parsing for StreamingDataset by @abhi-mosaic in #1217
- Fix Symlinks on Non-Libcloud Object Stores by @ravi-mosaicml in #1209
- Fix the ObjectStoreLogger with Overwrite=True by @ravi-mosaicml in #1208
- Throughput metrics by @linden-li in #1215
- Fix module surgery for training resumptions with optimizers that save state by @dskhudia in #1200
- Update bert-base.yaml by @moinnadeem in #1219
- StreamingDataset: make remote optional, attempt to prettify docstrings. by @knighton in #1220
- Update vision-style `StreamingDataset`s to subclass `VisionDataset` by @ravi-mosaicml in #1223
- Improve docstrings. by @knighton in #1222
- shardwise zip streaming datasets by @milocress in #1177
- updated mosaic logos to composer logos in docs by @ejyuen in #1221
- Add `COMPOSER_KNOWN_HOSTS_FILENAME` for setting the sftp known hosts file environ by @ravi-mosaicml in #1224
- StreamingDataset: correctly handle exceptions in child download thread. by @knighton in #1228
- hot fix compression 404 by @milocress in #1229
- Treat any dropped SSH/SFTP connection as a transient error by @ravi-mosaicml in #1225
- refactor bert and gpt by @A-Jacobson in #1130
- Hotfix for S3 `FileNotFoundError` by @abhi-mosaic in #1233
- Fix StreamingDataset compression with multi-rank by @milocress in #1231
- Refactor vision models by @Landanjs in #1227
- Update resnet50_medium.yaml by @lupesko in #1235
- Increase default timeout for `StreamingC4` to 120s by @abhi-mosaic in #1234
- Add Debug Log Statements; Fix Pyright by @hanlint in #1218
- Hotfix deeplabv3 by @Landanjs in #1238
- Add Tensorboard Logger by @eracah in #1194
- Move the model and optimizers to the device before `Event.INIT` by @ravi-mosaicml in #1084
- Fix bug in streaming iteration/downloading, refactor by @knighton in #1239
- Support sequence of losses in backwards pass by @Landanjs in #1240
- Add device_id param to DeviceGPU by @ishanashastri in #1244
- Update CutMix to work with segmentation style labels by @coryMosaicML in #1230
- Catching ChannelErrors on SFTP Failures by @moinnadeem in #1245
- Make `StreamingDataset` compression file easier to write/read by @abhi-mosaic in #1246
- [XS] Updating console progress_bar logger to use max_duration units by @moinnadeem in #1243
- Catch botocore ClientError 403 by @abhi-mosaic in #1249
- Tensorboard Notebook + Tutorial by @eracah in #1250
- Fix repeated words in event.py by @isaac0804 in #1254
- Make progressive resizing quieter by @coryMosaicML in #1255
- fix typo in example by @xloem in #1259
- Create a new `boto3.Session()` per `S3ObjectStore` instance by @ravi-mosaicml in #1260
- Fix recipe yamls for `v0.8`, add testing by @hanlint in #1257
- Automatic Stochastic depth on residual blocks by @dskhudia in #1253
- Sequence length warmup update and tests by @alextrott16 in #1199
- ProgressBarLogger UX Enhancements by @ravi-mosaicml in #1264
- Update to latest pytorch by @mvpatel2000 in #1262
- Add packaging to `meta.yaml`; add `py-cpuinfo` max version by @ravi-mosaicml in #1271
- Fix Flaky Tests by @ravi-mosaicml in #1272
- Add callback for visualizing image inputs and outputs by @coryMosaicML in #1266
- Add `scale_warmup` argument to schedulers by @hanlint in #1268
- Switch Jenkins to r1z3 by @ravi-mosaicml in #1277
- BERT and C4 updates by @abhi-mosaic in #1252
- Default to `allow_tf32=True` for GPU Devices by @ravi-mosaicml in #1275
- Fix grad accum parsing in hparams by @hanlint in #1256
- Fix issue with doctest format in some docstring examples by @Landanjs in #1269
- Adds S3ObjectStore import to util init.py by @codestar12 in #1274
- Add tutorial on exporting for inference by @hanlint in #1276
- HTTPS downloads for streaming datasets by @ravi-mosaicml in #1258
- object stores for streaming datasets by @milocress in #1248
- Allow object name prefix for S3ObjectStore by @abhi-mosaic in #1278
- Hotfix CO-658 by @milocress in #1273
- Fix S3 remote paths for StreamingDataset download by @abhi-mosaic in #1280
- Add combo loss to DeepLabv3+ by @Landanjs in #1265
- Checkpoint backwards compatibility for ProgressBar by @hanlint in #1287
- Add missing callbacks by @hanlint in #1286
- Fix S3 prefix upload/download by @abhi-mosaic in #1288
- Fix device inference in module surgery by @hanlint in #1290
- Actual fix to backwards compatibility by @hanlint in #1289
- Bugs in getting_started.ipynb by @rahulvigneswaran in #1285
- Add pytorch 1.12.0 docker image by @linden-li in #1247
- Fix TB Logger + ObjectStore quadratic complexity issue by doing 1 file per flush by @eracah in #1283
- Enable README Doctests with GPUs by @mvpatel2000 in #1279
- Fix logging of hparams to object stores by @ravi-mosaicml in #1297
- [xs] Reformat the Composer Version String by @ravi-mosaicml in #1301
- Add monitored barrier for autograd accum by @mvpatel2000 in #1295
- [xs] Notebook Fixes by @ravi-mosaicml in #1299
- [xs] Store the Composer version in one place. by @ravi-mosaicml in #1302
- model export for inference. Functional API by @dskhudia in #1294
- Add a `return_outputs` flag to `predict()` by @ravi-mosaicml in #1307
- Integration Testing by @ravi-mosaicml in #1305
- Fix `get_file_artifact` in the WandBLogger to work on all ranks by @ravi-mosaicml in #1304
- Add documentation about `run_name` to Composer by @eracah in #1298
- Enforce FusedLayerNorm is ordered last by @alextrott16 in #1309
- Revert monitored barrier by @mvpatel2000 in #1311
- [xs] Build the Composer Docker Image only on `dev` branch merges by @ravi-mosaicml in #1308
- Fix Notebook Progress Bars by @ravi-mosaicml in #1313
- Remove `pytest-timeout` by @ravi-mosaicml in #1317
- [Minor] Inference API parameter name change by @dskhudia in #1315
- Matthew/swa readme by @growlix in #1292
- Enable gloo backend by @mvpatel2000 in #1321
- [xs] Fix pytest test filtering; Bump the minimum pytorch version to 1.10 by @ravi-mosaicml in #1320
- revert gloo by @mvpatel2000 in #1324
- Fix WandB load from checkpoint by @abhi-mosaic in #1326
- ALiBi for BERT and ALiBi testing by @alextrott16 in #1267
- Update HF example with read of model eval accuracy by @lupesko in #1332
- Cleanup API Reference Titles by @ravi-mosaicml in #1336
- Fix a race condition in the object store logger by @ravi-mosaicml in #1328
- Auto Grad Accum Change to Warning by @mvpatel2000 in #1338
- Add export for inference callback by @nik-mosaic in #1323
- Add save fine-tune model to HuggingFace example by @lupesko in #1333
- Update DWD optimizers by @abhi-mosaic in #1339
- Cap Numpy Version by @mvpatel2000 in #1345
- Update slack link by @hanlint in #1344
- Fix scheduler edge cases by @abhi-mosaic in #1350
- Integration Tests for Object Stores and Loggers by @ravi-mosaicml in #1322
- Retry SFTP on Size Mismatch by @ravi-mosaicml in #1300
- [xs] Restore the dataloader and training properties in `predict()` by @ravi-mosaicml in #1352
- Add Precision Contexts by @mvpatel2000 in #1347
- Update GLU logging strings by @moinnadeem in #1348
- Add domain-specific codeowners by @ravi-mosaicml in #1354
- fix marker by @mvpatel2000 in #1359
- Fix the profiler on multi-node training by @ravi-mosaicml in #1358
- Glue Entrypoint by @ishanashastri in #1263
- Yahp v0.1.3 by @mvpatel2000 in #1346
- Move metrics to context by @mvpatel2000 in #1361
- Refactor multiple losses to support dictionaries and fix discrepancies by @Landanjs in #1349
- Fix Coverage Reports on Jenkins by @ravi-mosaicml in #1114
- JSON Schemas by @mvpatel2000 in #1371
- add filename extension by @mvpatel2000 in #1370
- JSON Schemas pt 2 by @mvpatel2000 in #1373
- Update Export for Inference methods by @nik-mosaic in #1355
- Fix default precision by @A-Jacobson in #1369
- Clean up unused exception by @mvpatel2000 in #1368
- Revert "Clean up unused exception" by @ravi-mosaicml in #1378
- Remove Unused Exception by @mvpatel2000 in #1379
- Auto Grad Accum Cache Clearing by @mvpatel2000 in #1380
- Add ability to register algorithm passes by @hanlint in #1377
- Fix AMP resumption with grad scaler by @hanlint in #1376
- Update CUDA and remove NCCL downgrade from Dockerfile by @abhi-mosaic in #1362
- Add Notes on Artifact Logging by @ravi-mosaicml in #1381
- Print the microbatch size when using Adaptive Gradient Accumulation by @hanlint in #1387
- Cleaner API reference part 1: references with minimal import paths by @dblalock in #1385
- Add Event.BEFORE_DATALOADER by @mvpatel2000 in #1388
- remove private s3 paths by @A-Jacobson in #1389
- Tutorial on training without Local Storage by @ravi-mosaicml in #1351
- [inference] Update export_for_inference notebook with new APIs by @dskhudia in #1360
- Fix resnet warnings criteria by @mvpatel2000 in #1395
- Fix hparams error by @mvpatel2000 in #1394
- Add knighton to codeowners for datasets by @knighton in #1397
- Fix ImagenetDatasetHparams bug by @nik-mosaic in #1392
- Decouple GLUE entry point saving and loading logic by @ishanashastri in #1390
- Glue example notebook by @ishanashastri in #1383
- Add informative error for infer batch size issues by @hanlint in #1401
- Only sync batchnorm statistics within a node for deeplab by @Landanjs in #1391
- Update DeepLabv3 pretrained weight interface to work with PyTorch 1.12 by @Landanjs in #1399
- tpu single core by @florescl in #1400
- Add support for Apple M chips by @hanlint in #1405
- [xs] Add `mps` and `tpu` device to Trainer docstrings by @hanlint in #1410
Full Changelog: v0.8.2...v0.9.0
New Contributors
- @vladd-i made their first contribution in #1196
- @linden-li made their first contribution in #1203
- @ejyuen made their first contribution in #1221
- @lupesko made their first contribution in #1235
- @isaac0804 made their first contribution in #1254
- @xloem made their first contribution in #1259
- @alextrott16 made their first contribution in #1199
- @codestar12 made their first contribution in #1274
- @rahulvigneswaran made their first contribution in #1285
- @nik-mosaic made their first contribution in #1323