Releases: mosaicml/composer

v0.8.2

27 Jul 23:36

🚀 Composer v0.8.2

Composer v0.8.2 is released! Install via pip:

pip install --upgrade mosaicml==0.8.2

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.8.2

🐛 Bug Fixes

  1. Fixed Notebook Progress Bars in Colab

    Fixes a bug introduced by #1264 that caused Composer runs in Colab notebooks to error out with:
    UnsupportedOperation: fileno.

    Closes #1312. Fixed in PR #1314.

Changelog

v0.8.1...v0.8.2

v0.8.1

22 Jul 23:23

🚀 Composer v0.8.1

Composer v0.8.1 is released! Install via pip:

pip install --upgrade mosaicml==0.8.1

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.8.1

🎁 New Features

  1. 🖼️ Image Visualizer

    The ImageVisualizer callback periodically logs the training and validation images when using the WandB logger. This is great for validating your dataloader pipeline, especially if extensive data augmentations are used. Also, when training on a semantic segmentation task, the callback can log the target segmentation mask and the predicted segmentation mask by setting the argument mode='segmentation'. See PR #1266 for more details. Here is an example of using the ImageVisualizer callback:

    from composer import Trainer
    from composer.callbacks import ImageVisualizer
    
    # Callback to log 8 training images after every 100 batches
    image_visualizer = ImageVisualizer()
    
    # Construct trainer
    trainer = Trainer(
        ...,
        callbacks=image_visualizer
    )
    
    # Train!
    trainer.fit()

    Here is an example visualization from the training set of ADE20k:

  2. 📶 TensorBoard Logging

    You can now log metrics and losses from your Composer training runs with Tensorboard! See #1250 and #1283 for more details. All you have to do is create a TensorboardLogger object and add it
    to the list of loggers in your Trainer object like so:

    from composer import Trainer
    from composer.loggers import TensorboardLogger
    
    tb_logger = TensorboardLogger(log_dir="./my_tensorboard_logs")
    
    trainer = Trainer(
        ...
        # Add your Tensorboard Logger to the trainer here.
        loggers=[tb_logger],
    )
    
    trainer.fit()

    For more information, see this tutorial.

  3. 🔙 Multiple Losses

    Adds support for multiple losses. If a model returns a tuple of losses, they are summed before the loss.backward() call. See #1240 for more details.
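
    For example, here is a minimal sketch of a ComposerModel whose loss() method returns a tuple of two losses (the toy model and the auxiliary penalty below are illustrative, not part of Composer):

    import torch
    import torch.nn.functional as F
    
    from composer.models import ComposerModel
    
    class MultiLossModel(ComposerModel):
        def __init__(self):
            super().__init__()
            self.net = torch.nn.Linear(16, 4)
    
        def forward(self, batch):
            inputs, _ = batch
            return self.net(inputs)
    
        def loss(self, outputs, batch):
            _, targets = batch
            ce_loss = F.cross_entropy(outputs, targets)
            # A hypothetical auxiliary penalty; Composer sums the tuple
            # before calling .backward()
            l2_penalty = 1e-4 * outputs.pow(2).mean()
            return ce_loss, l2_penalty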

  4. 🌎️ Stream Datasets from HTTP URIs

    You can now specify an HTTP URI for a Streaming Dataset remote. See #1258 for more details. For example:

    from composer.datasets.streaming import StreamingDataset
    from torch.utils.data import DataLoader
    
    # Construct the Dataset
    dataset = StreamingDataset(
        ...,
        remote="https://example.com/dataset/",
    )
    
    # Construct the DataLoader
    train_dl = DataLoader(dataset)
    
    # Construct the Trainer
    trainer = Trainer(
        ...,
        train_dataloader=train_dl,
    )
    
    # Train!
    trainer.fit()

    For more information on streaming datasets, see this tutorial.

  5. 🏄️ GPU Devices default to TF32 Matmuls

    Beginning with PyTorch 1.12, the default behavior for computing FP32 matrix multiplies on NVIDIA Ampere devices was switched from TF32 to FP32. See the PyTorch documentation for details.

    Since Composer is designed specifically for ML training with a focus on efficiency, we chose to preserve the old default of using TF32 on Ampere devices. This leads to significantly higher throughput when training in single precision, without impacting training convergence. See PR #1275 for implementation details.
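
    If you prefer the stricter FP32 behavior, you can override the matmul precision yourself with PyTorch's backend flags (a minimal sketch using standard PyTorch APIs, not a Composer-specific switch):

    import torch
    
    # Revert to full-FP32 matmuls (the PyTorch >= 1.12 default) after
    # constructing the Trainer, which enables TF32 by default on Ampere
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False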

  6. 👋 Set the Device ID for GPU Devices

    Specify which GPU to train on by passing its device ID to DeviceGPU when constructing the Trainer, instead of relying on the local rank. For example,

    from composer.trainer.devices.device_gpu import DeviceGPU
    
    # Specify to use GPU 3 to train 
    device = DeviceGPU(device_id=3)
    
    # Construct the Trainer
    trainer = Trainer(
        ...,
        device = device
    )
    
    # Train!
    trainer.fit()
  7. BERT and C4 Updates

    We make some minor adjustments to our bert-base-uncased.yaml training config. In particular, we make the global train and eval batch sizes a power of 2. This maintains divisibility when using many GPUs in multi-node training. We also adjust the max_duration so that it converts cleanly to 70,000 batches.

    We also upgrade our StreamingDataset C4 conversion script (scripts/mds/c4.py) to use a multi-threaded reader. On a 64-core machine we are able to convert the 770GB train split to .mds format in ~1.5hr.

  8. 📂 Set a prefix when using a S3ObjectStore

    When using S3ObjectStore for applications like checkpointing, it can be useful to provide path prefixes, mimicking folder/subfolder directories like on a local filesystem. When a prefix is provided, any objects uploaded with S3ObjectStore will be stored at f's3://{self.bucket}/{self.prefix}{object_name}'.
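
    For example, here is a minimal sketch (the bucket and prefix names are placeholders):

    from composer.utils.object_store import S3ObjectStore
    
    # Objects will be stored under s3://my-bucket/checkpoints/run-1/...
    object_store = S3ObjectStore(
        bucket='my-bucket',
        prefix='checkpoints/run-1/',
    )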

  9. ⚖️ Scale the Warmup Period of Composer Schedulers

    Added a new flag scale_warmup to schedulers that will scale the warmup period when a scale schedule ratio is applied. The default is False to preserve the existing behavior. See #1268 for more details.
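
    As a minimal sketch (assuming the cosine-with-warmup scheduler and the Trainer's scale_schedule_ratio argument; the values shown are illustrative):

    from composer import Trainer
    from composer.optim import CosineAnnealingWithWarmupScheduler
    
    # With scale_warmup=True, the 1-epoch warmup below is also scaled
    # by the scale schedule ratio
    scheduler = CosineAnnealingWithWarmupScheduler(
        t_warmup='1ep',
        scale_warmup=True,
    )
    
    trainer = Trainer(
        ...,
        schedulers=scheduler,
        scale_schedule_ratio=0.5,
    )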

  10. 🧊 Stochastic Depth on Residual Blocks

    Residual blocks are detected automatically and replaced with stochastic versions. See #1253 for more details.
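
    For example, here is a minimal sketch (the target layer name and drop rate are illustrative; see the Stochastic Depth method card for the exact arguments):

    from composer import Trainer
    from composer.algorithms import StochasticDepth
    
    # Replace matching residual blocks with stochastic versions that
    # randomly skip the block's transformation during training
    stochastic_depth = StochasticDepth(
        target_layer_name='ResNetBottleneck',
        drop_rate=0.2,
    )
    
    trainer = Trainer(
        ...,
        algorithms=stochastic_depth,
    )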

🐛 Bug Fixes

  1. Fixed Progress Bars

    Fixed a bug where the progress bars jumped around and did not stream properly when tailing the terminal over the network. Fixed in #1264, #1287, and #1289.

  2. Fixed S3ObjectStore in Multithreaded Environments

    Fixed a bug where boto3 crashed when creating the default session in multiple threads simultaneously (see boto/boto3#1592). Fixed in #1260.

  3. Retry on ChannelException errors in the SFTPObjectStore

    Transient SFTP ChannelException errors are now caught and retried. Fixed in #1245.

  4. Treating S3 Permission Denied Errors as Not Found Errors

    We update our handling of botocore 403 ClientErrors to interpret them as FileNotFoundErrors. We do this because of a situation that occurs when a user has no S3 credentials configured, and tries to read from a bucket with public files. For privacy, Amazon S3 raises 403 (Permission Denied) instead of 404 (Not Found) errors. As such, PR #1249 treats 403 ClientErrors as FileNotFoundErrors.

  5. Fixed Parsing of grad_accum in the TrainerHparams

    Fixes an error where the command line override --grad_accum led to incorrect parsing. Fixed in #1256.

  6. Fixed Example YAML Files

    Our recipe configurations (YAML) are updated to the latest version, and a test was added to enforce correctness moving forward. Fixed in #1235 and #1257.

Changelog

v0.8.0...v0.8.1

v0.8.0

01 Jul 04:15

🚀 Composer v0.8.0

Composer v0.8.0 is released! Install via pip:

pip install --upgrade mosaicml==0.8.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.8.0

New Features

  1. 🤗 HuggingFace ComposerModel

    Train your HuggingFace models with Composer! We introduced a HuggingFaceModel that converts your existing 🤗 Transformers models into a ComposerModel.

    For example:

    import transformers
    from composer import Trainer
    from composer.models import HuggingFaceModel
    
    # Define the model
    hf_model = transformers.AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
    
    # Convert it into a ComposerModel
    model = HuggingFaceModel(hf_model)
    
    # Construct the trainer
    trainer = Trainer(
        ...,
        model=model,
    )
    
    # Train!
    trainer.fit()

    For more information, see the example on fine-tuning a pretrained BERT with Composer.

  2. 🫕 Fused Layer Norm

    Fused LayerNorm replaces implementations of torch.nn.LayerNorm with apex.normalization.fused_layer_norm. The fused kernel provides increased GPU utilization.

    For example:

    from composer.trainer import Trainer
    from composer.algorithms import FusedLayerNorm
    
    # Initialize the algorithm
    alg = FusedLayerNorm()
    
    # Construct the trainer
    trainer = Trainer(
        ...,
        algorithms=alg,
    )
    
    # Train!
    trainer.fit()

    See the method card for more information.

  3. 💾 Ignore Checkpoint Parameters

    If you have a checkpoint and don't want to restore some elements of the checkpoint to the state, we added a load_ignore_keys parameter. Any specified (nested) keys will be ignored. Glob syntax is supported!

    For example, to restore a checkpoint without the seed:

    from composer import Trainer
    
    trainer = Trainer(
        ...,
        load_path="path/to/my/checkpoint.pt",
        load_ignore_keys=["state/rank_zero_seed", "rng"],
    )

    See the Trainer API Reference for more information.

  4. 🪣 Object Stores

    Composer v0.8.0 introduces an abstract Object Store API to support multiple object store drivers, such as boto3 (for Amazon S3) and Paramiko (for SFTP), in addition to the existing libcloud implementation.

    For example, if you are training on AWS where credentials are available in the environment, here's how to save checkpoints to an S3 object store via Boto3.

    from composer import Trainer
    from composer.loggers import ObjectStoreLogger
    from composer.utils.object_store import S3ObjectStore
    
    logger = ObjectStoreLogger(
        object_store_cls=S3ObjectStore,
        object_store_kwargs={
            # These arguments will be passed into the S3ObjectStore -- e.g.:
            # object_store = S3ObjectStore(**object_store_kwargs)
            # Refer to the S3ObjectStore class for documentation
            'bucket': 'my-bucket',
        },
    )
    
    trainer = Trainer(
        ...,
        loggers=logger,
    )
    
    # Train!
    trainer.fit()

    See the Object Store API Reference for more information.

  5. 🪨 Artifact Metadata

    Composer automatically logs the epoch, batch, sample, and token counts as metadata when storing artifacts in Weights & Biases. See the API Reference for more information.

API Changes

  1. ✂️ Gradient Clipping is now an Algorithm

    To clean up the Trainer, we moved gradient clipping into an Algorithm. The grad_clip_norm argument in the Trainer is deprecated and will be removed in a future version of Composer. Instead, use the Gradient Clipping algorithm:

    For example:

    from composer.algorithms import GradientClipping
    from composer.trainer import Trainer
    
    # Configure gradient clipping
    gradient_clipping = GradientClipping()
    
    # Configure the trainer
    trainer = Trainer(
        ...,
        algorithms=gradient_clipping,
    )
    
    # Train!
    trainer.fit()

    See the method card for more information.

  2. 🕒️ Removed batch_num_samples and batch_num_tokens from the state.

    State properties batch_num_samples and batch_num_tokens have been removed.
    Instead, use State.timestamp for token and sample tracking.
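
    For example, a minimal sketch of reading the counts from a callback:

    from composer import Callback, Trainer
    
    class CountPrinter(Callback):
        def batch_end(self, state, logger):
            # Sample and token counts now live on the timestamp
            print(f"Samples seen: {state.timestamp.sample}")
            print(f"Tokens seen: {state.timestamp.token}")
    
    trainer = Trainer(
        ...,
        callbacks=CountPrinter(),
    )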

  3. 🧑‍🤝‍🧑 DDP Sync Strategy

    We changed the default DDP Sync Strategy to MULTI_AUTO_SYNC, as FORCED_SYNC doesn't work with all algorithms.
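
    To restore the previous behavior explicitly, pass the strategy to the Trainer (a minimal sketch; the lowercase string form is an assumption, and the DDPSyncStrategy enum can be passed instead):

    from composer import Trainer
    
    trainer = Trainer(
        ...,
        ddp_sync_strategy='forced_sync',
    )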

  4. 🏃 Moved the run_name into the State

    The run_name has been added to the State object, so it is persisted with checkpoints. It has been removed from the Logger.
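
    For example, a minimal sketch of reading the run name from the state in a callback:

    from composer import Callback, Trainer
    
    class PrintRunName(Callback):
        def fit_start(self, state, logger):
            # The run name is persisted on the state (and in checkpoints)
            print(f"Run name: {state.run_name}")
    
    trainer = Trainer(
        ...,
        run_name='my-training-run',
        callbacks=PrintRunName(),
    )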

Bug Fixes

  • In the Object Store Logger, added retries for credential validation, and credentials are now validated only on global rank zero. (#1144)
  • Fixed a bug in the speed monitor where it returned negative wall clock times. (#1123)
  • Fixed a bug where block-wise Stochastic Depth could freeze the trainer. (#1087)
  • Fixed a bug in the MLPerfCallback where sample counts were incorrect on sharded datasets. (#1156)

Changelog

v0.7.1...v0.8.0

v0.7.1

07 Jun 00:21

🚀 Composer v0.7.1

Composer v0.7.1 is released! Install via pip:

pip install --upgrade mosaicml==0.7.1

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.7.1

Bug Fixes

  • Upgraded to wandb>=0.12.17 to fix an incompatibility with protobuf>=4 (wandb/wandb#3709)

Changelog

v0.7.0...v0.7.1

v0.7.0

24 May 00:56

🚀 Composer v0.7.0

Composer v0.7.0 is released! Install via pip:

pip install --upgrade mosaicml==0.7.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.7.0

New Features

  1. 🏎️ FFCV Integration

    Composer supports FFCV, a fast dataloader for image datasets. We've found FFCV can speed up ResNet-56 training by 16%, in addition to existing speed-ups already supported by Composer! It's easy to use FFCV with any existing image dataset:

    import ffcv
    from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
    from torchvision.datasets import ImageFolder
    
    from composer import Trainer
    from composer.datasets.ffcv_utils import write_ffcv_dataset, ffcv_monkey_patches
    
    # Convert the dataset to FFCV format
    # This step needs to be done only once per dataset
    dataset = ImageFolder(...)
    ffcv_dataset_path = "my_ffcv_dataset.ffcv"
    write_ffcv_dataset(dataset=dataset, write_path=ffcv_dataset_path)
    
    # In FFCV v0.0.3, len(dataloader) is expensive. Fix that via a monkeypatch
    ffcv_monkey_patches()
    
    # Construct the train dataloader
    train_dl = ffcv.Loader(
        ffcv_dataset_path,
        ...
    )
    
    # Construct the trainer
    trainer = Trainer(
        ...,
        train_dataloader=train_dl,
    )
    
    # Train using FFCV!
    trainer.fit()

    See our notebook on training with FFCV for a full example.

  2. ✅ Autoresume from Checkpoints

    When setting autoresume=True, Composer can automatically resume from an existing checkpoint before starting a new training run. Specifically, the trainer will look in the save_folder (and any loggers that save artifacts) for the latest checkpoint; if none is found, then it'll start from the beginning.

    This feature does not require a different entrypoint to distinguish between starting a new training run and automatically resuming from an existing one, making it easy to use Composer on preemptible spot cloud instances. Simply set autoresume=True, point the instance to your training script, and Composer will handle the rest!

    from composer import Trainer
    
    # When using `autoresume`, it is required to specify the
    # `run_name`, so Composer will know which training run to
    # resume
    run_name = "my_autoresume_training_run"
    
    trainer = Trainer(
        ...,
        run_name=run_name,
        # specify where to save checkpoints
        save_folder="./my_autoresume_training_run",
        autoresume=True,
    )
    
    # Train! Composer will handle loading an existing
    # checkpoint or starting a new training run
    trainer.fit()

    See the Trainer API Reference for more information.

  3. ♻️ Reuse the Trainer

    Want to train on multiple dataloaders sequentially? Each trainer object now supports multiple calls to Trainer.fit(), so you can continue training an existing model on a new dataloader, with new schedulers, all while using the same model and trainer object.

    For example:

    from torch.utils.data import DataLoader
    
    from composer import Trainer
    
    train_dl_1 = DataLoader(...)
    trainer = Trainer(
        model=model,
        max_duration='5ep',
        train_dataloader=train_dl_1,
    )
    
    # Train once!
    trainer.fit()
    
    # Train again with a new dataloader for another 5 epochs
    train_dl_2 = DataLoader(...)
    trainer.fit(
        train_dataloader=train_dl_2,
        duration='5ep',
    )

    See the Trainer API Reference for more information.

  4. ⚖️ Eval or Predict Only? No Problem

    You can evaluate or predict on an existing model, without having to supply a train dataloader or training duration argument -- they're now optional.

    import torchmetrics
    from torch.utils.data import DataLoader
    
    from composer import Trainer
    
    # Construct the trainer
    trainer = Trainer(model=model)
    
    # Evaluate!
    eval_dl = DataLoader(...)
    trainer.eval(
        dataloader=eval_dl,
        metrics=torchmetrics.Accuracy(),
    )
    
    # Examine evaluation metrics
    print("Eval metrics", trainer.state.metrics['eval'])
    
    # Or, predict!
    predict_dl = DataLoader(...)
    trainer.predict(dataloader=predict_dl)

    See the Trainer API Reference for more information.

  5. 🛑 Early Stopper and Threshold Stopper Callbacks

    The Early Stopper and Threshold Stopper callbacks end training early when the target metrics are met:

    from composer.callbacks.early_stopper import EarlyStopper
    from torchmetrics.classification.accuracy import Accuracy
    
    # Construct the callback
    early_stopper = EarlyStopper(
        monitor="Accuracy",
        dataloader_label="eval",
        patience=2,
    )
    
    # Construct the trainer
    trainer = Trainer(
        ...,
        callbacks=early_stopper,
        max_duration="100ep",
    )
    
    # Train!
    # Training will end early if the accuracy does not improve
    # over two epochs
    trainer.fit()
  6. 🪵 Load Checkpoints from Loggers

    It's now possible to restore checkpoints from loggers that support file artifacts (such as the Weights & Biases Logger). No need to download your checkpoints manually anymore.

    from composer import Trainer
    from composer.loggers import WandBLogger
    
    # Configure the W&B Logger
    wandb_logger = WandBLogger(
        # set to True to capture artifacts, like checkpoints
        log_artifacts=True,
        init_params={
            'project': 'my-wandb-project-name',
        },
    )
    
    # Then, to train and save checkpoints to W&B:
    trainer = Trainer(
        ...,
        loggers=wandb_logger,
        save_folder="/tmp/checkpoints",
        save_interval="1ep",
        save_artifact_name="epoch{epoch}.pt",
    )
    
    # Finally, to load checkpoints from W&B
    trainer = Trainer(
        ...,
        load_object_store=wandb_logger,
        load_path="epoch1.pt:latest",
    )
  7. ⌛ Wall Clock, Evaluation, and Prediction Time Tracking

    The timestamp object measures wall clock time via three new fields: total_wct, epoch_wct, and batch_wct. These fields track the total elapsed training time, the elapsed training time of the current epoch, and the time to train the last batch. Read the wall clock time via a callback:

    from composer import Callback, Trainer
    
    class MyCallback(Callback):
        def batch_end(self, state, logger):
            print(f"Total wct: {state.timestamp.total_wct}")
            print(f"Epoch wct: {state.timestamp.epoch_wct}")
            print(f"Batch wct: {state.timestamp.batch_wct}")
    
    # Construct the trainer with this callback
    trainer = Trainer(
        ...,
        callbacks=MyCallback(),
    )
    
    # Train!
    trainer.fit()

    In addition, the training state object has two new fields for tracking time during evaluation and prediction: eval_timestamp and predict_timestamp. These fields, just like any others on the state object, are accessible to algorithms, callbacks, and loggers.

  8. Training DeepLabv3+ on the ADE20k Dataset

    DeepLabv3+ is a common baseline model for semantic segmentation tasks. We provide a ComposerModel implementation for DeepLabv3+ built using torchvision and mmsegmentation for the backbone and head, respectively.

    We found the DeepLabv3+ baseline can be significantly improved using the new PyTorch pre-trained weights. Additional gains are made through a hyperparameter sweep.

    We benchmark our DeepLabv3+ model on a single 8xA100 machine using ADE20k, a popular semantic segmentation dataset. The final results on ADE20k are:

    Model                  | mIoU        | Time-to-Train
    Unoptimized DeepLabv3+ | 44.17 +/-...

v0.6.1

06 May 02:25

🚀 Composer v0.6.1

Composer v0.6.1 is released!

Go ahead and upgrade; it's fully backwards compatible with Composer v0.6.0.

Install via pip:

pip install --upgrade mosaicml==0.6.1

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.6.1

What's New?

  1. 📎 Adaptive Gradient Clipping (AGC)

    Adaptive Gradient Clipping (AGC) clips gradients based on the ratio of their norms with weights' norms. This technique helps stabilize training with large batch sizes, especially for models without batchnorm layers.
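
    A minimal sketch of enabling AGC (assuming the algorithm class is AGC with the threshold argument shown; check composer.algorithms for the exact name and defaults):

    from composer import Trainer
    from composer.algorithms import AGC
    
    # Clip gradients whose norm exceeds a fraction of the corresponding
    # parameter's norm; the threshold is illustrative
    agc = AGC(clipping_threshold=0.01)
    
    trainer = Trainer(
        ...,
        algorithms=agc,
    )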

  2. 🚚 Exponential Moving Average (EMA)

    Exponential Moving Average (EMA) is a model averaging technique that maintains an exponentially weighted moving average of the model parameters during training. The averaged parameters are used for model evaluation. EMA typically results in less noisy validation metrics over the course of training, and sometimes increased generalization.
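
    A minimal sketch of enabling EMA (the half life shown is illustrative; check composer.algorithms for the exact arguments and defaults):

    from composer import Trainer
    from composer.algorithms import EMA
    
    # Maintain an exponential moving average of the model weights and
    # use the averaged weights for evaluation
    ema = EMA(half_life='50ba')
    
    trainer = Trainer(
        ...,
        algorithms=ema,
    )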

  3. 🪵 Logger is available in the ComposerModel

    The Logger is bound to the ComposerModel via the self.logger attribute. It is available during training in all methods (other than __init__).

    For example, to log hidden activation:

    import torch.nn.functional as F
    
    from composer.models import ComposerModel
    
    class Net(ComposerModel):
    
        def forward(self, x):
            x = F.relu(F.max_pool2d(self.conv1(x), 2))
            x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
            if self.logger:
                self.logger.data_batch({
                    "hidden_activation_norm": x.norm(2).item(),
                })
            x = x.view(-1, 320)
            x = F.relu(self.fc1(x))
            x = F.dropout(x, training=self.training)
            x = self.fc2(x)
            return F.log_softmax(x, dim=-1)
  4. 🐛 Environment Collection Script

    Composer v0.6.1 includes an environment collection script which generates a printout of your system configuration and python environment. If you run into a bug, the results from this script will help us debug the issue and fix Composer.

    To collect your environment information:

    $ pip install mosaicml  # if composer is not already installed
    $ composer_collect_env

    Then, include the output in your GitHub Issue.

What's Improved?

  1. 📜 TorchScriptable Algorithms

    BlurPool, Ghost BatchNorm, and Stochastic Depth are now TorchScript-compatible. Try exporting your models with these algorithms enabled!
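
    As a minimal sketch, the surgically modified model can be exported with stock PyTorch (here, module is assumed to be the underlying torch.nn.Module of your ComposerModel after training):

    import torch
    
    # Script and save the trained model, including the algorithm-inserted layers
    scripted = torch.jit.script(module)
    scripted.save('scripted_model.pt')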

  2. 🏛️ ColOut on Segmentation

    ColOut now supports segmentation-style models.

What's Fixed?

  1. 🚑️ Loggers capture the Traceback

    We fixed a bug so the Loggers, such as the Weights & Biases Logger and the File Logger, capture the traceback of any exception that crashes the training process.

  2. 🏋️ Weights & Biases Logger Config

    We fixed a bug where the Weights & Biases Logger was not properly recording the configuration.

Full Changelog

v0.6.0...v0.6.1

v0.6.0

21 Apr 01:49

🚀 Composer v0.6.0

Composer v0.6.0 is released! Install via pip:

pip install --upgrade mosaicml==0.6.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.6.0

Major Changes

  1. 🗃️ Automatic Gradient Accumulation

    Composer v0.6.0 can automatically pick an appropriate value for gradient accumulation. The trainer will automatically catch
    OutOfMemory exceptions and handle them gracefully. No need to manually tune this parameter for each model, batch size, and
    hardware combination!

    To use automatic gradient accumulation, set grad_accum='auto'. For example:

    trainer = Trainer(
        ...,
        grad_accum='auto',
    )
  2. 💾 Artifact Logging

    Training on spot instances? Composer v0.6.0 introduces artifact logging, making it possible to store checkpoints and other artifacts directly to cloud storage. See the Object Store Logger and the Checkpointing Guide for more information.

    Artifact Logging has replaced the run directory and the run directory uploader, which have been removed.

  3. 📊 Metric Values on the State

    Composer v0.6.0 binds the computed metric values on the State. Go ahead and read these values from your own callbacks! We'll be releasing an early stopping callback in an upcoming Composer release.

  4. ⚠️ NoEffectWarning and NotIntendedUseWarning for Algorithms

    Some algorithms, such as BlurPool, now emit a NoEffectWarning or a NotIntendedUseWarning when they're not being used appropriately.

Minor Improvements

  1. 🏃‍♀️ Training Run Names

    We introduced a run_name parameter in the Trainer to help organize training runs.

    trainer = Trainer(
        ...,
        run_name='awesome-training-run',
    )

    We'll automatically pick one if the run name is not specified.

  2. 💈 Automatic Progress Bars

    The ProgressBarLogger, formerly called the TQDMLogger, is automatically enabled for all training runs.

    To disable the progress bar, set progress_bar=False. For example:

    trainer = Trainer(
        ...,
        progress_bar=False,
    )
  3. 🪵 Logged Data in the Console

    To print Logger calls to the console, set the log_to_console and the console_log_level arguments.

    trainer = Trainer(
        ...,
        log_to_console=True,
        console_log_level="epoch",
    )

    By default, the console logger will only be enabled when progress_bar=False. The default console log level is epoch.

  4. 📃 Capturing stdout and stderr in Log Files

    The FileLogger captures stdout and stderr by default now. Tracebacks will now be captured alongside other logging statements.

  5. ⬆️ PyTorch 1.11 Support

    We've tested Composer on PyTorch 1.11. Go ahead and upgrade your dependencies!

  6. ✅ Checkpointing

    We changed the checkpoint format to store the underlying model, not the DistributedDataParallel wrapped model. If you're using Composer to read checkpoints, there's nothing to change. But if you're reading Composer checkpoints manually, note that the module checkpoints will be formatted differently.

    In addition, we changed the checkpointing argument names for the trainer (see the sketch after this list).

    • The new parameters save_artifact_name and save_latest_artifact_name allow checkpoints to be saved directly to artifact stores.
    • The new parameter save_num_checkpoints_to_keep helps preserve local disk storage by automatically removing old checkpoints.
    • load_path replaces load_path_format.
    • save_name replaces save_path_format.
    • save_latest_filename replaces save_latest_format.
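
    A minimal sketch using the new argument names (the values and filename formats are illustrative):

    from composer import Trainer
    
    trainer = Trainer(
        ...,
        save_folder='checkpoints',
        save_name='ep{epoch}-ba{batch}-rank{rank}.pt',
        save_latest_filename='latest-rank{rank}.pt',
        save_num_checkpoints_to_keep=2,
        load_path='checkpoints/ep1-ba100-rank0.pt',
    )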
  7. 🏎️ Profiling

    We added support for custom scheduling functions and re-designed how the profiler saves traces. Each profiling cycle will now have its own trace file. Trace merging happens automatically throughout the training process. Long-running profiling is now possible without the long wait at the end of training for the trace merge.

    As part of this refactor, the profiler arguments have changed (see the sketch after this list):

    • prof_trace_handlers replaces prof_event_handlers.
    • prof_schedule replaces prof_skip_first, prof_wait, prof_warmup, prof_active, and prof_repeat. See the cyclic schedule function.
    • torch_prof_folder replaces torch_profiler_trace_dir.
    • The new arguments torch_prof_filename, torch_prof_artifact_name, torch_prof_overwrite, and torch_prof_num_traces_to_keep allow for customization on how PyTorch Profiler traces are saved.
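
    A minimal sketch of the new profiler arguments (assuming composer.profiler exposes cyclic_schedule and JSONTraceHandler; the schedule values are illustrative):

    from composer import Trainer
    from composer.profiler import JSONTraceHandler, cyclic_schedule
    
    trainer = Trainer(
        ...,
        prof_trace_handlers=JSONTraceHandler(),
        prof_schedule=cyclic_schedule(
            wait=0,
            warmup=1,
            active=4,
            repeat=1,
        ),
        torch_prof_folder='torch_traces',
    )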
  8. 🏗️ TorchVision Model Architectures

    We switched our vision models to use the TorchVision model architecture implementations where possible.

Bug Fixes

  • Fixed a bug with MixUp and gradient accumulation.
  • Fixed numerous issues with the Composer launch script for distributed training. Composer v0.6.0 includes environment variable support, better defaults and warnings, and proper handling of crashed processes.

Changelog


Release version v0.5.0

16 Mar 14:02

We are excited to share Composer v0.5, a library of speed-up methods for efficient neural network training. This release features:

  • Revamped checkpointing API based on community feedback
  • New baselines: ResNet34-SSD, GPT-3, and Vision Transformers
  • Additional improvements to our documentation
  • Support for bfloat16
  • Streaming dataset support
  • Unified functional API for our algorithms

Highlights

Checkpointing API

Checkpoint saving is now handled by a Callback, so users can easily write and add their own callbacks. The CheckpointSaver callback is automatically added if a save_folder is provided to the Trainer.

trainer = Trainer(
    model=model,
    algorithms=algorithms,
    save_folder="checkpoints",
    save_interval="1ep"
)

Alternatively, CheckpointSaver can be directly added as a callback:

trainer = Trainer(..., callbacks=[
    CheckpointSaver(
        save_folder='checkpoints',
        name_format="ep{epoch}-ba{batch}/rank_{rank}",
        save_latest_format="latest/rank_{rank}",
        save_interval="1ep",
        weights_only=False,
    )
])

Subclass CheckpointSaver to add your own logic for saving the best model, or saving at specific intervals. Thanks to @mansheej, @siriuslee, and other users for their feedback.

bfloat16

We've added experimental support for bfloat16, which can be provided via the precision argument to the Trainer:

trainer = Trainer(
    ...,
    precision="bfloat16"
)

Streaming datasets

We've added support for fast streaming datasets. For NLP-based datasets such as C4, we use the HuggingFace datasets backend, and add dataset-specific shuffling, tokenization, and grouping on the fly. To support data parallel training, we added specific sharding logic for efficiency. See C4Datasets for more details.

Vision streaming datasets are supported via a patched version of the webdatasets package, with support for data sharding by workers for fast augmentations. See composer.datasets.webdataset for more details.

Baseline GPT-3, ResNet34-SSD, and Vision Transformer benchmarks

Configurations for GPT-3-like models ranging from 125M to 760M parameters are now released, and use DeepSpeed ZeRO Stage 0 for memory-efficient training.

We've also added the Single Shot Detection (SSD) model (Liu et al., 2016) with a ResNet34 backbone, based on the MLPerf reference implementation.

Our first Vision Transformer benchmark is the ViT-S/16 model from Touvron et al., 2021, based on the vit-pytorch package.


Release Version 0.4.0

01 Mar 02:34


Release Version 0.3.1

01 Dec 00:27

Hotfix

Hotfix for the installation of the composer package.