Deep Learning Training Examples with PyTorch

This repository provides a comprehensive guide and practical examples for training deep learning models using PyTorch across various parallelism strategies. Whether you are working on single-GPU training or scaling to multi-GPU setups with Distributed Data Parallel (DDP) or Fully Sharded Data Parallel (FSDP), these examples will guide you through the process.


Contents

01. Introduction to Deep Learning

  • Foundational concepts of deep learning and PyTorch.
  • HPC Environment Setup:
    • Using SLURM for job scheduling: Submitting and managing training jobs.
    • Loading necessary modules: Configuring PyTorch and CUDA on an HPC cluster (see the environment check sketched after this list).
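As a quick sanity check after loading the modules, a minimal sketch like the one below confirms that the installed PyTorch build can see CUDA and the allocated GPUs. The exact module names and versions are cluster-specific and are not taken from this repository.

```python
import torch

# Minimal environment check: confirm the loaded PyTorch build sees CUDA.
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)
    print("GPU count:", torch.cuda.device_count())
    print("GPU name:", torch.cuda.get_device_name(0))
```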

02. Single-GPU Training

  • Efficiently training models on a single GPU.
  • Optimizations:
    • DALI: Efficient data loading using NVIDIA Data Loading Library.
    • AMP: Automatic Mixed Precision for faster training with reduced memory consumption (see the sketch after this list).
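A minimal AMP sketch is shown below; the toy linear model, optimizer, and random data are placeholders rather than code from this repository. autocast runs the forward pass and loss in mixed precision, while GradScaler scales the loss so FP16 gradients do not underflow.

```python
import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Linear(1024, 10).to(device)           # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()             # rescales the loss for FP16 gradients

inputs = torch.randn(64, 1024, device=device)    # placeholder data
targets = torch.randint(0, 10, (64,), device=device)

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():              # mixed-precision forward pass
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()                # backward on the scaled loss
    scaler.step(optimizer)                       # unscales gradients, then steps
    scaler.update()                              # adjusts the scale for the next step
```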

03. Multi-GPU Training with Data Parallelism (DP)

  • Scaling models across multiple GPUs using torch.nn.DataParallel.
  • Key Considerations:
    • Understanding inter-GPU communication overhead.
    • How DP differs from DDP, and why DDP generally performs better (a minimal DP sketch follows this list).
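The sketch below illustrates torch.nn.DataParallel with a toy model and random inputs (both placeholders): a single process replicates the model on every visible GPU, scatters each batch across them, and gathers the outputs back on the default GPU, which is where its per-batch communication overhead comes from.

```python
import torch
import torch.nn as nn

# Placeholder model; DataParallel replicates it on every visible GPU.
model = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 10))

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)       # scatter inputs / gather outputs each batch
model = model.cuda()

inputs = torch.randn(128, 1024).cuda()   # the batch is split across GPUs
outputs = model(inputs)                  # outputs are gathered on the default GPU
print(outputs.shape)
```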

04. Distributed Data Parallel (DDP) Training

  • Leveraging torch.nn.parallel.DistributedDataParallel for efficient multi-GPU training.
  • Setting up process groups and distributed samplers (see the torchrun-based sketch after this list).
  • Advantages of DDP Over DP:
    • Lower communication overhead.
    • Better scalability across multiple nodes.
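Below is a minimal DDP sketch, assuming a torchrun launch with one process per GPU (e.g. torchrun --nproc_per_node=4 train_ddp.py); the dataset, model, and script name are placeholders, not files from this repository.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group(backend="nccl")      # reads RANK/WORLD_SIZE set by torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(1024, 10).cuda()           # placeholder model
    model = DDP(model, device_ids=[local_rank])  # gradients are all-reduced across ranks

    dataset = TensorDataset(torch.randn(4096, 1024), torch.randint(0, 10, (4096,)))
    sampler = DistributedSampler(dataset)        # each rank sees a distinct shard
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                 # reshuffle shards each epoch
        for x, y in loader:
            optimizer.zero_grad(set_to_none=True)
            loss = criterion(model(x.cuda()), y.cuda())
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Because each process owns one GPU and only gradients are communicated, DDP avoids the per-batch scatter/gather of DataParallel and scales across nodes.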

05. Fully Sharded Data Parallel (FSDP) Training

  • Training large models with memory efficiency using FSDP, which shards parameters, gradients, and optimizer state across ranks (see the sketch after this list).
  • Fine-tuning large-scale models like CodeLlama.
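A minimal FSDP sketch follows, again assuming a torchrun launch with one process per GPU; the stack of linear layers and the dummy loss are placeholders standing in for a large model such as CodeLlama. Each GPU holds only its own shard of the parameters between collective operations.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model standing in for a large network.
    model = nn.Sequential(*[nn.Linear(2048, 2048) for _ in range(8)]).cuda()
    model = FSDP(model)                          # parameters, grads, optimizer state sharded

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    inputs = torch.randn(16, 2048, device="cuda")

    for step in range(5):
        optimizer.zero_grad(set_to_none=True)
        loss = model(inputs).pow(2).mean()       # dummy loss for illustration only
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```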

06. Containerized Training with Enroot and NGC Containers

  • Running PyTorch training using NVIDIA Enroot and NGC Containers on HPC.
  • Topics Covered:
    • Importing and running NGC PyTorch containers with Enroot.
    • Running single and multi-GPU PyTorch workloads inside containers.
    • Using SLURM to launch containerized PyTorch jobs on GPU clusters.

Resources


Note
