Conda-Installation-Tutorial-Windows10 (for Linux (Ubuntu18), click here) (for Pytorch distributed GPU training with NCCL (as well as by Accelerate class), click here)

This is a tutorial for installing CUDA (v11.8) and cuDNN (8.6.0) to enable GPU programming with PyTorch. It also covers how to fix the problem of PyTorch failing to detect CUDA.

Note: I wrote this tutorial after coming back from abroad at NAU. My computer felt like a stranger, so I reinstalled the whole system, which meant the CUDA environment had to be reconfigured. I ran into several obstacles even though it was my third (or fourth? I don't remember) time doing this, so I decided to write a full-scope tutorial recording the problems I met and their solutions, which may help me and others in the future.

Suggestion: install CUDA first, then install the PyTorch build that matches that CUDA version.

1. Check the compatibility:

Check whether your GPU supports CUDA (and which CUDA versions) at https://developer.nvidia.com/cuda-gpus, and update your GPU driver.
Decide which CUDA version you want and are able to install (you can search for a specific version). In this tutorial, it's v11.8.
Find the cuDNN version (a plug-in library for accelerating AI training) that is compatible with that CUDA version at https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/windows-x86_64/. In this tutorial, it's 8.6.0.
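One quick way to check the driver side of compatibility is the "CUDA Version" field in the header of `nvidia-smi`, which reports the highest CUDA version the installed driver supports. The sketch below assumes `nvidia-smi` is on your PATH (it normally comes with the NVIDIA driver) and that its header contains a `CUDA Version: X.Y` field; `max_cuda_from_smi` and `driver_supports` are hypothetical helper names, not part of any library.

```python
import re
import subprocess


def max_cuda_from_smi(smi_output: str):
    """Return (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_supports(smi_output: str, wanted=(11, 8)) -> bool:
    """True if the driver's highest supported CUDA version is at least `wanted`."""
    supported = max_cuda_from_smi(smi_output)
    return supported is not None and supported >= wanted


if __name__ == "__main__":
    try:
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        out = ""  # nvidia-smi not found: the driver is probably missing or not on PATH
    print("Highest CUDA version supported by the driver:", max_cuda_from_smi(out))
    print("Driver is new enough for CUDA 11.8:", driver_supports(out))
```

If the reported version is lower than 11.8, updating the driver is the first thing to try.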

2. Install CUDA:

If the newest version fails to install, try an older one. During installation I checked all the checkboxes.

3. Verify the installation of CUDA:

First, go to Settings -> Edit the system environment variables -> Environment Variables...
-> Under System variables, make sure these two variables exist:

-> In the same section, find Path, double-click to open it, and add these two paths if they do not already exist:
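The same check can be scripted. This is a minimal sketch that assumes the two system variables are `CUDA_PATH` and a versioned `CUDA_PATH_V11_8`, and that the two Path entries are `<CUDA_PATH>\bin` and `<CUDA_PATH>\libnvvp` (the usual defaults of the Windows CUDA installer, as far as I know; compare against what you actually see in the dialog). `check_cuda_env` is a hypothetical helper.

```python
import ntpath
import os


def check_cuda_env(env=None, version="11.8"):
    """Return a list of problems found in the CUDA-related environment variables."""
    env = dict(os.environ) if env is None else env
    problems = []
    # Assumed variable names: CUDA_PATH plus a versioned CUDA_PATH_V<major>_<minor>.
    versioned = "CUDA_PATH_V" + version.replace(".", "_")
    for name in ("CUDA_PATH", versioned):
        if name not in env:
            problems.append(f"missing variable {name}")
    cuda_root = env.get("CUDA_PATH", "")
    # Windows separates Path entries with ';'; compare them case-insensitively.
    raw_path = env.get("Path", env.get("PATH", ""))
    entries = {p.strip().rstrip("\\").lower() for p in raw_path.split(";") if p.strip()}
    if cuda_root:
        # Assumed Path entries: <CUDA_PATH>\bin and <CUDA_PATH>\libnvvp.
        for sub in ("bin", "libnvvp"):
            wanted = ntpath.join(cuda_root, sub).lower()
            if wanted not in entries:
                problems.append(f"Path lacks {wanted}")
    return problems


if __name__ == "__main__":
    issues = check_cuda_env()
    print("\n".join(issues) if issues else "CUDA environment variables look fine")
```

Open a new Command Prompt before running it, since already-open windows keep the old environment.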

Second, to verify that the system detects the installed CUDA, run the command nvcc --version in the Command Prompt; the displayed version should match the CUDA version you just installed.
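To compare the version programmatically, you can parse the "release X.Y" part of the `nvcc --version` output (for 11.8 the line looks like `Cuda compilation tools, release 11.8, V11.8.89`). This sketch assumes that output format; `nvcc_release` is a hypothetical helper.

```python
import re
import subprocess


def nvcc_release(nvcc_output: str):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version`, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


if __name__ == "__main__":
    try:
        out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        out = ""  # nvcc not found: CUDA's bin folder is probably not on Path
    release = nvcc_release(out)
    print("nvcc release:", release)
    print("Matches CUDA 11.8:", release == (11, 8))
```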

4. Add Plug-in: cuDNN to CUDA:

Extract the downloaded cuDNN zip file (the one compatible with your CUDA version), copy all of its files and folders, and paste them into the CUDA folder (in my case, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\) in a single paste, merging them with the existing folders.
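If you prefer a script over drag-and-drop, the merge can be done with `shutil.copytree(..., dirs_exist_ok=True)` (Python 3.8+), which copies into folders that already exist instead of failing. This is a sketch, not an official installer step; `merge_cudnn_into_cuda` is a hypothetical helper and both paths are placeholders you must supply.

```python
import shutil
from pathlib import Path


def merge_cudnn_into_cuda(cudnn_dir, cuda_dir):
    """Copy every top-level item of an extracted cuDNN archive into the CUDA folder.

    Existing folders (e.g. bin, include, lib) are merged rather than replaced.
    Returns the sorted names of the items that were copied.
    """
    cudnn_dir, cuda_dir = Path(cudnn_dir), Path(cuda_dir)
    copied = []
    for item in cudnn_dir.iterdir():
        target = cuda_dir / item.name
        if item.is_dir():
            # dirs_exist_ok=True merges into an existing folder of the same name
            shutil.copytree(item, target, dirs_exist_ok=True)
        else:
            shutil.copy2(item, target)  # loose files such as a license file
        copied.append(item.name)
    return sorted(copied)
```

Usage: `merge_cudnn_into_cuda(r"path\to\extracted\cudnn", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8")`. Writing under C:\Program Files usually requires running Python from an Administrator prompt.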

5. Installation of compatible Pytorch:

This step is where I got stuck and spent most of my time. Run a command like the following to install the PyTorch build that corresponds to your CUDA version: pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118 (here cu118 means CUDA 11.8). The main package downloaded by this command should be roughly 2.4 GB; a much smaller download suggests that a CPU-only build was installed instead.
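Another way to tell which build you got: wheels from the cu118 index report a version string ending in `+cu118` (for example `2.0.1+cu118`), and `torch.version.cuda` is `None` on CPU-only builds. The sketch below derives the wheel tag and index URL from a CUDA version string and checks the installed version; the helper names are hypothetical and the tag rule ("cu" plus major and minor digits) is inferred from the URL above.

```python
def wheel_tag(cuda_version: str) -> str:
    """'11.8' -> 'cu118', the tag used in PyTorch's wheel index URL and version suffix."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def index_url(cuda_version: str) -> str:
    return f"https://download.pytorch.org/whl/{wheel_tag(cuda_version)}"


def is_cuda_build(torch_version: str, cuda_version: str = "11.8") -> bool:
    """True if a version string such as '2.0.1+cu118' carries the expected CUDA tag."""
    return torch_version.endswith("+" + wheel_tag(cuda_version))


if __name__ == "__main__":
    print("pip install torch torchvision torchaudio --extra-index-url", index_url("11.8"))
    try:
        import torch
        print(torch.__version__, "| built with CUDA:", torch.version.cuda)
        print("Matches CUDA 11.8 wheel:", is_cuda_build(torch.__version__))
    except ImportError:
        print("torch is not installed yet")
```

If the check fails, uninstalling torch, torchvision, and torchaudio before rerunning the install command avoids pip keeping the existing CPU build.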

6. Verification:

import torch
print(torch.cuda.is_available())  # should print True

It should print True.
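`is_available()` only confirms that a device was found. A slightly stronger check is to run one small computation on the GPU; this sketch (with the hypothetical name `gpu_smoke_test`) returns `False` instead of raising when torch or CUDA is missing, so it is safe to run on any machine.

```python
def gpu_smoke_test(size: int = 256) -> bool:
    """Run one small matrix multiply on the GPU; True only if it completes there."""
    try:
        import torch
    except ImportError:
        return False  # torch itself is not installed
    if not torch.cuda.is_available():
        return False
    a = torch.randn(size, size, device="cuda")
    b = a @ a
    torch.cuda.synchronize()  # wait for the kernel so any CUDA error surfaces here
    return b.is_cuda


if __name__ == "__main__":
    print("GPU smoke test passed:", gpu_smoke_test())
```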

7. Some torch functions that may help with debugging if errors occur:

Click here
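As a hedged stand-in for the linked notebook (its exact contents are not reproduced here), the sketch below collects several standard torch diagnostics into one dictionary. The selection is my own. Note that, as far as I know, the pip wheels bundle their own CUDA runtime and cuDNN, so `cudnn_version` reports the copy shipped with PyTorch, which may differ from the 8.6.0 you installed.

```python
import pprint


def cuda_report() -> dict:
    """Collect version and device information that helps when CUDA is not detected."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "cudnn_available": torch.backends.cudnn.is_available(),
        "cudnn_version": torch.backends.cudnn.version(),  # integer encoding, or None
        "device_count": torch.cuda.device_count(),
    }
    report["devices"] = [torch.cuda.get_device_name(i) for i in range(report["device_count"])]
    return report


if __name__ == "__main__":
    pprint.pprint(cuda_report())
```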

TyBruceChen/Tutorial-Conda-cuDNN-NCCL-installation-for-Pytorch

Folders and files

Latest commit

History

Repository files navigation

Conda-Installation-Tutorial-Windows10 (for Linux (Ubuntu18), click here) (for Pytorch distributed GPU training with NCCL (as well as by Accelerate class), click here)

1. Check the compatibility:

2. Install CUDA:

3. Verify the installation of CUDA:

4. Add Plug-in: cuDNN to CUDA:

5. Installation of compatible Pytorch:

6. Verification:

7. Some torch functions that may helps debugging if error occurs:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages