- Download and install NVIDIA CUDA v12.8 or a newer version from the official NVIDIA website.
- Download the cuDNN version compatible with CUDA v12 from the NVIDIA cuDNN download page. An account is required.
- Extract the `cudnn.zip` file and copy the `bin`, `include`, and `lib` folders to the following directory: `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8`
- Copy the files from the following directory: `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\extras\visual_studio_integration\MSBuildExtensions` to this directory: `C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Microsoft\VC\v170\BuildCustomizations`
- For CUDA support in `llama-cpp-python`, refer to the official llama-cpp-python documentation.
- Set the environment variable for the CUDA compiler (`nvcc.exe`): `$env:CUDACXX="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin\nvcc.exe"`
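- Set the CMake arguments for the build process (PowerShell syntax, matching the other steps; the `set NAME=value` form only works in `cmd.exe`): `$env:CMAKE_ARGS='-DGGML_CUDA=on -DCMAKE_GENERATOR_TOOLSET="cuda=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8"'`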
- Set the environment variable for the CUDA toolkit directory: `$env:CudaToolkitDir="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\"`
- Install or force-reinstall the `llama-cpp-python` package (compilation may take 30-50 minutes): `pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir --verbose`
- Install PyTorch from the CUDA 12.6 wheel index: `pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126`
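
Because the `llama-cpp-python` build is slow, it can be worth sanity-checking the three environment variables from the steps above before starting it. The following is a minimal sketch, not part of the official tooling: the function name `check_build_env` and the wording of its messages are hypothetical, and it only checks that the variables are set and point at existing paths, not that the toolchain actually works.

```python
import os


def check_build_env(env, path_exists=os.path.exists):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `env` is any mapping of environment variables (for example os.environ).
    `path_exists` is injectable so the checks can be exercised without a real CUDA install.
    """
    problems = []

    # CUDACXX should be set and point at an existing file (nvcc.exe in the steps above).
    nvcc = env.get("CUDACXX", "")
    if not nvcc:
        problems.append("CUDACXX is not set")
    elif not path_exists(nvcc):
        problems.append(f"CUDACXX points to a missing path: {nvcc}")

    # CMAKE_ARGS should carry the flag that enables the CUDA backend.
    cmake_args = env.get("CMAKE_ARGS", "")
    if "-DGGML_CUDA=on" not in cmake_args:
        problems.append("CMAKE_ARGS does not contain -DGGML_CUDA=on")

    # CudaToolkitDir should be set and point at an existing path.
    toolkit = env.get("CudaToolkitDir", "")
    if not toolkit:
        problems.append("CudaToolkitDir is not set")
    elif not path_exists(toolkit):
        problems.append(f"CudaToolkitDir points to a missing path: {toolkit}")

    return problems


if __name__ == "__main__":
    issues = check_build_env(os.environ)
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Build environment looks consistent with the steps above.")
```

Run the script from the same PowerShell session in which the variables were set, since `$env:` assignments apply only to the current session and its child processes.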