Hi, we want to migrate to the CUDA backend from the GPU/OpenCL backend but have found that the CUDA backend uses significantly more GPU memory than the OpenCL backend, leading to errors running out of GPU memory in some cases.
E.g., this test code trains 100 trees and monitors the maximum GPU memory used:
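The original script isn't reproduced here, so below is only a minimal sketch of that kind of test. The dataset shape (50,000,000 rows × 100 features, chosen to match the ~5,000,000,000-value total mentioned below), the synthetic data, and the `nvidia-smi` polling thread are all assumptions, not the actual code from this report.

```python
import subprocess
import threading
import time

import lightgbm as lgb
import numpy as np

# Assumed shape: 50M rows x 100 float32 features = 5e9 values total.
# Note this needs ~20 GB of host RAM just for the raw training matrix.
n_rows, n_cols = 50_000_000, 100
rng = np.random.default_rng(0)
X = rng.random((n_rows, n_cols), dtype=np.float32)
y = rng.random(n_rows, dtype=np.float32)

peak_mib = 0

def poll_gpu_memory(stop_event):
    """Record the peak GPU memory reported by nvidia-smi while training runs."""
    global peak_mib
    while not stop_event.is_set():
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
            capture_output=True, text=True,
        ).stdout
        values = [int(v) for v in out.split()]
        if values:
            peak_mib = max(peak_mib, max(values))
        time.sleep(0.5)

stop = threading.Event()
monitor = threading.Thread(target=poll_gpu_memory, args=(stop,), daemon=True)
monitor.start()

params = {
    "objective": "regression",
    "device_type": "cuda",  # change to "gpu" for the OpenCL backend
    "max_bin": 255,
}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=100)

stop.set()
monitor.join()
print(f"peak GPU memory: {peak_mib} MiB")
```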
The training dataset has 5,000,000,000 total floats, and when quantized these should be able to be stored in one byte, requiring 5 GB for the full training dataset.
I built LightGBM at version 4.6.0 with CUDA and OpenCL support.
Running this on an A100 GPU uses 11,580 MiB, but if I change `device_type` to `gpu`, it uses 7,812 MiB.
I tested the effect of the `max_bin` parameter. Memory usage is unchanged at `max_bin = 255` (the default), but if I reduce it to 15, OpenCL only requires 2,890 MiB while CUDA still needs 11,580 MiB. I also tested increasing it beyond 256, which raises an error with the OpenCL backend, but with the CUDA backend the GPU memory jumped to 20,538 MiB. I would have expected an increase of 5 GB if an extra byte were used for each training value, but it seems like there's nearly an additional 2 bytes per training value?
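For reference, here is the back-of-the-envelope arithmetic behind that last estimate, using the figures quoted above (a sketch, not output from the actual run):

```python
# Extra GPU memory per training value when going from max_bin = 255 to max_bin > 256 (CUDA backend).
n_values = 5_000_000_000
baseline_mib = 11_580    # CUDA, max_bin = 255
large_bin_mib = 20_538   # CUDA, max_bin > 256
extra_bytes = (large_bin_mib - baseline_mib) * 1024**2
print(extra_bytes / n_values)  # ~1.9 extra bytes per training value
```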
I wondered if the subsample gets materialised and stored separately on the GPU, but changing the subsample size doesn't seem to affect GPU memory usage.
I also noticed that the CUDA backend only supports double precision, whereas the OpenCL backend uses single-precision floats by default. However, changing OpenCL to use double precision with `gpu_use_dp` only slightly increased memory use.
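For completeness, this is the kind of parameter combination being compared in that check (a sketch; the OpenCL backend is selected with `device_type = "gpu"` and `gpu_use_dp` switches it to double precision):

```python
params = {
    "objective": "regression",
    "device_type": "gpu",  # OpenCL backend
    "gpu_use_dp": True,    # double precision instead of the default single precision
    "max_bin": 255,
}
```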
Is this expected behaviour and is there a known reason why the CUDA backend uses so much more memory?
It is true that more memory is used with `device_type=cuda`, because we adopt a more efficient way to compute the histograms. As a cost, we store a separate copy of the discretized data in GPU memory, which should roughly double the memory cost of discretized data storage.
We may consider adding a memory-efficient (but perhaps slightly slower) mode in the future.
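As a rough sanity check of that explanation against the numbers reported above (an estimate only; it assumes the factor of two from the reply and ignores gradients, histograms, and other working buffers):

```python
# Two copies of the 1-byte-per-value discretized data on the CUDA backend.
n_values = 5_000_000_000
bytes_per_value = 1                                  # max_bin <= 255
one_copy_gib = n_values * bytes_per_value / 1024**3  # ~4.66 GiB
print(2 * one_copy_gib)  # ~9.3 GiB, most of the ~11.3 GiB (11,580 MiB) observed with CUDA
```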