CUDA backend uses significantly more memory than OpenCL #6835

Open
adamreeve opened this issue Feb 19, 2025 · 1 comment
adamreeve commented Feb 19, 2025

Hi, we want to migrate from the GPU/OpenCL backend to the CUDA backend, but have found that the CUDA backend uses significantly more GPU memory, leading to out-of-memory errors in some cases.

For example, this test script trains 100 trees and monitors the maximum GPU memory used:

import numpy as np
from lightgbm import LGBMRegressor
import threading
import time
import pynvml


def monitor_gpu_mem(cancel: threading.Event):
    """Poll NVML once per second and record the peak GPU memory used by compute processes."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    max_mem = 0
    while not cancel.is_set():
        time.sleep(1)
        mem = sum(
                proc.usedGpuMemory // (1024 * 1024)
                for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle))
        max_mem = max(mem, max_mem)

    print(f"Max GPU mem = {max_mem} MiB")

num_cols = 1000
num_rows = 5_000_000

rng = np.random.default_rng(0)
X = rng.random((num_rows, num_cols), dtype=np.float32)
y = rng.random((num_rows,), dtype=np.float32)

# device_type switches between the CUDA backend ("cuda") and the OpenCL backend ("gpu")
params = {
    'n_estimators': 100,
    'learning_rate': 0.05,
    'subsample_freq': 6,
    'subsample': 0.5,
    'colsample_bytree': 0.8,
    'colsample_bynode': 0.8,
    'num_leaves': 64,
    'max_depth': 8,
    'reg_alpha': 9,
    'reg_lambda': 3.5,
    'min_child_samples': 100,
    'max_bin': 63,
    'enable_sparse': False,
    'device_type': "cuda",
    'gpu_use_dp': False,
    'gpu_platform_id': 0,
    'gpu_device_id': 0,
    'num_gpu': 1,
}

cancel = threading.Event()
gpu_monitor_thread = threading.Thread(target=monitor_gpu_mem, args=(cancel, ))
gpu_monitor_thread.start()

try:
    model = LGBMRegressor(**params)
    model.fit(X, y)

finally:
    cancel.set()
    gpu_monitor_thread.join()

The training dataset has 5,000,000,000 values in total, and with max_bin = 63 each quantized value should fit in one byte, so the full quantized dataset should need about 5 GB.
I built LightGBM 4.6.0 with both CUDA and OpenCL support.
Running this on an A100 GPU uses 11,580 MiB, but if I change device_type to gpu (OpenCL) it uses 7,812 MiB.
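
As a rough back-of-envelope check of that 5 GB figure (simple arithmetic; one byte per quantized value assumed):

num_values = 5_000_000 * 1_000                 # rows * cols = 5e9 quantized values
bytes_per_value = 1                            # max_bin = 63 fits in a single byte
print(num_values * bytes_per_value / 1024**3)  # ≈ 4.66 GiB, i.e. roughly 5 GB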

I also tested the effect of the max_bin parameter. Memory use is the same at max_bin = 255 as at 63, but reducing it to 15 drops OpenCL to 2,890 MiB while CUDA still needs 11,580 MiB. Increasing max_bin beyond 256 raises an error with the OpenCL backend, but with the CUDA backend GPU memory jumps to 20,538 MiB. I would have expected an increase of 5 GB if an extra byte were used for each training value, but it looks like there are nearly 2 additional bytes per training value.
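
Working that last figure out from the reported numbers (rough arithmetic only):

increase_bytes = (20_538 - 11_580) * 1024 * 1024    # memory jump once max_bin exceeds 256
print(increase_bytes / 5_000_000_000)               # ≈ 1.88 extra bytes per training value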

I wondered if the subsample gets materialised and stored separately on the GPU, but changing the subsample size doesn't seem to affect GPU memory usage.

I also noticed that the CUDA backend only supports double precision, whereas the OpenCL backend uses single-precision floats by default. However, switching OpenCL to double precision with gpu_use_dp=True only slightly increased memory use.
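
For reference, the OpenCL double-precision run only swaps these parameters in the script above (sketch):

params.update({
    'device_type': 'gpu',   # OpenCL backend
    'gpu_use_dp': True,     # use double precision on the GPU
})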

Is this expected behaviour and is there a known reason why the CUDA backend uses so much more memory?

shiyu1994 (Collaborator) commented
It is true that more memory is used with device_type=cuda, because we adopt a more efficient way to compute the histograms. As a cost, we store a separate copy of the discretized data in GPU memory, which should roughly double the memory cost of storing the discretized data.

We may consider adding a memory-efficient (but perhaps slightly slower) mode in the future.
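
As an illustrative sanity check against the numbers reported above (rough arithmetic; it assumes the quantized data is the dominant extra copy):

one_copy_gib = 5_000_000_000 / 1024**3      # ≈ 4.66 GiB per copy at max_bin = 63
observed_gap_gib = (11_580 - 7_812) / 1024  # ≈ 3.68 GiB gap, CUDA vs OpenCL
print(one_copy_gib, observed_gap_gib)       # the gap is on the order of one extra copy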
