If anything at all has the NVIDIA card open when that command is invoked, both the CLI command and the daemon will hang forever, and the daemon will become unkillable. This is true even in compute mode.
The kernel log says something to the effect of:
Feb 26 18:22:12 adder kernel: NVRM: Attempting to remove device 0000:01:00.0 with non-zero usage count!
In my case, for the command to ever succeed, I needed to:
- stop the ollama service
- stop nvidia-persistenced.service
- stop nvidia-powerd.service
- kill my user session's Xwayland
How did I know I had to do this? By running `lsof /dev/nvidia0`, which lists the processes that have the device open.
I recommend that the daemon run exactly that check before it starts powering the card off and, if any processes have the NVIDIA card open, fail with an error that names them and explains that the card cannot be powered off while they hold it.
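A minimal Python sketch of such a preflight check, offered only as an illustration: instead of shelling out to lsof, it scans /proc/<pid>/fd for descriptors whose symlink target is a /dev/nvidia* node. That covers open file descriptors only; lsof also reports memory mappings, which this sketch ignores. Matching every /dev/nvidia* node (nvidiactl, nvidia-uvm, and so on) rather than just /dev/nvidia0 is an assumption about what should block power-off. The names `find_nvidia_holders`, `ensure_gpu_idle`, `Holder`, and the `proc_root` parameter (there so the check can run against a fake tree) are hypothetical, not taken from any existing daemon.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Holder:
    pid: int
    name: str
    device: str


def find_nvidia_holders(proc_root: str = "/proc",
                        prefix: str = "/dev/nvidia") -> list[Holder]:
    """Return processes with an open file descriptor on a /dev/nvidia* node.

    Similar in spirit to `lsof /dev/nvidia0`, but only inspects
    /proc/<pid>/fd; memory mappings are not checked.
    """
    holders: list[Holder] = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or no permission (run as root to see all)
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            name = "?"
        seen: set[str] = set()
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were scanning
            if target.startswith(prefix) and target not in seen:
                seen.add(target)  # report each device once per process
                holders.append(Holder(int(entry), name, target))
    return holders


def ensure_gpu_idle(proc_root: str = "/proc") -> None:
    """Raise with a readable message instead of starting a power-off that would hang."""
    holders = find_nvidia_holders(proc_root)
    if holders:
        details = ", ".join(
            f"{h.name} (pid {h.pid}) has {h.device} open" for h in holders)
        raise RuntimeError(f"cannot power off the NVIDIA card: {details}")
```

The daemon would call `ensure_gpu_idle()` first and surface the exception text to the CLI, so the user sees something like "Xwayland (pid 1234) has /dev/nvidia0 open" instead of a hang.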
I also recommend better error handling when the daemon runs rmmod: check the exit status, and detect when the operation has hung instead of blocking on it forever.
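One way to do that, sketched in Python under stated assumptions: run the unload command with a deadline, poll it rather than waiting on it, and turn both a non-zero exit status and a timeout into an error. The name `unload_module`, the 10-second default, and the `argv_prefix` parameter (there only so the function can be exercised without touching real modules) are hypothetical. If rmmod is stuck in the kernel, the kill signal may not take effect, so after the deadline the sketch deliberately does not wait for the child; it reports the hang and leaves the stuck process behind rather than becoming unkillable itself.

```python
import subprocess
import time


class ModuleUnloadError(RuntimeError):
    """The unload command failed or did not finish in time."""


def unload_module(module: str, timeout: float = 10.0,
                  argv_prefix: tuple[str, ...] = ("rmmod",)) -> None:
    """Run `rmmod <module>`, checking both its exit status and whether it hung."""
    argv = [*argv_prefix, module]
    # rmmod's error output is short, so the stderr pipe buffer will not fill up.
    proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL,
                            stderr=subprocess.PIPE, text=True)
    deadline = time.monotonic() + timeout
    # Poll instead of calling proc.wait(): if the child is stuck in
    # uninterruptible sleep, wait() would block the daemon forever too.
    while proc.poll() is None:
        if time.monotonic() >= deadline:
            proc.kill()  # may have no effect on a process stuck in the kernel
            raise ModuleUnloadError(
                f"{' '.join(argv)} did not finish within {timeout}s; "
                "the unload may be hung in the kernel")
        time.sleep(0.05)
    stderr = proc.stderr.read().strip()
    proc.stderr.close()
    if proc.returncode != 0:
        raise ModuleUnloadError(
            f"{' '.join(argv)} exited with status {proc.returncode}: {stderr}")
```

Combined with a preflight check like the one suggested above, a timeout here should become rare, but it still keeps the daemon responsive if something grabs the card in between.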