Entering fails if Nouveau/Nova is being used while the proprietary NVIDIA driver is still installed #1573

Closed
picsel2 opened this issue Oct 24, 2024 · 8 comments
Labels
1. Bug Something isn't working

Comments

picsel2 commented Oct 24, 2024

Describe the bug

I cannot enter the toolbox:

$ toolbox --log-level debug enter
DEBU Running as real user ID 1000                 
DEBU Resolved absolute path to the executable as /usr/bin/toolbox 
DEBU Running on a cgroups v2 host                 
DEBU Looking up sub-GID and sub-UID ranges for user sebastian 
DEBU TOOLBX_DELAY_ENTRY_POINT is                  
DEBU TOOLBX_FAIL_ENTRY_POINT is                   
DEBU TOOLBOX_PATH is /usr/bin/toolbox             
DEBU Migrating to newer Podman                    
DEBU Toolbx config directory is /home/sebastian/.config/toolbox 
DEBU Current Podman version is 5.2.3              
DEBU Creating runtime directory /run/user/1000/toolbox 
DEBU Old Podman version is 5.2.3                  
DEBU Migration not needed: Podman version 5.2.3 is unchanged 
DEBU Setting up configuration                     
DEBU Setting up configuration: file /home/sebastian/.config/containers/toolbox.conf not found 
DEBU Resolving container and image names          
DEBU Container: ''                                
DEBU Distribution (CLI): ''                       
DEBU Image (CLI): ''                              
DEBU Release (CLI): ''                            
DEBU Resolved container and image names           
DEBU Container: 'fedora-toolbox-40'               
DEBU Image: 'fedora-toolbox:40'                   
DEBU Release: '40'                                
DEBU Resolving container and image names          
DEBU Container: ''                                
DEBU Distribution (CLI): ''                       
DEBU Image (CLI): ''                              
DEBU Release (CLI): ''                            
DEBU Resolved container and image names           
DEBU Container: 'fedora-toolbox-40'               
DEBU Image: 'fedora-toolbox:40'                   
DEBU Release: '40'                                
DEBU Checking if container fedora-toolbox-40 exists 
DEBU Inspecting container fedora-toolbox-40       
DEBU Entry point of container fedora-toolbox-40 is toolbox (PID=0) 
DEBU Inspecting mounts of container fedora-toolbox-40 
DEBU Generating Container Device Interface for NVIDIA 
DEBU Generating Container Device Interface for NVIDIA: failed to initialize NVML: Driver Not Loaded 
Error: failed to initialize NVIDIA Management Library

Steps how to reproduce the behaviour

  1. Have the nouveau driver loaded.
  2. Execute toolbox enter.
  3. See error.

Expected behaviour

toolbox runs the container environment.

Actual behaviour

toolbox does not enter the container and exits with code 1.

Screenshots

Not applicable.

Output of toolbox --version (v0.0.90+)

toolbox version 0.0.99.6

Toolbx package info (rpm -q toolbox)

toolbox-0.0.99.6-1.fc40.x86_64

Output of podman version

Client:       Podman Engine
Version:      5.2.3
API Version:  5.2.3
Go Version:   go1.22.7
Built:        Tue Sep 24 02:00:00 2024
OS/Arch:      linux/amd64

Podman package info (rpm -q podman)

podman-5.2.3-1.fc40.x86_64

Info about your OS

Fedora Workstation 40 (Wayland session)

Additional context
I do not have the nvidia driver loaded:

Output of lsmod
$ lsmod
Module                  Size  Used by
uinput                 20480  0
snd_seq_dummy          12288  0
snd_hrtimer            12288  1
snd_seq_midi           24576  0
snd_seq_midi_event     16384  1 snd_seq_midi
lm92                   20480  0
sunrpc                897024  1
binfmt_misc            28672  1
vfat                   24576  1
fat                   118784  1 vfat
snd_hda_codec_realtek   221184  1
snd_hda_codec_generic   131072  1 snd_hda_codec_realtek
snd_hda_scodec_component    20480  1 snd_hda_codec_realtek
snd_hda_codec_hdmi    102400  1
snd_hda_intel          69632  3
snd_intel_dspcfg       40960  1 snd_hda_intel
snd_intel_sdw_acpi     16384  1 snd_intel_dspcfg
snd_usb_audio         606208  5
snd_hda_codec         225280  4 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec_realtek
snd_usbmidi_lib        57344  1 snd_usb_audio
snd_hda_core          155648  5 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hda_codec_realtek
snd_ump                40960  1 snd_usb_audio
snd_rawmidi            57344  3 snd_seq_midi,snd_usbmidi_lib,snd_ump
intel_rapl_msr         20480  0
amd_atl                69632  1
mc                     90112  1 snd_usb_audio
intel_rapl_common      61440  1 intel_rapl_msr
snd_hwdep              20480  2 snd_usb_audio,snd_hda_codec
snd_seq               135168  9 snd_seq_midi,snd_seq_midi_event,snd_seq_dummy
edac_mce_amd           40960  0
snd_seq_device         16384  4 snd_seq,snd_seq_midi,snd_ump,snd_rawmidi
kvm_amd               249856  0
snd_pcm               200704  5 snd_hda_codec_hdmi,snd_hda_intel,snd_usb_audio,snd_hda_codec,snd_hda_core
eeepc_wmi              12288  0
ee1004                 16384  0
asus_wmi              102400  1 eeepc_wmi
snd_timer              53248  3 snd_seq,snd_hrtimer,snd_pcm
kvm                  1449984  1 kvm_amd
sparse_keymap          12288  1 asus_wmi
platform_profile       12288  1 asus_wmi
rfkill                 40960  4 asus_wmi
snd                   163840  32 snd_hda_codec_generic,snd_seq,snd_seq_device,snd_hda_codec_hdmi,snd_hwdep,snd_hda_intel,snd_usb_audio,snd_usbmidi_lib,snd_hda_codec,snd_hda_codec_realtek,snd_timer,snd_ump,snd_pcm,snd_rawmidi
r8169                 131072  0
i2c_piix4              40960  0
wmi_bmof               12288  0
rapl                   20480  0
pcspkr                 12288  0
k10temp                16384  0
soundcore              16384  1 snd
i2c_smbus              20480  1 i2c_piix4
realtek                45056  1
gpio_amdpt             16384  0
gpio_generic           20480  1 gpio_amdpt
loop                   45056  0
dm_multipath           53248  0
nfnetlink              24576  1
zram                   49152  1
dm_crypt               69632  2
hid_logitech_hidpp     81920  0
nouveau              3923968  26
drm_ttm_helper         16384  1 nouveau
crct10dif_pclmul       12288  1
crc32_pclmul           12288  0
ttm                   114688  2 drm_ttm_helper,nouveau
crc32c_intel           16384  4
polyval_clmulni        12288  0
polyval_generic        12288  1 polyval_clmulni
video                  81920  2 asus_wmi,nouveau
gpu_sched              65536  1 nouveau
i2c_algo_bit           20480  1 nouveau
drm_gpuvm              45056  1 nouveau
ghash_clmulni_intel    16384  0
drm_exec               12288  2 drm_gpuvm,nouveau
mxm_wmi                12288  1 nouveau
nvme                   69632  3
sha512_ssse3           53248  0
drm_display_helper    290816  1 nouveau
sha256_ssse3           36864  0
nvme_core             253952  4 nvme
sha1_ssse3             32768  0
cec                    98304  1 drm_display_helper
sp5100_tco             20480  0
nvme_auth              28672  1 nvme_core
wmi                    36864  5 video,asus_wmi,wmi_bmof,mxm_wmi,nouveau
hid_logitech_dj        45056  0
serio_raw              20480  0
scsi_dh_rdac           16384  0
scsi_dh_emc            12288  0
scsi_dh_alua           28672  0
ip6_tables             28672  0
ip_tables              28672  0
fuse                  233472  5
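
A listing this long is easier to check programmatically. The following is a minimal sketch, not part of Toolbx, that extracts the module names from `lsmod` output and reports which NVIDIA-related kernel drivers are loaded. The function names and the candidate set (`nouveau`, `nvidia`) are assumptions made for illustration.

```python
def loaded_modules(lsmod_output: str) -> set:
    """Return the module names found in the first column of `lsmod` output."""
    names = set()
    for line in lsmod_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def gpu_kernel_drivers(lsmod_output: str) -> set:
    """Return which of the candidate GPU drivers appear as loaded modules."""
    candidates = {"nouveau", "nvidia"}  # illustrative candidate set
    return loaded_modules(lsmod_output) & candidates


# A few lines taken from the listing above.
sample = """\
Module                  Size  Used by
nouveau              3923968  26
drm_ttm_helper         16384  1 nouveau
ttm                   114688  2 drm_ttm_helper,nouveau
"""
print(gpu_kernel_drivers(sample))
```

For the listing in this report, the result contains `nouveau` but not `nvidia`, which matches the statement that the proprietary kernel driver is not loaded.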
picsel2 added the "1. Bug" (Something isn't working) label Oct 24, 2024
Author

picsel2 commented Oct 24, 2024

I found the same error message here: #1572 (comment)
Might be related.

Author

picsel2 commented Oct 24, 2024

My workaround is to switch to the proprietary NVIDIA driver.

@debarshiray
Member

Oops! Sorry about that. I will look into this.

@debarshiray
Member

It seems to me that this only happens if the use of Nouveau is forced, while the proprietary NVIDIA driver is still installed, particularly if libnvidia-ml.so.1 is still present.

I reinstalled Fedora Workstation on my machine with an NVIDIA GPU, with only Nouveau, and Toolbx works as expected. I have been regularly testing with the proprietary NVIDIA driver installed and enabled. I have never tested the situation where the proprietary driver is installed but Nouveau is being forced, and looking at the Toolbx code I can see how this can fail to work.

I can imagine different ways to force the use of Nouveau, but I would like to know how you did it, so that I can reproduce your exact situation, and whether there's a supported way to do it.
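
The state described here (libnvidia-ml.so.1 installed, but the NVIDIA kernel driver not loaded) can be told apart from "no library at all" by probing NVML directly. Below is a minimal sketch using Python's `ctypes`. It is not Toolbx's code (Toolbx is written in Go) and it does not claim to show what the fix does. `probe_nvml` and `classify_nvml` are hypothetical helper names. `nvmlInit_v2`, `nvmlShutdown` and the numeric value of `NVML_ERROR_DRIVER_NOT_LOADED` come from NVML's public C header.

```python
import ctypes

NVML_SUCCESS = 0
NVML_ERROR_DRIVER_NOT_LOADED = 9  # nvmlReturn_t value from nvml.h


def probe_nvml(soname: str = "libnvidia-ml.so.1"):
    """Return NVML's init return code, or None if the library is absent."""
    try:
        lib = ctypes.CDLL(soname)
    except OSError:
        return None  # library not installed: nothing to initialize
    code = lib.nvmlInit_v2()
    if code == NVML_SUCCESS:
        lib.nvmlShutdown()  # release NVML again after a successful probe
    return code


def classify_nvml(return_code) -> str:
    """Map a probe result to a coarse state a caller could act on."""
    if return_code is None:
        return "no-library"
    if return_code == NVML_SUCCESS:
        return "usable"
    if return_code == NVML_ERROR_DRIVER_NOT_LOADED:
        return "driver-not-loaded"  # the situation reported in this issue
    return "error"
```

Calling `classify_nvml(probe_nvml())` on a system like the reporter's would be expected to give `"driver-not-loaded"`, which a caller could treat as "no usable NVIDIA GPU" rather than as a fatal error.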

Author

picsel2 commented Oct 24, 2024

You're right! I had both installed and I noticed somewhat misconfigured kernel arguments while changing to the proprietary driver. The setting in /etc/default/grub was

GRUB_CMDLINE_LINUX="rhgb quiet amd_pstate=active rd.driver.blacklist=nouveau modprobe.blacklist=nouveau"

Unlike the instructions in the RPM Fusion Howto, it was missing the nvidia-drm.modeset=1 argument.
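
A kernel command line like the one above can be checked mechanically. The following is a minimal sketch. The function names and the two reported flags are chosen for illustration and are not taken from RPM Fusion or Toolbx. It relies on the kernel treating `-` and `_` in parameter names as equivalent, and on blacklist values being comma-separated.

```python
def parse_kernel_args(cmdline: str) -> dict:
    """Split a kernel command line into {parameter: [values...]}."""
    args = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        key = key.replace("-", "_")  # dashes and underscores are equivalent
        values = value.split(",") if value else []
        args.setdefault(key, []).extend(values)
    return args


def nvidia_cmdline_report(cmdline: str) -> dict:
    """Report whether nouveau is blacklisted and nvidia-drm modeset is on."""
    args = parse_kernel_args(cmdline)
    blacklisted = set(args.get("rd.driver.blacklist", []))
    blacklisted |= set(args.get("modprobe.blacklist", []))
    return {
        "nouveau_blacklisted": "nouveau" in blacklisted,
        # Use the last occurrence if the parameter is given more than once.
        "nvidia_drm_modeset": args.get("nvidia_drm.modeset", [])[-1:] == ["1"],
    }


reported = ("rhgb quiet amd_pstate=active "
            "rd.driver.blacklist=nouveau modprobe.blacklist=nouveau")
print(nvidia_cmdline_report(reported))
```

For the value quoted from /etc/default/grub, the report shows nouveau as blacklisted and the modeset flag as unset, which is the gap described in this comment.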

debarshiray changed the title from "Entering toolbox fails with nouveau driver" to "Entering fails if Nouveau is being used while the proprietary NVIDIA driver is still installed" Oct 24, 2024
debarshiray added a commit to debarshiray/toolbox that referenced this issue Oct 24, 2024
If the proprietary NVIDIA driver is installed, particularly
libnvidia-ml.so.1, but the kernel driver is not being used, then 'enter'
fails with:
  $ toolbox enter
  Error: failed to initialize NVIDIA Management Library

This was tested on Fedora 39 Workstation with the proprietary NVIDIA
driver from RPM Fusion, which makes it possible to easily disable the
driver without uninstalling it [1].

Note that, with and without this change, there's a delay of a few
seconds inside nvmlInit() from the NVIDIA Management Library.

[1] https://rpmfusion.org/Howto/NVIDIA

containers#1573
@debarshiray
Member

> You're right! I had both installed and I noticed somewhat misconfigured kernel arguments while changing to the proprietary driver.

Thanks for the confirmation!

Could you please test this pull request: #1575 ?

Author

picsel2 commented Oct 26, 2024

The PR fixes it. Thanks!

@debarshiray
Member

> The PR fixes it. Thanks!

Thanks for the confirmation, and your contribution to Toolbx!

debarshiray changed the title from "Entering fails if Nouveau is being used while the proprietary NVIDIA driver is still installed" to "Entering fails if Nouveau/Nova is being used while the proprietary NVIDIA driver is still installed" Oct 29, 2024