
Non-fatal errors on tensorflow import with TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1 #21790

lexming opened this issue Nov 6, 2024 · 2 comments

lexming commented Nov 6, 2024

We are seeing the following errors after a simple import tensorflow with TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1:

2024-10-25 14:56:51.496918: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-25 14:56:51.496995: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-25 14:56:51.497709: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

These errors are non-fatal: the import completes successfully, and TF appears to work normally after these messages. That's why the sanity checks run by EasyBuild (EB) after installation still pass.

Apparently it's a rather common issue (tensorflow/tensorflow#62075) caused by a version mismatch between TF and CUDA. Upstream only tests TF v2.15 with CUDA 12.2, while we use CUDA 12.1, so that might be the reason for these errors.
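To make the 12.1 vs 12.2 comparison explicit, here is a minimal Python sketch that compares only the major.minor part of two CUDA version strings. The helper names `major_minor` and `differs_from_tested` are hypothetical. On a real installation, the CUDA version TensorFlow was built against may be available as the `cuda_version` entry of `tf.sysconfig.get_build_info()`; whether that entry is populated for EasyBuild builds is an assumption.

```python
from typing import Tuple

# CUDA release that upstream tests TensorFlow 2.15 against, as stated in this issue.
UPSTREAM_TESTED_CUDA = "12.2"

# CUDA version taken from the module name TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1.
EB_CUDA = "12.1.1"


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '12.1.1' or '12.2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def differs_from_tested(build_cuda: str, tested: str = UPSTREAM_TESTED_CUDA) -> bool:
    """True if the CUDA major.minor differs from the upstream-tested major.minor."""
    return major_minor(build_cuda) != major_minor(tested)


if __name__ == "__main__":
    if differs_from_tested(EB_CUDA):
        print(f"CUDA {EB_CUDA} differs from upstream-tested CUDA {UPSTREAM_TESTED_CUDA}")
    else:
        print(f"CUDA {EB_CUDA} matches upstream-tested CUDA {UPSTREAM_TESTED_CUDA}")
```

Patch releases are ignored on purpose, since the mismatch in question is between minor releases (12.1 vs 12.2).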

Does anybody else see this on their systems? If it is caused by a version mismatch, we should see it across the board in EB. Otherwise, it might be something else.
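To check this question on another system, one can capture the stderr of `python -c 'import tensorflow'` and count the "already registered" messages. A minimal Python sketch, assuming the log format shown in the excerpts in this issue; `find_duplicate_factory_errors` and `import_tensorflow_stderr` are hypothetical helpers, not part of TensorFlow or EasyBuild.

```python
import re
import subprocess
import sys
from collections import Counter

# Matches the duplicate-registration lines from the logs in this issue, e.g.
# "2024-10-25 14:56:51.496918: E external/.../cuda_dnn.cc:9261] Unable to register cuDNN factory: ..."
DUPLICATE_FACTORY_RE = re.compile(
    r"(?P<timestamp>\S+ \S+): (?P<level>[IWEF]) (?P<source>\S+):(?P<line>\d+)\] "
    r"Unable to register (?P<plugin>\w+) factory: "
    r"Attempting to register factory for plugin (?P=plugin) "
    r"when one has already been registered"
)


def find_duplicate_factory_errors(stderr_text: str) -> Counter:
    """Count duplicate-registration messages per plugin (e.g. cuDNN, cuFFT, cuBLAS)."""
    counts: Counter = Counter()
    for line in stderr_text.splitlines():
        match = DUPLICATE_FACTORY_RE.match(line.strip())
        if match:
            counts[match.group("plugin")] += 1
    return counts


def import_tensorflow_stderr() -> str:
    """Run a bare 'import tensorflow' in a fresh interpreter and return its stderr.

    The return code is deliberately ignored: the messages are non-fatal,
    and a failed import simply yields no matches.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import tensorflow"],
        capture_output=True,
        text=True,
    )
    return result.stderr


if __name__ == "__main__":
    counts = find_duplicate_factory_errors(import_tensorflow_stderr())
    if counts:
        for plugin, n in sorted(counts.items()):
            print(f"{plugin}: {n} duplicate registration message(s)")
    else:
        print("no duplicate factory registration messages found")
```

Run it with the TensorFlow module loaded; the INFO line about oneDNN and other unrelated messages are not counted.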

boegel added this to the 4.x milestone Nov 6, 2024

boegel commented Nov 6, 2024

I can confirm I'm seeing this same issue:

$ module load TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1
$ python -c 'import tensorflow'
2024-11-06 13:57:10.160977: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-11-06 13:57:14.228266: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-06 13:57:14.228320: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-06 13:57:14.923172: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

These warnings are certainly a bit alarming, though probably harmless.
Is it worth trying to figure out how to avoid them?


lexming commented Nov 7, 2024

> Is it worth trying to figure out how to avoid them?

I don't think so, given that it seems harmless. Since you also see these "errors", the issue is probably caused by using CUDA 12.1 instead of 12.2, and we cannot change that at this point.
