Getting NaNs... manual initialization required? #4

Open
kevinjohncutler opened this issue Jan 26, 2023 · 5 comments

@kevinjohncutler

@pvjosue I tried swapping this in for the Conv2d and Conv3d layers in a U-Net. I get output of the correct shape, but it is all NaNs. Do kernel_initializer and bias_initializer need to be set manually?

@pvjosue
Owner

pvjosue commented Jan 26, 2023

Yes, I recently observed this as well and will investigate.
My guess is that something changed in the latest PyTorch release.
Thanks

@DavidRavnsborg

@pvjosue Have you found a solution to this with the latest version of PyTorch? Did they deprecate this capability in a recent release? This is exactly the kind of package I was looking for, and I would be willing to use an older version of PyTorch to get it working.

@pvjosue
Owner

pvjosue commented Feb 8, 2023

Let's see: with user-provided initialization, it seems to work out of the box.
GPU Ubuntu:
[screenshot: output on GPU]

CPU Ubuntu:
[screenshot: output on CPU]

The problem is the default initialization, which I could fix for convNd: 2b0e4db
I'll leave the issue open, as a solution for the transposed conv is still missing.
Thanks for letting me know :)
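
For reference, "user-provided initialization" means passing explicit initializer callables when constructing the layer, roughly like the sketch below. The kernel_initializer / bias_initializer keyword names are the ones discussed in this issue; the import path, the remaining constructor arguments, and the assumption that each initializer is called with the corresponding parameter tensor are illustrative and may need adjusting.

import torch
import torch.nn as nn
from convNd import convNd  # assumed import path

conv = convNd(
    in_channels=8,
    out_channels=16,
    num_dims=4,            # e.g. a 4D convolution
    kernel_size=3,
    stride=1,
    padding=1,
    use_bias=True,
    # Assumed: initializers are callables applied to each parameter tensor.
    kernel_initializer=lambda w: nn.init.kaiming_uniform_(w),
    bias_initializer=lambda b: nn.init.zeros_(b),
)

x = torch.randn(1, 8, 10, 10, 10, 10)
print(torch.isnan(conv(x)).any())  # expect tensor(False) with explicit init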

@st0nedB
Contributor

st0nedB commented Mar 23, 2023

For me, the culprit was the bias. In the initialization of convNd, the bias is created as

if use_bias:
    self.bias = nn.Parameter(torch.Tensor(out_channels))
else:
    self.register_parameter('bias', None)

As I understand it, one should not use torch.Tensor to allocate a parameter, because it returns uninitialized memory, so the values can be arbitrary garbage, including NaNs. A simple example:

>>> import torch
>>> import torch.nn as nn
>>> bias = nn.Parameter(torch.Tensor(10))
>>> type(bias)
<class 'torch.nn.parameter.Parameter'>
>>> bias.data
tensor([-1.0836e+10,  2.6007e-36,  6.4800e+24,  4.5593e-41,  4.5549e+24,
         4.5593e-41,  1.8760e-16,         nan,  6.4629e+24,  4.5593e-41])

After switching it to

if use_bias:
    self.bias = nn.Parameter(torch.zeros(out_channels))
else:
    self.register_parameter('bias', None)

everything works as expected. It is probably not an optimal initialization, but if I understand correctly, a user can pass a suitable one via the bias_initializer keyword argument.
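
If zeros is too crude, one option is a bias initializer that mirrors PyTorch's own default for conv layers, which draws from U(-bound, bound) with bound = 1/sqrt(fan_in). A minimal sketch, assuming the initializer is called with the bias tensor and that fan_in (in_channels times the product of the kernel dimensions) is computed by the caller:

import math
import torch.nn as nn

def make_bias_initializer(fan_in):
    # Mirror the default bias init of nn.Conv*d: U(-bound, bound),
    # with bound = 1 / sqrt(fan_in).
    bound = 1.0 / math.sqrt(fan_in)
    return lambda bias: nn.init.uniform_(bias, -bound, bound)

# Hypothetical usage for in_channels=8 and a 3x3x3x3 kernel:
# bias_initializer=make_bias_initializer(fan_in=8 * 3 ** 4)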

@st0nedB
Contributor

st0nedB commented Mar 23, 2023

I added a (very simple) fix. Please note it might not be optimal for everyone, but it suffices in my network.
We can discuss it further in the pull request.
