Getting NaNs... manual initialization required? #4

Open
kevinjohncutler opened this issue Jan 26, 2023 · 5 comments

@kevinjohncutler

@pvjosue I tried swapping this in for the Conv2d and Conv3d layers in a U-Net. I get output of the correct shape, but it is all NaNs. Do kernel_initializer and bias_initializer need to be set manually?

@pvjosue
Owner

pvjosue commented Jan 26, 2023

Yes, I recently observed this as well and will investigate.
My guess is that something changed in the latest PyTorch release.
Thanks

@DavidRavnsborg

@pvjosue Have you found a solution to this with the latest version of PyTorch? Did they deprecate this capability in a recent release? This is exactly the kind of package I was looking for, and I would be willing to use an older version of PyTorch to get it working.

@pvjosue
Owner

pvjosue commented Feb 8, 2023

Let's see: with user-provided initialization, it seems to work out of the box.
GPU Ubuntu:
[screenshot: output on GPU]

CPU Ubuntu:
[screenshot: output on CPU]

The problem is the default initialization, which I could fix for convNd: 2b0e4db
I'll leave the issue open, as a solution for the transposed conv is still missing.
Thanks for letting me know :)
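
For reference, "user-provided initialization" means passing explicit initializer callables when constructing the layer, roughly like the sketch below. The kernel_initializer / bias_initializer keyword names are the ones discussed in this issue; the import path, the remaining constructor arguments, and the assumption that each initializer is called with the corresponding parameter tensor are illustrative and may need adjusting.

import torch
import torch.nn as nn
from convNd import convNd  # assumed import path

conv = convNd(
    in_channels=8,
    out_channels=16,
    num_dims=4,            # e.g. a 4D convolution
    kernel_size=3,
    stride=1,
    padding=1,
    use_bias=True,
    # Assumed: initializers are callables applied to each parameter tensor.
    kernel_initializer=lambda w: nn.init.kaiming_uniform_(w),
    bias_initializer=lambda b: nn.init.zeros_(b),
)

x = torch.randn(1, 8, 10, 10, 10, 10)
print(torch.isnan(conv(x)).any())  # expect tensor(False) with explicit init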

@st0nedB
Contributor

st0nedB commented Mar 23, 2023

For me, the culprit was the bias. In the initialization of convNd, the bias is created as

if use_bias:
    self.bias = nn.Parameter(torch.Tensor(out_channels))
else:
    self.register_parameter('bias', None)

As I understand it, one should not use torch.Tensor to allocate a parameter, because it returns uninitialized memory, so the values can be arbitrary garbage, including NaNs. A simple example:

>>> import torch
>>> import torch.nn as nn
>>> bias = nn.Parameter(torch.Tensor(10))
>>> type(bias)
<class 'torch.nn.parameter.Parameter'>
>>> bias.data
tensor([-1.0836e+10,  2.6007e-36,  6.4800e+24,  4.5593e-41,  4.5549e+24,
         4.5593e-41,  1.8760e-16,         nan,  6.4629e+24,  4.5593e-41])

After switching it to

if use_bias:
    self.bias = nn.Parameter(torch.zeros(out_channels))
else:
    self.register_parameter('bias', None)

everything works as expected. It is probably not an optimal initialization, but if I understand correctly, a user can pass a suitable one via the bias_initializer keyword argument.
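
If zeros is too crude, one option is a bias initializer that mirrors PyTorch's own default for conv layers, which draws from U(-bound, bound) with bound = 1/sqrt(fan_in). A minimal sketch, assuming the initializer is called with the bias tensor and that fan_in (in_channels times the product of the kernel dimensions) is computed by the caller:

import math
import torch.nn as nn

def make_bias_initializer(fan_in):
    # Mirror the default bias init of nn.Conv*d: U(-bound, bound),
    # with bound = 1 / sqrt(fan_in).
    bound = 1.0 / math.sqrt(fan_in)
    return lambda bias: nn.init.uniform_(bias, -bound, bound)

# Hypothetical usage for in_channels=8 and a 3x3x3x3 kernel:
# bias_initializer=make_bias_initializer(fan_in=8 * 3 ** 4)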

@st0nedB
Contributor

st0nedB commented Mar 23, 2023

I added a (very simple) fix. Please note it might not be optimal for everyone, but it suffices in my network.
We can discuss it further in the pull request.
