I have read your paper and was very excited about the results you report. However, after having used your repo, I am left with some concerns:
The code apparently does not work with mixed-precision training. Are there any plans to support it? As you can imagine, mixed-precision training is quite important when working with limited resources.
It appears memory usage during training is substantially higher for RedNet-26 than for ResNet-26. Is this expected behavior? I don't believe this was mentioned in the paper, so I would just like to make sure the issue is not on my end.
Thanks a lot in advance.
While the actual involution unfold / multiply does not work with mixed precision, the rest of the autocastable ops do. So if you cast to fp32 before calling the CuPy / CUDA part of the involution implementation, you will at least have partial mixed-precision support.
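A minimal sketch of that workaround, assuming `inner` is the repo's involution module (hypothetical name; the actual entry point may differ): the wrapper disables autocast locally and upcasts the input, so the custom CUDA kernel always sees fp32 while the rest of the network runs under `torch.cuda.amp.autocast`.

```python
import torch
import torch.nn as nn

class FP32Involution(nn.Module):
    """Run the CuPy/CUDA involution op in fp32 inside an autocast region.

    `inner` is assumed to be the repo's involution module; this is a
    sketch of the cast-to-fp32 workaround, not the repo's own API.
    """
    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def forward(self, x):
        # Disable autocast for this op and upcast the (possibly fp16) input.
        with torch.cuda.amp.autocast(enabled=False):
            return self.inner(x.float())
```

Downstream autocastable ops will cast the fp32 output back down as needed, so the surrounding training loop can keep using `autocast()` and `GradScaler` unchanged.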
Assuming a memory-efficient implementation of the unfold/multiply part of involution, we can compare a ResNet bottleneck against a RedNet bottleneck. The RedNet bottleneck has two additional 1x1 convs (reduce / create kernel), plus an additional BN and ReLU between those two convs. So I would expect the RedNet bottleneck to use more memory, since it has more operations, even though some of them run at a 4x channel reduction. I suspect that if you use a reduction of 8x, you will see involution use similar or less memory than the normal residual bottleneck.
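For concreteness, here is a hedged sketch of the kernel-generation branch being described; the function and argument names are assumptions based on the description above, not the repo's actual code. The `reduction` knob is the 4x vs 8x channel reduction mentioned:

```python
import torch.nn as nn

def kernel_branch(channels: int, kernel_size: int, groups: int, reduction: int = 4) -> nn.Sequential:
    """Sketch of involution's extra ops vs. a plain ResNet bottleneck:
    two 1x1 convs (reduce / create kernel) with BN + ReLU in between.
    Raising `reduction` (e.g. to 8) shrinks the hidden activations.
    """
    hidden = channels // reduction
    return nn.Sequential(
        nn.Conv2d(channels, hidden, kernel_size=1),  # reduce
        nn.BatchNorm2d(hidden),
        nn.ReLU(inplace=True),
        # create the per-position involution kernel (K*K weights per group)
        nn.Conv2d(hidden, kernel_size * kernel_size * groups, kernel_size=1),
    )
```

The activations of this branch scale with `channels // reduction`, which is why doubling the reduction should bring RedNet's memory footprint closer to (or below) the plain residual bottleneck's.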