loss=nan #18

Open
05063112lcs opened this issue Apr 22, 2024 · 2 comments

Comments

@05063112lcs

Hello, thank you very much for your excellent work. While trying to reproduce it recently, I got an error showing "loss=nan". I am training on an A5000 GPU. What could be the reason for this?

@ryf1123
Contributor

ryf1123 commented Apr 26, 2024

Hi, thank you for posting this issue. We did not observe this problem in our testing. I am not sure whether it is caused by some corrupted data samples. Can you first try filtering the loss with the torch.nan_to_num function and see if it helps?
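For readers following along, here is a minimal sketch of what filtering the loss with torch.nan_to_num could look like. The helper names `sanitize_loss` and `training_step` are hypothetical and are not taken from this repository. The second helper shows an alternative that skips the batch entirely, since replacing a NaN loss with zero only hides the symptom and gradients further up the graph may still be non-finite.

```python
import torch


def sanitize_loss(loss: torch.Tensor) -> torch.Tensor:
    """Replace NaN/Inf entries in a loss tensor with 0.0 (hypothetical helper)."""
    return torch.nan_to_num(loss, nan=0.0, posinf=0.0, neginf=0.0)


def training_step(model, optimizer, inputs, targets, loss_fn) -> bool:
    """Run one optimizer step; skip the batch if the loss is not finite.

    Returns True if the optimizer step was taken, False if the batch was skipped.
    This is an illustrative loop body, not the repository's training code.
    """
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    if not torch.isfinite(loss).all():
        # Skipping avoids propagating NaN gradients into the weights.
        return False
    loss.backward()
    optimizer.step()
    return True
```

If skipping batches makes the NaN disappear, that would point toward specific bad samples rather than a problem with the model or optimizer settings.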

@05063112lcs
Author

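> Hi, thank you for posting this issue. We did not observe this problem in our testing. I am not sure whether this is caused by some corrupted data samples. Can you first try to filter the loss with function torch.nan_to_num and see if it helps?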

Hello, thank you very much for your reply. I ran into the problem during training without making any changes to the code: "loss=nan" appears in the first or second round of training. I tried adjusting the learning rate, but it did not help. I am using an A5000 graphics card for the reproduction.
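One way to test the corrupted-sample hypothesis raised earlier in the thread is to scan the batches for non-finite values before they reach the model. The sketch below assumes each batch is either a single tensor or a tuple/list of tensors. The function name `find_nonfinite_batches` is hypothetical and not part of this project.

```python
import torch


def find_nonfinite_batches(batches):
    """Return the indices of batches that contain NaN or Inf in any tensor.

    `batches` is any iterable of tensors or of tuples/lists of tensors,
    for example a DataLoader (an assumption about the data layout).
    """
    bad = []
    for i, batch in enumerate(batches):
        tensors = batch if isinstance(batch, (list, tuple)) else [batch]
        if any(torch.is_tensor(t) and not torch.isfinite(t).all() for t in tensors):
            bad.append(i)
    return bad
```

If this returns an empty list, the inputs themselves are finite and the NaN is more likely produced inside the forward or backward pass. In that case, enabling `torch.autograd.set_detect_anomaly(True)` during a short run can help locate the operation that first produces it.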
