Hi John,
Great work! I appreciate your effort in putting this together. I have some questions and a few issues based on my experience with the graph-pes code over the past couple of days.
Energy offset
When I try to learn the energy offsets by adding +LearnableOffset() to the config file, the offset values suddenly drop to zero after a number of epochs, and all validation metrics are logged as NaN from that point onward. (I'm not sure; maybe gradient clipping could help with this?)
PS: Could you briefly explain how energy offsets are used in graph-pes?
[Update]: It also happens with fixed energy offsets, but not when I don't use offsets.
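For context on what I'm asking: my mental model of energy offsets is a per-element shift (fixed or learnable) that gets added to the model's predicted energy. A toy sketch of that idea, with made-up names and numbers (this is not the graph-pes implementation):

```python
def offset_energy(atomic_numbers, offsets):
    """Total offset energy = sum over atoms of a per-element offset.

    atomic_numbers: list of atomic numbers Z for each atom in the structure
    offsets: dict mapping Z -> energy offset (made-up values below)
    """
    return sum(offsets.get(z, 0.0) for z in atomic_numbers)

# hypothetical per-element offsets for H (Z=1) and O (Z=8)
offsets = {1: -1.0, 8: -10.0}

# a water molecule: O, H, H
print(offset_energy([8, 1, 1], offsets))  # -12.0
```

If this is roughly right, I'd like to understand at what point in the forward pass the offset is applied, and how it interacts with the learnable parameters of the main model.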
Early stopping
As I expected, and as documented in graph-pes/src/graph_pes/config/training.py (lines 39 to 43 at f1003ae):

    The number of epochs to wait for improvement in the total validation loss
    before stopping training. Set to ``None`` to disable early stopping.
    """

the early_stopping_patience value should set the number of epochs to wait for improvement in the validation loss before stopping. However, the callback in graph-pes/src/graph_pes/training/callbacks.py (lines 194 to 196 at f1003ae) actually counts validation checks, not epochs, against the early_stopping_patience threshold.
Am I wrong or missing something, or is this how it was intended?
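To illustrate why the distinction matters: if validation runs more than once per epoch (e.g. via PyTorch Lightning's val_check_interval), counting validation checks shrinks the effective patience measured in epochs. A quick toy calculation:

```python
def effective_patience_epochs(patience_checks: int, checks_per_epoch: int) -> float:
    """If the callback counts validation *checks*, convert the configured
    patience into the effective number of epochs it actually covers."""
    return patience_checks / checks_per_epoch

# e.g. early_stopping_patience=10 while validating 4 times per epoch:
print(effective_patience_epochs(10, 4))  # 2.5 epochs of patience, not 10
```

So with frequent validation, training can stop much earlier than the docstring suggests.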
Weight scheduler
I couldn't find any option to dynamically change the weights of the different loss terms during training. In my experience, re-balancing the energy and force terms toward the end of training usually helps.
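For illustration, the kind of schedule I have in mind is a simple interpolation of each loss weight between a start and an end value over training. This is a standalone sketch with made-up names and numbers, not any existing graph-pes API:

```python
def loss_weights(epoch: int, total_epochs: int,
                 w_energy=(1.0, 10.0), w_forces=(10.0, 1.0)):
    """Linearly interpolate each loss weight from its start to its end value.

    Each weight is given as (start, end); by the final epoch the energy
    term dominates and the force term is down-weighted (made-up values).
    """
    t = min(epoch / max(total_epochs - 1, 1), 1.0)
    interp = lambda w: w[0] + t * (w[1] - w[0])
    return {"energy": interp(w_energy), "forces": interp(w_forces)}

print(loss_weights(0, 100))   # {'energy': 1.0, 'forces': 10.0}
print(loss_weights(99, 100))  # {'energy': 10.0, 'forces': 1.0}
```

Even a simple hook that lets users update the loss weights per epoch would cover this use case.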
Multi-GPU training
When training on 2 GPUs, the second device appears fully idle in the W&B logs. Is this a W&B reporting issue? The second GPU doesn't actually seem to be un-utilized.
Thanks!