Some questions #104

omidshy opened this issue Dec 24, 2024 · 0 comments

Hi John,
Great work! I appreciate your effort in putting this together. I have some questions and a few issues based on my experience with the graph-pes code over the past couple of days.

Energy offset

When I try to learn the energy offsets by adding +LearnableOffset() to the config file, the offset values suddenly drop to zero after a number of epochs, and from that point onward every validation metric is logged as NaN. (Perhaps gradient clipping would help here, but I'm not sure.)
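For reference, this is roughly what the relevant part of my config looks like; aside from +LearnableOffset(), which I copied from the docs, the key names and the other component here are placeholders and may not match the actual schema:

```yaml
model:
  # learnable per-species energy offset -- this is the part that collapses to zero
  offset: +LearnableOffset()
  # placeholder for the interatomic model I'm actually training
  many-body: +SchNet()
```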

PS: Could you briefly explain how energy offsets are used in graph-pes?

[Update]: the same thing also happens with fixed energy offsets, but not when I use no offsets at all.
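For context, here is how I currently understand per-species energy offsets being applied; this is purely illustrative, with made-up names, and is not meant to be the actual graph-pes implementation:

```python
import torch

def total_energy(
    interaction_energy: torch.Tensor,  # scalar energy predicted by the model
    atomic_numbers: torch.Tensor,      # Z_i for each atom, shape (n_atoms,)
    offsets: torch.Tensor,             # one (learnable or fixed) offset per element
) -> torch.Tensor:
    # E_total = E_interaction + sum_i offset[Z_i]
    return interaction_energy + offsets[atomic_numbers].sum()
```

Is this roughly what happens in graph-pes?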

Early stopping

As I would expect, and as documented here:

```python
early_stopping_patience: Union[int, None]
"""
The number of epochs to wait for improvement in the total validation loss
before stopping training. Set to ``None`` to disable early stopping.
"""
```

the early_stopping_patience value should set the number of epochs to wait for an improvement in the validation loss before stopping. In practice, however, it counts validation checks rather than epochs as the early-stopping threshold:

```python
checks_since_best = (
    self.state["total_checks"] - self.state["best_val_loss_check"]
)
```

Am I missing something, or is this the intended behaviour? (A small worked example of the mismatch is below.)
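To make the mismatch concrete (the validation frequency here is made up):

```python
# With a hypothetical validation frequency of 4 checks per epoch
# and early_stopping_patience = 10, training stops after
#   10 checks * 0.25 epochs/check = 2.5 epochs
# without improvement, rather than the 10 epochs the docstring suggests.
patience_in_checks = 10
epochs_per_check = 0.25  # hypothetical: validation runs 4x per epoch
effective_patience_in_epochs = patience_in_checks * epochs_per_check
print(effective_patience_in_epochs)  # 2.5
```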

Weight scheduler

I couldn't find an option to dynamically change the weights of the individual loss terms during training. In my experience, rebalancing the energy and force terms towards the end of training usually helps; a sketch of the kind of schedule I mean is below.
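This is not graph-pes API and the names are hypothetical, just an illustration of the feature request:

```python
def loss_weights(epoch: int, total_epochs: int) -> dict[str, float]:
    """Linearly shift emphasis from forces to energies over training.

    Purely illustrative: I'm not aware of graph-pes exposing such a hook.
    """
    progress = epoch / max(total_epochs - 1, 1)
    return {
        "energy": 1.0 + 9.0 * progress,   # weight goes 1 -> 10
        "forces": 10.0 - 9.0 * progress,  # weight goes 10 -> 1
    }
```

For example, loss_weights(0, 100) gives {'energy': 1.0, 'forces': 10.0}, while loss_weights(99, 100) gives {'energy': 10.0, 'forces': 1.0}.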

Multi-GPU training

When training on two GPUs, the second device appears completely idle in the W&B logs. Is this a W&B logging issue? As far as I can tell, the second GPU is not actually sitting unused.

Thanks!
