Hi John,
Great work! I appreciate your effort in putting this together. I have some questions and a few issues based on my experience with the graph-pes code over the past couple of days.
Energy offset
When I try to learn the energy offsets by adding +LearnableOffset() to the config file, the offset values suddenly drop to zero after a number of epochs, and all validation metrics are logged as NaN from that point onward. (I'm not sure; maybe gradient clipping could help with this?)
PS: Could you briefly explain how energy offsets are used in graph-pes?
[Update]: It also happens with fixed energy offsets, but not when I don't use offsets.
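For context on what I'm asking: my mental model of energy offsets is a per-element shift (fixed or learnable) that gets added to the model's predicted energy. A toy sketch of that idea, with made-up names and numbers (this is not the graph-pes implementation):

```python
def offset_energy(atomic_numbers, offsets):
    """Total offset energy = sum over atoms of a per-element offset.

    atomic_numbers: list of atomic numbers Z for each atom in the structure
    offsets: dict mapping Z -> energy offset (made-up values below)
    """
    return sum(offsets.get(z, 0.0) for z in atomic_numbers)

# hypothetical per-element offsets for H (Z=1) and O (Z=8)
offsets = {1: -1.0, 8: -10.0}

# a water molecule: O, H, H
print(offset_energy([8, 1, 1], offsets))  # -12.0
```

If this is roughly right, I'd like to understand at what point in the forward pass the offset is applied, and how it interacts with the learnable parameters of the main model.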
Early stopping
As I expected, and as documented in graph-pes/src/graph_pes/config/training.py (lines 39 to 43 at f1003ae):

    The number of epochs to wait for improvement in the total validation loss
    before stopping training. Set to ``None`` to disable early stopping.
    """

the early_stopping_patience value should set the number of epochs to wait for improvement in the validation loss before stopping. However, the callback in graph-pes/src/graph_pes/training/callbacks.py (lines 194 to 196 at f1003ae) actually counts validation checks, not epochs, against the early_stopping_patience threshold.
Am I wrong or missing something, or is this how it was intended?
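To illustrate why the distinction matters: if validation runs more than once per epoch (e.g. via PyTorch Lightning's val_check_interval), counting validation checks shrinks the effective patience measured in epochs. A quick toy calculation:

```python
def effective_patience_epochs(patience_checks: int, checks_per_epoch: int) -> float:
    """If the callback counts validation *checks*, convert the configured
    patience into the effective number of epochs it actually covers."""
    return patience_checks / checks_per_epoch

# e.g. early_stopping_patience=10 while validating 4 times per epoch:
print(effective_patience_epochs(10, 4))  # 2.5 epochs of patience, not 10
```

So with frequent validation, training can stop much earlier than the docstring suggests.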
Weight scheduler
I couldn't find any option to dynamically change the weights of the different loss terms during training. In my experience, re-balancing the energy and force terms toward the end of training usually helps.
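For illustration, the kind of schedule I have in mind is a simple interpolation of each loss weight between a start and an end value over training. This is a standalone sketch with made-up names and numbers, not any existing graph-pes API:

```python
def loss_weights(epoch: int, total_epochs: int,
                 w_energy=(1.0, 10.0), w_forces=(10.0, 1.0)):
    """Linearly interpolate each loss weight from its start to its end value.

    Each weight is given as (start, end); by the final epoch the energy
    term dominates and the force term is down-weighted (made-up values).
    """
    t = min(epoch / max(total_epochs - 1, 1), 1.0)
    interp = lambda w: w[0] + t * (w[1] - w[0])
    return {"energy": interp(w_energy), "forces": interp(w_forces)}

print(loss_weights(0, 100))   # {'energy': 1.0, 'forces': 10.0}
print(loss_weights(99, 100))  # {'energy': 10.0, 'forces': 1.0}
```

Even a simple hook that lets users update the loss weights per epoch would cover this use case.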
Multi-GPU training
When training on 2 GPUs, the second device appears fully idle in the W&B logs. Is this a W&B reporting issue? The second GPU doesn't actually seem to be un-utilized.
Thanks!