training data setup #5

Open
sacombs opened this issue Oct 27, 2022 · 2 comments

sacombs commented Oct 27, 2022

I would like to provide my own dataset for retraining torsional-diffusion. For some fields in the pickle file, I do not know what values to use. For example, the conformers dictionary has the following:

{'geom_id': 123368967, 'set': 1, 'degeneracy': 3, 'totalenergy': -23.59133734, 'relativeenergy': 0.0, 'boltzmannweight': 0.8585, 'conformerweights': [0.28617, 0.28617, 0.28616], 'rd_mol': <rdkit.Chem.rdchem.Mol at 0x7f7b42014bd0>}

What should I put for boltzmannweight and degeneracy? Is there a setup script to take molfiles and convert them into the dataset for training?

@MatthewMasters

Degeneracy is not used in the code, so it's safe to exclude. The Boltzmann weight can be calculated if you know the energy and temperature: w = exp(-E / (kB T)), where E is the energy, T is the temperature, and kB is the Boltzmann constant.
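
In case it helps, here is a minimal sketch of that formula in Python. A few things in it are assumptions rather than anything confirmed in this thread: that the energies are relative energies in kcal/mol, that the temperature is 298.15 K, and that the weights should be normalized to sum to 1 over all conformers of a molecule (the example value 0.8585 at a relative energy of 0.0 suggests normalization, but check your own data). The function name `boltzmann_weights` is made up for illustration.

```python
import math

# Gas constant in kcal/(mol*K). ASSUMPTION: energies are given in kcal/mol.
KB_KCAL_PER_MOL_K = 0.0019872041


def boltzmann_weights(relative_energies, temperature=298.15):
    """Return Boltzmann weights normalized to sum to 1.

    ASSUMPTION: normalizing over all conformers of one molecule is the
    desired convention; drop the division by `total` for raw factors.
    """
    kt = KB_KCAL_PER_MOL_K * temperature
    # Shift by the lowest energy so the largest exponent is exactly 0.
    e_min = min(relative_energies)
    factors = [math.exp(-(e - e_min) / kt) for e in relative_energies]
    total = sum(factors)
    return [f / total for f in factors]


# Three conformers at 0, 1 and 2 kcal/mol above the minimum.
weights = boltzmann_weights([0.0, 1.0, 2.0])
```

Lower-energy conformers get larger weights, and conformers with equal energy get equal weights.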

gcorso (Owner) commented Nov 16, 2022

Sorry for the delay, and thank you very much @MatthewMasters for the answer!
Everything Matthew said is correct. Moreover, if you don't use Boltzmann-weighted sampling (going without it is how the ML community trains and evaluates these methods), you only need to have the rd_mol!
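
As a rough sketch of what "only the rd_mol" could look like: the snippet below wraps each conformer in a dictionary with a single `rd_mol` key and shows one way to pickle the result. The helper names `minimal_conformer_records` and `save_records` are hypothetical, and the surrounding file layout that the training code expects is not described in this thread, so please check the repository's dataset loader before relying on this.

```python
import pickle


def minimal_conformer_records(mols):
    """Wrap each conformer object in a dict holding only 'rd_mol'.

    ASSUMPTION: `mols` are RDKit Mol objects, one per conformer,
    that the caller has already loaded.
    """
    return [{"rd_mol": mol} for mol in mols]


def save_records(records, path):
    """Pickle the records to `path` (hypothetical helper)."""
    with open(path, "wb") as handle:
        pickle.dump(records, handle)
```

With RDKit installed, each molfile can be read with `Chem.MolFromMolFile(path, removeHs=False)` and the resulting molecules passed to `minimal_conformer_records`.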
