Molecule.from_polymer_pdb throws warning and error when multiple polymers are present #2010

Open
mattwthompson opened this issue Feb 4, 2025 · 0 comments

Comments

@mattwthompson
Member

Describe the bug

I can load a PDB file containing multiple proteins/polymers fine with Topology.from_pdb. When loading it with Molecule.from_polymer_pdb, I first get a warning that something passed to from_rdkit has multiple components, which is supported for legacy reasons but discouraged, and then the same condition bubbles up later as an error. This is pretty confusing because

  • I didn't call from_rdkit
  • The reason for the error appears to be the same reason for the warning, except the warning implies to me that it's supported and won't become an error
  • The error message instructs me to use Molecule.from_pdb_and_smiles, which itself is deprecated

To Reproduce

In [1]: from openff.toolkit import Topology, Molecule
The OpenEye Toolkits are found to be installed but not licensed and therefore will not be used.
The OpenEye Toolkits require a (free for academics) license, see https://docs.eyesopen.com/toolkits/python/quickstart-python/license.html
DEPRECATION: --no-python-version-warning is deprecated. pip 25.1 will enforce this behaviour change. A possible replacement is to remove the flag as it's a no-op. Discussion can be found at https://github.com/pypa/pip/issues/13154
The OpenEye Toolkits are found to be installed but not licensed and therefore will not be used.
The OpenEye Toolkits require a (free for academics) license, see https://docs.eyesopen.com/toolkits/python/quickstart-python/license.html

In [2]: Topology.from_pdb("openff/toolkit/data/proteins/TwoMol_SER_CYS.pdb")
Out[2]: <openff.toolkit.topology.topology.Topology at 0x130646840>

In [3]: Molecule.from_polymer_pdb("openff/toolkit/data/proteins/TwoMol_SER_CYS.pdb")
/Users/mattthompson/micromamba/envs/openff-toolkit-test/lib/python3.12/site-packages/openff/utilities/utilities.py:81: MoleculeDeprecationWarning: `Molecule.from_polymer_pdb` is deprecated in favor of `Topology.from_pdb`, the recommended method for loading PDB files. This method will be removed in a future release of the OpenFF Toolkit.
  return function(*args, **kwargs)
/Users/mattthompson/software/openff-toolkit/openff/toolkit/topology/molecule.py:4349: MultipleComponentsInMoleculeWarning: RDKit Molecule passed to from_rdkit consists of more than one molecule, consider running rdkit.Chem.AllChem.GetMolFrags(rdmol, asMols=True) or splitting input SMILES at '.' to get separate molecules and pass them to from_rdkit one at a time. While this is supported for legacy reasons, OpenFF Molecule objects are not supposed to contain disconnected chemical graphs and this may result in undefined behavior later on. The OpenFF ecosystem is built to handle multiple molecules, but they should be in a Topology object, ex: top = Topology.from_molecules([mol1, mol2])
  molecule = toolkit.from_rdkit(
---------------------------------------------------------------------------
MultipleMoleculesInPDBError               Traceback (most recent call last)
Cell In[3], line 1
----> 1 Molecule.from_polymer_pdb("openff/toolkit/data/proteins/TwoMol_SER_CYS.pdb")

File ~/micromamba/envs/openff-toolkit-test/lib/python3.12/site-packages/openff/utilities/utilities.py:81, in requires_package.<locals>.inner_decorator.<locals>.wrapper(*args, **kwargs)
     78 except Exception as e:
     79     raise e
---> 81 return function(*args, **kwargs)

File ~/software/openff-toolkit/openff/toolkit/topology/molecule.py:4052, in FrozenMolecule.from_polymer_pdb(cls, file_path, toolkit_registry, name)
   4049 offmol.add_default_hierarchy_schemes()
   4051 if offmol._has_multiple_molecules():
-> 4052     raise MultipleMoleculesInPDBError(
   4053         "This PDB has multiple molecules. The OpenFF Toolkit requires "
   4054         + "that only one molecule is present in a PDB. Try splitting "
   4055         + "each molecule into its own PDB with another tool, and "
   4056         + "load any small molecules with Molecule.from_pdb_and_smiles."
   4057     )
   4059 offmol.name = name
   4061 return offmol

MultipleMoleculesInPDBError: This PDB has multiple molecules. The OpenFF Toolkit requires that only one molecule is present in a PDB. Try splitting each molecule into its own PDB with another tool, and load any small molecules with Molecule.from_pdb_and_smiles.

The following workaround defeats the purpose, but it works and feels like it shouldn't:

In [16]: Molecule.from_pdb_and_smiles("openff/toolkit/data/proteins/TwoMol_SER_CYS.pdb", smiles = '.'.join([molecule.to_smiles() for molecule in Topology.from_pdb("openff/toolkit/data/proteins/TwoMol_SER_CYS.pdb").molecules]))
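
For readability, the one-liner above can be unrolled roughly as in the sketch below. It uses only calls that already appear in this report (`Topology.from_pdb`, `.molecules`, `to_smiles`, `Molecule.from_pdb_and_smiles`). The helper names `join_components` and `load_as_single_molecule` are hypothetical, and the toolkit import is deferred so the pure string step can be checked on its own.

```python
def join_components(smiles_list):
    """Join per-molecule SMILES with '.', the SMILES separator for disconnected components."""
    return ".".join(smiles_list)


def load_as_single_molecule(pdb_path):
    """Unrolled form of the workaround above; requires the OpenFF Toolkit."""
    # Imported lazily so that defining this function does not need the toolkit.
    from openff.toolkit import Molecule, Topology

    # Step 1: load the multi-molecule PDB the supported way.
    topology = Topology.from_pdb(pdb_path)
    # Step 2: build one multi-component SMILES from the individual molecules.
    smiles = join_components([molecule.to_smiles() for molecule in topology.molecules])
    # Step 3: feed the same PDB back in together with that SMILES.
    return Molecule.from_pdb_and_smiles(pdb_path, smiles=smiles)
```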

Additional context

This also causes some bloat in the test output, since the test for the error runs into the warning first.
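
One way a test could keep this warning out of its output is to ignore that one category while still asserting the error. The sketch below uses only the standard library. `load_polymer` and the two locally defined classes are hypothetical stand-ins that mimic the warn-then-raise sequence from the traceback above, not the toolkit's own code.

```python
import warnings


class MultipleComponentsInMoleculeWarning(UserWarning):
    """Stand-in for the toolkit warning of the same name (assumption)."""


class MultipleMoleculesInPDBError(Exception):
    """Stand-in for the toolkit error of the same name (assumption)."""


def load_polymer(n_components: int) -> str:
    """Hypothetical loader that mimics the reported behavior: warn, then raise."""
    if n_components > 1:
        warnings.warn(
            "input consists of more than one molecule",
            MultipleComponentsInMoleculeWarning,
            stacklevel=2,
        )
        raise MultipleMoleculesInPDBError("This PDB has multiple molecules.")
    return "molecule"


def error_without_warning_noise(n_components: int) -> tuple[bool, int]:
    """Return (error_raised, number_of_warnings_that_escaped_the_filter)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning by default ...
        warnings.simplefilter("always")
        # ... but ignore the one category that bloats the test output.
        # simplefilter() inserts at the front, so this rule takes precedence.
        warnings.simplefilter("ignore", category=MultipleComponentsInMoleculeWarning)
        try:
            load_polymer(n_components)
        except MultipleMoleculesInPDBError:
            return True, len(caught)
    return False, len(caught)
```

With pytest, the `filterwarnings` marker offers similar per-test filtering without the explicit context manager.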
