Topology attribute handling
As per the discussion started in #942, missing attributes from a topology will be handled according to different categories (a few more categories were added on this page). These are:
1. Left out of the Topology, with a general `AttributeError` on accession;
2. Left out of the Topology, but with a specific `MDAnalysis.MissingTopologyAttributeError` on accession, perhaps suggesting how to set/guess values;
3. Added to the topology, set to a default;
4. Added to the topology, guessed, possibly with a warning that it is guessed. If guess fails, assign some default;
5. Added to the topology, guessed, possibly with a warning that it is guessed. If guess fails, warn and leave unassigned (like category 2 above);
6. Absolutely necessary; abort Topology/Universe loading if absent;
7. Internal: attribute generated internally that does not depend on topology availability or guessing;
8. Attribute that you think should be removed from consideration.
(Thanks to @jbarnoud for suggesting some of the possibilities above)
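To make the difference between the first three categories concrete, here is a minimal sketch. It assumes a toy container (`ToyTopology`) and a hypothetical hint table (`HINTS`); neither is MDAnalysis code, and `MissingTopologyAttributeError` is only the name proposed above, implemented here as one possible reading of it.

```python
class MissingTopologyAttributeError(AttributeError):
    """Category 2 (assumed design): a known attribute that the topology lacks."""

    def __init__(self, attrname, hint=""):
        message = f"This topology has no '{attrname}' attribute."
        if hint:
            message += " " + hint
        super().__init__(message)
        self.attrname = attrname


# Hypothetical hints for attributes placed in category 2.
HINTS = {
    "charges": "Load a topology format that stores charges, or set them manually.",
}


class ToyTopology:
    """Toy container for illustration only, not the MDAnalysis Topology class."""

    def __init__(self, n_atoms, **attrs):
        self.n_atoms = n_atoms
        self._attrs = dict(attrs)
        # Category 3: always present, filled with a default when absent.
        self._attrs.setdefault("resids", [1] * n_atoms)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        attrs = self.__dict__.get("_attrs", {})
        if name in attrs:
            return attrs[name]
        if name in HINTS:
            # Category 2: specific error with a suggestion.
            raise MissingTopologyAttributeError(name, HINTS[name])
        # Category 1: plain AttributeError.
        raise AttributeError(name)


top = ToyTopology(3, names=["N", "CA", "C"])
print(top.resids)               # category 3 default
print(hasattr(top, "charges"))  # category 2 still behaves like a missing attribute
```

Deriving the specific error from `AttributeError` is a design choice of this sketch: it keeps `hasattr` and existing `except AttributeError` handlers working for both categories 1 and 2.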
We need to decide which attributes go into which category. This will be implemented as part of the topology refactor (#363). Please edit the table below to record your vote, and if needed add your justification to #942. Factors you might want to take into account are:
- consistency,
- ease for the user in case of missing attributes,
- little boilerplate for the most common cases,
- and backwards compatibility (though this last factor might be of less importance since compatibility will already be broken by the topology refactor).
Attribute | @mnmelo | @orbeckst | @richardjgowers |
---|---|---|---|
residues* | 3 | 3 | 3 |
segments* | 3 | 3 | 3 |
indices | 7 | 7 | 7 |
resindices | 7 | 7 | 7 |
segindices | 7 | 7 | 7 |
ids | 3 | 3 | 1 |
names | 6 | 2 | 2 |
types | 1 | 2 | 2 |
elements | 5 | 5 | 2 |
radii | 5 | 2 | 2 |
chainIDs | 1 | 1 | 1 |
icodes | 1 | 1 | 1 |
tempfactors‡ | 1 | 1 | 1 |
masses | 5 | 5 | 5 |
charges | 2 | 2 | 2 |
bfactors‡ | 1 | 1 | 1 |
occupancies | 1 | 1 | 1 |
altLocs | 1 | 1 | 1 |
resids | 3 | 3 | 3 |
resnames | 2 | 2 | 2 |
resnums | 3 | 3 | 2 |
segids | 3 | 3 | 3 |
bonds | 5 | 5 | 2 |
angles | 5 | 5 | 2 |
dihedrals | 5 | 5 | 2 |
impropers | 1 | 1 | 2 |
other attrs | 1 | 1 | 1 |
* While not strictly attributes, Residue/Segment assignment falls pretty much in the same categories, and their handling is relevant for related attributes (resindices, resids, etc.).
‡ tempfactors and bfactors are the same thing; we only need one.
- I would also like a specific error message (category 2) for anything that has a keyword in the search syntax, such as type.
- I am wavering on radii and I am not sure if they should really be guessed (cat 5). We could assign Bondi van der Waals radii [A. Bondi. van der Waals volumes and radii. The Journal of Physical Chemistry, 68(3):441–451, 1964. doi: 10.1021/j100785a001. URL http://pubs.acs.org/doi/abs/10.1021/j100785a001.] and some ionic radii (or perhaps Born radii) for ions, but ultimately I am not sure how useful this is going to be. What use will people have for such ad-hoc radii? The main use for radii is when reading/writing PQR files, where the radii have a well-defined meaning. So maybe cat 2 would be better for radii.
- It would be nice if the "PDB" attributes (chainID, iCodes, occupancy, bfactor) could be elevated to cat 2, because they are pretty common and we might reduce "Why does this not work?" questions if the error message says right away "Load from a PDB file". (bfactor is the same as tempfactor, so we should only have one; I vote for bfactor because of continuity.) I left them at 1, though, in order to keep things simple.
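If radii did end up in category 5, the "guess, else warn and leave unassigned" behaviour could look roughly like the sketch below. The function name `guess_radii` and its return convention are assumptions made for illustration, and the table holds only a handful of van der Waals radii as commonly quoted from Bondi (1964); check the values against the paper before relying on them.

```python
import warnings

# Van der Waals radii in Angstrom, as commonly quoted from Bondi (1964).
# Deliberately incomplete: this is an illustration, not a reference table.
BONDI_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "P": 1.80, "S": 1.80}


def guess_radii(elements):
    """Return guessed radii, or None if any element is unknown (category 5)."""
    unknown = sorted({e for e in elements if e not in BONDI_RADII})
    if unknown:
        # Guess failed: warn and leave the attribute unassigned.
        warnings.warn(
            f"Could not guess radii for elements {unknown}; radii left unassigned."
        )
        return None
    # Guess succeeded: still tell the user that the values are guesses.
    warnings.warn("Radii were guessed from elements (Bondi van der Waals radii).")
    return [BONDI_RADII[e] for e in elements]
```

Returning `None` on failure stands in for "leave unassigned"; a later access could then raise the specific category 2 error instead of handing back partially guessed values.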
Bond guessing is currently not optimised for large systems, so guessing bonds by default is a bad idea there. Maybe we could implement a size cutoff for auto-guessing (~10k atoms, or whatever keeps Universe load times below ~1 s)?
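A size cutoff of this kind could be a small decision function, as in the sketch below. The name `should_guess_bonds`, the `guess_bonds` override and the default of 10,000 atoms are all assumptions taken from the rough figure above, not measured values or existing API.

```python
def should_guess_bonds(n_atoms, guess_bonds=None, max_atoms=10_000):
    """Decide whether bonds should be guessed when a Universe is loaded.

    guess_bonds: True or False forces the behaviour; None applies the cutoff.
    max_atoms:   hypothetical cutoff, to be tuned against real load times.
    """
    if guess_bonds is not None:
        # An explicit user choice always wins over the size heuristic.
        return bool(guess_bonds)
    return n_atoms <= max_atoms


print(should_guess_bonds(5_000))                     # small system: guess
print(should_guess_bonds(50_000))                    # large system: skip
print(should_guess_bonds(50_000, guess_bonds=True))  # user override
```

Keeping an explicit override means the cutoff only changes the default, so users with large systems can still opt in to bond guessing.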
I'm not sure how an Atom id differs from an Atom index. I understand that some systems may number atoms irregularly, but this isn't necessary everywhere.
Maybe it's just my background, but I'd like things to work just as well with coarse-grained systems, so things like element, resnum and other atomistic attributes shouldn't be mandatory there.