alchemical analysis features
alchemical-analysis announced in late 2017 that its features would be moved to alchemlyb. This is going to be a long and tedious process, but it will have the following advantages:
- tested (all code coming into alchemlyb is tested to > 90% coverage, with 95% as a goal)
- modular (functionality as library functions)
- Python 3 (and Python 2)
The alchemlyb team welcomes user input: please raise an issue in the Issue Tracker for any alchemical-analysis features that you would really like to have in alchemlyb.
Please also feel free to edit this wiki page and contribute to the discussion.
Please describe the feature and make a case for why you want it included. Add your name/GitHub handle; feel free to add yourself to any existing entries, too. Popular features are more likely to be migrated. (See issue #54 to discuss the process.)
alchemical-analysis offers information about the statistical inefficiencies of the input datasets. It would be nice to have this information accessible when using the alchemlyb implementation as well.
In alchemical-analysis it is possible to specify a threshold for the number of samples to keep in the uncorrelation process. This is currently not possible in alchemlyb.
In principle these methods determine the series that is used in the statistical_inefficiency() function. They should take the datasets as an argument and return a series that can be used to do the autocorrelation analysis. It is much easier to reproduce alchemical-analysis calculations when these methods are implemented.
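To make the two requests above concrete, here is a minimal, dependency-free sketch. It is not the alchemlyb or pymbar implementation: the function names `statistical_inefficiency_of` and `subsample_with_minimum` are hypothetical, the autocorrelation sum is a simplified estimator, and the threshold semantics (fall back to the full, correlated series when too few uncorrelated samples remain) is an assumption about the behaviour being requested.

```python
import math


def statistical_inefficiency_of(series):
    """Simplified estimate of g = 1 + 2 * sum_t (1 - t/N) * C(t).

    The sum is truncated at the first non-positive normalized
    autocorrelation C(t). Returns 1.0 for uncorrelated data.
    """
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        return 1.0  # constant series: nothing to decorrelate
    g = 1.0
    for t in range(1, n - 1):
        c = sum(dev[i] * dev[i + t] for i in range(n - t)) / ((n - t) * var)
        if c <= 0.0:
            break
        g += 2.0 * c * (1.0 - t / n)
    return max(g, 1.0)


def subsample_with_minimum(series, min_samples=0):
    """Return (g, indices) so that the inefficiency is reported to the user.

    Assumed threshold semantics: if striding by ceil(g) leaves fewer than
    `min_samples` points, keep every (correlated) sample instead.
    """
    g = statistical_inefficiency_of(series)
    stride = max(1, math.ceil(g))
    picked = list(range(0, len(series), stride))
    if len(picked) < min_samples:
        picked = list(range(len(series)))
    return g, picked
```

Returning `g` alongside the indices covers the first request (the inefficiency stays visible to the caller), and `min_samples` covers the second. The `series` argument is where a user-selected observable would be passed in.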
@orbeckst
-m METHODS, --methods=METHODS
- missing estimators
- Are some of these estimators more important than others? [See @mrshirts remarks below; DLM agrees.]
@mrshirts
- BAR is high priority, because sometimes MBAR can't be done if we don't have energies at all i+1's. The BAR solution can be made very fast (significantly faster than MBAR), as it is in the pyMBAR package.
- DEXP and IEXP are single state perturbation. Worth including for comparison. There are two because there are essentially two ways to calculate if you have a series of lambda points.
- TI-CUBIC is essentially a higher order integration of the <dH/dl> using cubic splines. Experience (non-exhaustive) has shown that it's not really much better than TI and has a larger chance of failing because of locally high curvature. I think this is lower priority, especially since it's a pain to handle the uncertainties correctly in the code. There could easily be better integration formulas. If the points are equally spaced, one could use Simpson's rule or Romberg integration, but there doesn't appear to be a general integration algorithm that works well for predefined spacing (as opposed to adaptive spacing). So could be cut. [DLM: Agree; propose cutting.]
- GINS and GDEL are the Gaussian approximations to insertion and deletion FEP. We included them because people kept saying that the Gaussian versions worked, and they really only work for linear problems (charging, etc), and we had to have a testbed to show them. Low overhead to put in. [DLM: So low priority (not much value) but worth including.]
- UBAR is BAR without optimizing the constant. The only reason one would ever do this is because you don't want to maintain a history to adaptively update everything each iteration, which would only happen if you were running this adaptively, i.e. maintaining the accumulated averages (O(1) operation) each step, so you have a cheap estimate each step without running a nonlinear optimization. BUT not very accurate in most cases. [DLM: Thus I propose we not do this unless someone needs it for something.]
- RBAR is interesting, since you calculate the UBAR for a series of 'trial' free energies, and choose the one that best satisfies the equations. One can get a very accurate answer with no iteration each step if you know the range to start out with. PROBABLY not worth supporting, since one is not going to be using alchemlyb adaptively, in the sense that you would need to keep K sets of averages around in between alchemlyb runs. If one were implementing a code in which it was tightly integrated, it could be very useful, but likely not in post-analysis code. [DLM: Propose skipping.]
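For orientation, the estimators discussed above can be sketched in a few lines each. This is a dependency-free illustration under stated assumptions, not the pyMBAR or alchemical-analysis code: all function names are hypothetical, energies are in reduced units (kT), `w_f` holds forward work values u_{i+1}(x) - u_i(x) sampled in state i, and `w_r` holds reverse work values u_i(x) - u_{i+1}(x) sampled in state i+1. Which direction alchemical-analysis labels DEXP versus IEXP (and GDEL versus GINS) is not asserted here, and uncertainties are omitted.

```python
import math
import statistics


def _log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def exp_forward(w_f):
    """Single-state exponential averaging using forward work only."""
    return -_log_mean_exp([-w for w in w_f])


def exp_reverse(w_r):
    """Single-state exponential averaging using reverse work only."""
    return _log_mean_exp([-w for w in w_r])


def gauss_forward(w_f):
    """Gaussian (second-cumulant) approximation to exp_forward."""
    return statistics.mean(w_f) - 0.5 * statistics.variance(w_f)


def gauss_reverse(w_r):
    """Gaussian (second-cumulant) approximation to exp_reverse."""
    return -(statistics.mean(w_r) - 0.5 * statistics.variance(w_r))


def _fermi(x):
    """1 / (1 + exp(x)), written to avoid overflow for large |x|."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def bar(w_f, w_r, n_iter=200):
    """Solve Bennett's self-consistent equation by bisection."""
    m = math.log(len(w_f) / len(w_r))

    def imbalance(df):
        # Increases monotonically with df; zero at the BAR estimate.
        fwd = sum(_fermi(m + w - df) for w in w_f)
        rev = sum(_fermi(-m + w + df) for w in w_r)
        return fwd - rev

    lo, hi = -1.0, 1.0
    while imbalance(lo) > 0.0:  # widen the bracket downwards
        lo *= 2.0
    while imbalance(hi) < 0.0:  # widen the bracket upwards
        hi *= 2.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is used here only because it is easy to verify; it is a swapped-in root finder, not the fast solver referred to above. BAR needs only adjacent-state work values, which is why it remains usable when energies at all states are not available for MBAR.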
@orbeckst
-w, --overlap Print out and plot the overlap matrix.
- unique functionality, quite useful in visual analysis of the data quality
@mrshirts: yes, very useful. [@davidlmobley: agree] Very easy to implement once MBAR has been called, requires MBAR to be called first. How would that dependency be enforced? Try a call to see if the object exists, generate if it doesn't? [@davidlmobley: Implementation detail; can deal with separately.]
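One possible shape for the "generate if it doesn't exist" idea is lazy evaluation: the overlap code asks for the free energies and the solver only runs on first access. The sketch below is an illustration under assumptions, not alchemlyb or pyMBAR API. `OverlapAnalysis`, `mbar_weights`, `overlap_matrix` and the `solve_free_energies` callable are hypothetical names, the data are plain nested lists in reduced units with `u_kn[k][n]` the reduced potential of sample n evaluated in state k, and every state is assumed to have at least one sample.

```python
import math


def mbar_weights(u_kn, f_k, n_k):
    """W[n][k] = exp(f_k - u_k(x_n)) / sum_j N_j exp(f_j - u_j(x_n))."""
    n_states, n_samples = len(u_kn), len(u_kn[0])
    weights = []
    for n in range(n_samples):
        log_terms = [math.log(n_k[j]) + f_k[j] - u_kn[j][n]
                     for j in range(n_states)]
        top = max(log_terms)
        log_denom = top + math.log(sum(math.exp(t - top) for t in log_terms))
        weights.append([math.exp(f_k[k] - u_kn[k][n] - log_denom)
                        for k in range(n_states)])
    return weights


def overlap_matrix(weights, n_k):
    """O[i][j] = N_j * sum_n W[n][i] * W[n][j].

    Rows sum to 1 when the free energies are converged MBAR solutions.
    """
    n_states = len(n_k)
    return [[n_k[j] * sum(row[i] * row[j] for row in weights)
             for j in range(n_states)]
            for i in range(n_states)]


class OverlapAnalysis:
    """Hypothetical wrapper: run the MBAR solver on first use, then reuse."""

    def __init__(self, u_kn, n_k, solve_free_energies):
        self.u_kn = u_kn
        self.n_k = n_k
        self._solve = solve_free_energies  # callable(u_kn, n_k) -> f_k
        self._f_k = None

    @property
    def f_k(self):
        if self._f_k is None:  # MBAR has not been run yet
            self._f_k = self._solve(self.u_kn, self.n_k)
        return self._f_k

    def overlap(self):
        weights = mbar_weights(self.u_kn, self.f_k, self.n_k)
        return overlap_matrix(weights, self.n_k)
```

With this layout a caller never has to remember to run MBAR first, and repeated calls to `overlap()` do not re-solve.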
@davidlmobley
- Graphics: Visualization of TI/BAR free energy estimates (as a function of lambda and as a function of time); convergence graphs; visualization of overlap matrix (e.g. DOI 10.1007/s10822-015-9840-9)
- Graphical visualization for comparison of forward and reverse estimates of free energies
- Breakout of individual components of free energy (if not already available; not clear to me without trying)
- Graphical cross-comparison of analysis techniques when multiple techniques are applied, e.g. DOI 10.1007/s10822-015-9840-9 figure 4.
- Easy consistency checking across techniques (one of the very valuable things about running multiple analysis techniques is that they often agree, except when there is a problem)
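The consistency check in the last item could start out as something very small. The sketch below flags pairs of estimators whose results differ by more than a chosen multiple of their combined uncertainty. The function name `inconsistent_pairs` and the default of two standard deviations are assumptions, and treating the estimates as independent is only a heuristic, since they are usually computed from the same data.

```python
import math
from itertools import combinations


def inconsistent_pairs(estimates, n_sigma=2.0):
    """Return (name_a, name_b, |difference|) for estimator pairs that disagree.

    `estimates` maps an estimator name to (free_energy, uncertainty).
    A pair is flagged when |f_a - f_b| > n_sigma * sqrt(d_a^2 + d_b^2).
    """
    flagged = []
    for (a, (f_a, d_a)), (b, (f_b, d_b)) in combinations(sorted(estimates.items()), 2):
        diff = abs(f_a - f_b)
        if diff > n_sigma * math.hypot(d_a, d_b):
            flagged.append((a, b, diff))
    return flagged
```

For example, calling it with made-up values such as `{"TI": (10.0, 0.1), "MBAR": (10.1, 0.1), "DEXP": (12.0, 0.2)}` flags the two pairs involving DEXP and leaves MBAR versus TI alone. The same table of pairs would be a natural input for a graphical cross-comparison.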
The following features already exist
- MBAR and TI estimators
- subsampling with preprocessing.subsampling.statistical_inefficiency() (Does this correspond to the -n UNCORR, --uncorr=UNCORR feature??)
- discarding of initial time (-s EQUILTIME, --skiptime=EQUILTIME) and more flexible slicing with preprocessing.subsampling.slicing()
- Extracting the energy data from the backward direction (-e, --backward) can be done with preprocessing.subsampling.slicing() (... I think ... check!) [@davidlmobley: We would want to make sure it's obvious how to do this, and how to graphically visualize.]
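As a starting point for making the slicing recipe obvious, here is a sketch of the forward/backward pattern on a plain list. It assumes that "backward" means taking data from the end of the time series, which is an interpretation rather than something confirmed above. The function name `forward_backward` is hypothetical, and `estimator` stands in for any callable that maps a list of samples to a number; this does not use the alchemlyb slicing() function.

```python
def forward_backward(samples, estimator, n_points=10):
    """Estimate on growing slices taken from the start and from the end.

    Returns two lists of length `n_points`: forward[i] uses the first
    (i + 1) / n_points of the data, backward[i] the last (i + 1) / n_points.
    The final entries of both lists use all data and therefore agree.
    """
    n = len(samples)
    forward, backward = [], []
    for i in range(1, n_points + 1):
        size = max(1, (n * i) // n_points)
        forward.append(estimator(samples[:size]))
        backward.append(estimator(samples[n - size:]))
    return forward, backward
```

Plotting the two lists against the fraction of data used gives the forward and reverse comparison requested earlier on this page; curves that only meet at the final point suggest unconverged data.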
The following features only exist in alchemlyb
- equilibrium detection with preprocessing.subsampling.equilibrium_detection()
@mrshirts:
- Estimation of uncertainties and covariances by bootstrapping. Very useful to diagnose if things go wrong in the error estimates, generally more reliable error estimates in regime of low sampling.
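A bootstrap uncertainty can be sketched independently of the estimator being used. The version below is a minimal illustration under assumptions: `bootstrap_uncertainty` is a hypothetical name, `estimator` is any callable mapping a list of samples to a number, and the samples are assumed to be already decorrelated (for example by subsampling), since resampling correlated frames would underestimate the error.

```python
import random
import statistics


def bootstrap_uncertainty(samples, estimator, n_boot=200, seed=0):
    """Return (mean, standard deviation) of the estimator over bootstrap replicas.

    Each replica resamples `samples` with replacement at the original size.
    A fixed seed keeps the result reproducible.
    """
    rng = random.Random(seed)
    n = len(samples)
    replicas = [
        estimator([samples[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    ]
    return statistics.mean(replicas), statistics.stdev(replicas)
```

Covariances between several quantities could be obtained the same way by evaluating all of them on each replica and computing the covariance across replicas.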