helper script for upgrading from 0.10 to 0.11 (ten2eleven.py) #377

orbeckst · 2015-08-04T23:43:51Z

Background

The 0.11.0 release will break the API in many places (see also the Release Notes for 0.11.0):

Issue	API Changes
#250 & #252	Summary of Timestep API changes
#310	frame numbers start at 0 instead of 1
#308	time is now a property of `Timestep` and not of the trajectory reader
#287	Summary of reorganization of modules (`MDAnalysis.lib`)
#372	Many `AtomGroup` methods were changed to properties (and some were renamed)
#376	`AtomGroup.numberOfAtoms()` (and similar methods for residues and segments) is now a property `AtomGroup.n_atoms`; likewise, `Universe.trajectory.numatoms` and `.numframes` are now `.n_atoms` and `.n_frames`, respectively.
#389	`AtomGroup.selectAtoms()` to `AtomGroup.select_atoms()` (and same for `Universe.selectAtoms()`
#373	calculating of bond, angle and dihedral values from 2,3, or 4-member `AtomGroup`s will change
#385	`AtomGroup.resids` (and `segids`) will return an array the size of the `AtomGroup`; the old behavior will be obtained with `AtomGroup.residues.resids`; likewise `ResidueGroup.segids` will return an array the size of the `ResidueGroup`
#406	`AtomGroup.set_<property>` are now pluralized to match the property names, where they were once singular. e.g. `ag.set_resnum` is now `ag.set_resnums`, `ag.set_radius` is now `ag.set_radii`.

ten2eleven.py script

In order to ease the burden on users we would like to have a script ten2eleven.py that aids the user in the translation of Python code to the new syntax. That might not always be possible but some (tedious) changes could likely be automated.

This should be somewhat similar to the py2to3 script.

Specifications

The script should take a python source file as input and write out an altered version of the file.
- Optionally, it should run recursively on a directory and change all py files in the directory.
- Optionally, it should write all changed files to a new target directory and replicate the original directory structure in the target directory.
Automatically do substitutions and renamings like the following:
```
ag.residues() → ag.residues
atom.number → atom.index
import MDAnalysis.core.qcprot → import MDAnalysis.lib.qcprot
```
(We need a complete list of substitutions – please add to this issue report!)
It would also be desirable to flag pieces of code that should be manually inspected, e.g. anything related to ts.frame. Optionally, it should insert comment lines in places that should be inspected.
It should keep a backup file of the original file and it should never overwrite the backup files.

The text was updated successfully, but these errors were encountered:

orbeckst · 2015-08-05T00:19:32Z

(Btw, looking at the code for lib2to3 I should add that my statement "ten2eleven.py should be similar to py2to3" needs to be taken with a grain of salt... 2to3 contains a lot of voodoo, including parsing of the AST to decide where to fix. But maybe one could re-used lib2to3 for our purposes...)

orbeckst · 2015-08-08T01:22:09Z

Instead of lib2to3 we could also look into rope, a python-based refactoring tool.

orbeckst · 2015-08-08T01:41:07Z

And more on using lib2to3:
Stack Overflow answer on lib2to3 Architecture Documentation points to Lennart Regebro's Porting to Python 3 book, in particular the chapter Extending 2to3 with your own fixers.

orbeckst · 2015-08-09T07:42:48Z

Add #389 (selectAtoms() to select_atoms()) to table.

richardjgowers · 2015-08-09T08:22:14Z

So the camelCase/single_underscore is a nicety, but it's not going to break any scripts, and will issue a nice message.

I think it's a lot more realistic for the script to read a script then give a line of line numbers and possible changes that need making, so something like:

ten2eleven.py my_awesome_analysis.py

line 10: 'print atom.number' - 'atom.number' has changed to 'atom.index'
line 11: 'idx = atomgroup.indices()' - 'atomgroup.indices is now a property not a method'

But if someone can make it replace things in place then that's better. Either way we'll need a list of replacements.

orbeckst · 2015-08-10T18:20:24Z

We definitely need the list of changes; I hope that @tylerjereddy 's page MDAnalysis 0.11.0 unifying release user guide will have this list.

I think we can write fixes for lib2to3 that do some of the replacements in a smarter way than sed. There are also recipes for rewriting import statements. However, this week I won't have the time to look at this in more detail but Extending 2to3 with your own fixers looked promising.

(rope seems to be designed to do replacements but as part of an interactive frame work, e.g. hooked into vim or emacs. The problem that I saw was that you need to know the offset in the file of the term that you want to refactor, i.e. you put the cursor mark there and then there's a function that guesses the word, passes it through the static code analyzer, and eventually makes the replacement.)

tylerjereddy · 2015-08-12T12:58:08Z

Let me take a look. If my progress is too slow we can always re-assign to someone else.

orbeckst · 2015-08-12T17:07:29Z

Sure, thanks.

I would definitely try to use lib2to3. The docs on writing fixes seem a decent starting point. (And it actually sounds like a fun exercise to me.)

Oliver Beckstein
email: orbeckst@gmail.com
skype: orbeckst

Am Aug 12, 2015 um 5:58 schrieb Tyler Reddy notifications@github.com:

Let me take a look. If my progress is too slow we can always re-assign to someone else.

—
Reply to this email directly or view it on GitHub.

tylerjereddy · 2015-08-13T18:03:39Z

I've produced a crude fixer (https://github.com/tylerjereddy/ten2eleven) that seems to work on a simple test conversion for selectAtoms to select_atoms. Currently it seems to be necessary to copy the fixer to lib2to3/fixes/ and then you can generate the diff with:

2to3 -f selectatoms test_dummy_old_MDA_code.py

For the modules that now have different import paths, the fixers will be more complicated than that one! The example page Oli linked shows one way to deal with changing import statements (have to deal with from and various import styles, which can get nasty). Writing tests will be important to ensure we have the correct behaviour, and it can be tricky to think of the corner cases.

orbeckst · 2015-08-13T21:44:01Z

@tylerjereddy – progress! Do you see a way that we can make a self-contained package so that users can pip install our upgrade script? I am thinking of making it a separate repo and package. It would be nice if we can get away with just copying & modifying the original 2to3 script, using import lib2to3everywhere and just providing our fixers.

tylerjereddy · 2015-08-13T21:47:47Z

Not sure about that just yet. I've added a new fixer for converting atomgroup methods to properties. That one was much harder, but seems to work properly. The best bet beyond the reference you provided is just to hack patterns together based on the fixers already in the standard library. Presumably we can deal with the packaging retroactively after various fixers are drafted.

tylerjereddy · 2015-08-13T22:06:52Z

Likewise working for atomgroup count methods camelcase --> new properties.

orbeckst · 2015-08-13T23:17:39Z

I think I would try out the suggestion to use the find_pattern.py script, preferrably on the older unit tests: Perhaps a good test is to use 0.10.0 unit tests, run, check failures, convert, run tests and hopefully end up with zero failures.

tylerjereddy · 2015-08-14T19:01:04Z

Ok, it is now possible to run the current full suite of fixers on the command line like this:

python ten2eleven.py test_dummy_old_MDA_code.py

or a specific fixer:

python ten2eleven.py -f agcountmethods test_dummy_old_MDA_code.py

Now, as far as installing with pip and using on other systems, I may need a little help, but there are only three lines of code in ten2eleven.py, so I imagine those familiar with packaging details can handle that.

We also have a new fixer for the import statement migrations to MDAnalysis.lib, etc. It seems to work well for import and from x import, but yeah will need thorough testing beyond my crude dummy module, probably as you suggest with the unit tests eventually. I'm also thinking that if people do import MDAnalysis as MDA or mda or any arbitrary thing, that might be challenging to deal with downstream calls to the migrated modules. Still, I think this is moving in the right direction.

richardjgowers · 2015-08-20T13:30:16Z

Fixes required because of #373

# These methods are now properties returning an object
AtomGroup.bond() -> AtomGroup.bond.value()
AtomGroup.angle() -> AtomGroup.angle.value()
AtomGroup.torsion() -> AtomGroup.dihedral.value()
AtomGroup.improper() -> AtomGroup.improper.value()

# Torsions now called dihedrals throughout package
# (class names)
Torsion -> Dihedral
Improper_Torsion -> ImproperDihedral

AtomGroup.torsions -> AtomGroup.dihedrals
Atom.torsions -> Atom.dihedrals
Universe.torsions -> Universe.dihedrals

# From MDAnalysis.lib.distances
calc_torsions() -> calc_dihedrals()

tylerjereddy · 2015-08-20T20:09:42Z

I've added two new fixers to deal with some of those. More to come, hopefully, as time permits.

richardjgowers · 2015-08-22T10:27:28Z

These are the camelCase changes. Should just be simple find and replaces

# from core.AtomGroup
totalMass -> total_mass
totalCharge -> total_charge
centerOfGeometry -> center_of_geometry
radiusOfGyration -> radius_of_gyration
shapeParameter -> shape_parameter
momentOfInertia -> moment_of_inertia
principalAxes -> principal_axes
packIntoBox -> pack_into_box
asUniverse -> as_universe

# from lib.distances
applyPBC -> apply_PBC

EDIT 2015-8-30 (@orbeckst): fixed radiusOfGyration

tylerjereddy · 2015-08-22T20:03:15Z

Ok, I wrote a new fixer to cover those.

tylerjereddy · 2015-08-28T16:48:11Z

So, I've started testing ten2eleven.py conversion of 0.10 unit tests to 0.11 friendly code after making several more fixers. The basic procedure:

need to checkout old version (0.10) of testsuite/MDAnalysisTests: git checkout 06c93bc0f6e66423 -- testsuite/MDAnalysisTests
- have to change __version__ in __init__.py to allow running tests with different version of library
ensure the above tests fail with 0.11-dev MDA code
- I get: FAILED (SKIP=3, errors=189, failures=118)
try converting 0.10 unit test code to 0.11 unit test code with ten2eleven.py: python ten2eleven.py -W *py
re-run the converted unit tests:
- I get: FAILED (SKIP=3, errors=73, failures=144)

The substantial reduction in the number of errors relative to a modest increase in the number of failures is probably a decent sign for the first attempt. I see that there are a lot of issues with failures involving frame numbers and trajectory slicing, which is not surprising given the fact that the module doesn't handle this particular issue at the moment.

Consider for example this test:

self.trajectory.rewind()
assert_equal(self.ts.frame, 1, "rewinding to frame 1")

One can easily imagine various similar cases where a user or a unit test would have any arbitrary frame number based assumption in their code. I'm skeptical that we could account for all of these cases, though I'd be happy to be proven wrong! Adjusting all instances of trajectory[index] to trajectory[index-1] would be risky I think (i.e., trajectory[0] is the first frame in both 0.10 and 0.11, right? it is just that the frame number is different). We could try warning the user if the keywords trajectory or frame are in a line I suppose, or maybe could cook up some "smart defaults" with warnings. Thoughts?

richardjgowers · 2015-08-28T16:59:32Z

trajectory[index] didn't actually change, but rather the frame of the returned Timestep changed

# Previously
ts = u.trajectory[i]
ts.frame == i + 1

# Now
ts = u.trajectory[i]
ts.frame == i

So you might be able to look for .frame and issue warnings.

tylerjereddy · 2015-08-28T17:15:17Z

Yes, that is what I was arguing in my comment--difficult to do this kind of conversion automatically for any arbitrary usage of a frame number.

orbeckst · 2015-08-30T18:07:39Z

Hi @tylerjereddy ,

you're doing a terrific job here!

I agree with what was said before: Do the safe conversions and point out potential problems as much as possible. Perhaps even by inserting comments into the code along the lines of

# ten2eleven.py detected a possible incompatibility between this code and MDAnalysis >= 0.11.0
# NAME / DESCRIPTION / DATE
# Please manually review the following line (and remove these comments afterwards):
ts.frame == i + 1

I also opened a stub page on the wiki for migrating MDAnalysis code with ten2eleven.py. Can you please add a howto description there that tells users what to do and possibly what potential pitfalls are?

Cheers!

orbeckst · 2015-08-30T18:16:31Z

We also have to decide how to make ten2elevn.py available. I see two options

Add it to the MDAnalysis 0.11.0 distribution.
- install as an executable script
- advantage:
  - batteries included, user has everything
  - tight coupling between fixes and release
- disadvantage:
  - MDAnalysis never installed scripts before, so this is dirty
  - what about users that skipped 0.11.0? How do they get the script to eventually upgrade?
Make it a separate package (also on PyPi), which gets its own repository.
- installs itself as a script with zero dependencies
- advantages:
  - clean, does not pollute the library package itself
  - can be uninstalled after use
  - can be separately maintained and we can use the same framework for other upgrades if need be (it's so neat that I think we should....)
- disadvantages:
  - user needs additional package (but can be automated by specifying it as a dependency of 0.11.0)

I vastly favour (2) but perhaps other people have different opinions, in particular @richardjgowers as release manager will have a deciding say in this.

dotsdl · 2015-08-30T18:40:25Z

I favor 1. I think it's okay to continue providing the script as a convenience. Ultimately with an API break there will still be issues that require manual intervention, but the hope is that this eases the pain and codifies the changes that one needs to make.

richardjgowers · 2015-08-31T09:58:24Z

Is it possible to make it work as

import MDAnalysis as mda

mda.ten2eleven('myfile.py')  # spits out a file in same dir with fixes

If I've understood this correctly, this is option 1 lite.

dotsdl · 2015-08-31T20:03:32Z

@tylerjereddy I updated Oliver's table at the top with the changes to setters (singular to plural)

tylerjereddy · 2015-09-02T10:17:10Z

@dotsdl Ok, I've added a new fixer that handles the following:

ag.set_mass(new) --> ag.set_masses(new)
ag.set_charge(new) --> ag.set_charges(new)
ag.set_name(new) --> ag.set_names(new)
ag.set_type(new) --> ag.set_types(new)
ag.set_radius(new) --> ag.set_radii(new)
ag.set_bfactor(new) --> ag.set_bfactors(new)
ag.set_altloc(new) --> ag.set_altlocs(new)
ag.set_serial(new) --> ag.set_serials(new)
ag.set_resid(new) --> ag.set_resids(new)
ag.set_resname(new) --> ag.set_resnames(new)
ag.set_resnum(new) --> ag.set_resnums(new)
ag.set_segid(new) --> ag.set_segids(new)

I see that you've added a sample conversion to the unifying release user guide, so that's good.

tylerjereddy · 2015-09-02T12:58:45Z

I'm still investigating the other things discussed here. At the moment I'm focused on the fixer for the ts.frame numbering issue. I don't know what the status will be by the intended release date (Friday?), but I'll do what I can.

richardjgowers · 2015-09-02T18:52:13Z

With the .frame fix, because we're only issuing a warning, it could just be a quick search for the string ".frame"?

Option for Issue #377 Allows usage as: ``` import MDAnalysis as mda mda.ten2eleven('myscript.py') ```

richardjgowers · 2015-09-03T14:21:26Z

I pushed a branch that takes @tylerjereddy 's tool and includes it in MDAnalysis. It then allows usage as:

import MDAnalysis as mda

mda.ten2eleven('script.py')

It's kind of @orbeckst 's option 1, except it isn't intrusively installing a script, and it will be present in the library for as long as we need it. If we put another change in 0.12.0, we'd just add another fixer and "ten2eleven" would actually work there too. Maybe "old2new" / "makemyscriptworkagainplease" or something might be a more future proof name.

kain88-de · 2015-09-04T13:26:31Z

Maybe "old2new" / "makemyscriptworkagainplease" or something might be a more future proof name.

Sure about this? For me such a name would imply that I can use this kind of script for any future version upgrade (0.12 -> 0.13, 0.10 -> 1.00, ... ). Then the converting script needs to be updated and maintained for a long time for all different update combinations. I would figure that this is a huge amount of extra work that could be spend better otherwise. For a library like MDAnalysis deprecating old API's for 1-2 releases and then removing them sounds like a better approach in the long run.

Offering a conversion script once for a huge (meaning a log of) API break is ok imho but as far as I understood the next versions of MDA are not planned to have huge API breaks.

richardjgowers · 2015-09-04T13:58:00Z

I think we can make it work for future changes (if any). I don't think different combinations will be a problem, things we change aren't getting changed again.

It would be more annoying if we set this in stone and had to make a new script for 0.12 -> 0.13 (for example). I think the big features for 1.0 is vaguely looking like:

new AtomGroup Move from Atom based topology to array based #363 (with np arrays, should be mostly internal)
tidy up selections Selection parsing overhaul. New syntax features. #371 (maybe new keywords here, so maybe a couple fixes)
python 3 (shouldn't change any API)

tylerjereddy · 2015-09-04T15:09:09Z

I'm still working to deal with .frame issues. It is amazingly complicated. For example, inserting the comment before an indented line containing .frame abrogates the indentation information in the prefix and dedents the affected line. I'll have to see if I can dig the indentation information out of the nodes and still get the comment lines in gracefully.

@richardjgowers: is there a way we can eventually merge my repo in so I get a few commit credits to the dev branch (I rarely get to commit code changes these days it seems!). Obviously not that important as I've got my name on the fixer docstrings.

richardjgowers · 2015-09-04T15:26:57Z

I think I've got it working here: https://github.com/MDAnalysis/mdanalysis/commits/feature-merged-ten2eleven

~~If you've got nothing else locally, then you can merge that into develop here and work on this repo, otherwise I can redo this~~

(I used this: MDA was "A" ten2eleven was "B"

http://blog.caplin.com/2013/09/18/merging-two-git-repositories/
)

Edit:
Oh wait, I still need to delete the "old" versions of the files, but this should get the commits in anyway.

orbeckst · 2015-09-04T16:59:03Z

If we are going down the module road then perhaps make a sub package migration:

from MDAnalysis.migration import ten2eleven

At least keeps the name space cleaner.

Oliver Beckstein
email: orbeckst@gmail.com
skype: orbeckst

Am Sep 3, 2015 um 7:21 schrieb Richard Gowers notifications@github.com:

import MDAnalysis as mda

mda.ten2eleven('script.py')
It's kind of @orbeckst 's option 1, except it isn't intrusively installing a script, and it will be present in the library for as long as we need it. If we put another change in 0.12.0, we'd just add another fixer and "ten2eleven" would actually work there too. Maybe "old2new" / "makemyscriptworkagainplease" or something might be a more future proof name.

tylerjereddy · 2015-09-04T17:07:38Z

Ok, I assume I can continue working on that feature branch (I've already pushed to it).

I don't think I'm going to solve the .frame issue today. This is an example problem case:

#frame warning with indentation complexity:                                                      
class Dummy(object):                                                                    
    assert_almost_equal(ts.frame, 544)

My fixers just notice .frame outside of the class or function they might be defined in [although list comprehensions and a few other things are recognized], so they end up nuking the indentation information. The reason this isn't a problem with my other fixers is that we aren't adding comments, so we don't mess with non-code components of the parse tree for the most part. I only noticed this because it causes major issues with conversion of unit tests (plenty of classes with .frame usage).

You can see some brief discussion of this type of issue in the standard library fix_metaclass.py fixer, which deals with class Dummy: pass differently than classes with code blocks as the former doesn't have a suite with indentation information in the parse tree.

richardjgowers · 2015-09-05T09:30:53Z

I'm happy to leave .frame as a known issue. It's not going to break any scripts

…ogress bar)

orbeckst added help wanted Priority-High usability Component-Scripts maintainability API labels Aug 4, 2015

orbeckst added this to the 0.11 milestone Aug 4, 2015

orbeckst mentioned this issue Aug 6, 2015

AtomGroup.resids should return an array of equal size to AtomGroup #385

Closed

tylerjereddy self-assigned this Aug 12, 2015

richardjgowers mentioned this issue Aug 22, 2015

Fixes Issue #389 #399

Merged

richardjgowers mentioned this issue Aug 31, 2015

Fixes issue #385 #406

Merged

richardjgowers added a commit that referenced this issue Sep 3, 2015

Trying inbuilt ten2eleven

ae4a00e

Option for Issue #377 Allows usage as: ``` import MDAnalysis as mda mda.ten2eleven('myscript.py') ```

richardjgowers mentioned this issue Sep 5, 2015

Feature merged ten2eleven #413

Merged

orbeckst closed this as completed in d37794f Sep 5, 2015

lohani2280 pushed a commit to lohani2280/mdanalysis that referenced this issue Feb 17, 2017

link to MDAnalysis#377 and updated contact_matrix example (no more pr…

f415990

…ogress bar)

helper script for upgrading from 0.10 to 0.11 (ten2eleven.py) #377

helper script for upgrading from 0.10 to 0.11 (ten2eleven.py) #377

Comments

orbeckst commented Aug 4, 2015

Background

ten2eleven.py script

Specifications

orbeckst commented Aug 5, 2015

orbeckst commented Aug 8, 2015

orbeckst commented Aug 8, 2015

orbeckst commented Aug 9, 2015

richardjgowers commented Aug 9, 2015

orbeckst commented Aug 10, 2015

tylerjereddy commented Aug 12, 2015

orbeckst commented Aug 12, 2015

tylerjereddy commented Aug 13, 2015

orbeckst commented Aug 13, 2015

tylerjereddy commented Aug 13, 2015

tylerjereddy commented Aug 13, 2015

orbeckst commented Aug 13, 2015

tylerjereddy commented Aug 14, 2015

richardjgowers commented Aug 20, 2015

tylerjereddy commented Aug 20, 2015

richardjgowers commented Aug 22, 2015

tylerjereddy commented Aug 22, 2015

tylerjereddy commented Aug 28, 2015

richardjgowers commented Aug 28, 2015

tylerjereddy commented Aug 28, 2015

orbeckst commented Aug 30, 2015

orbeckst commented Aug 30, 2015

dotsdl commented Aug 30, 2015

richardjgowers commented Aug 31, 2015

dotsdl commented Aug 31, 2015

tylerjereddy commented Sep 2, 2015

tylerjereddy commented Sep 2, 2015

richardjgowers commented Sep 2, 2015

richardjgowers commented Sep 3, 2015

kain88-de commented Sep 4, 2015

richardjgowers commented Sep 4, 2015

tylerjereddy commented Sep 4, 2015

richardjgowers commented Sep 4, 2015

orbeckst commented Sep 4, 2015

tylerjereddy commented Sep 4, 2015

richardjgowers commented Sep 5, 2015