Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

helper script for upgrading from 0.10 to 0.11 (ten2eleven.py) #377

Closed
orbeckst opened this issue Aug 4, 2015 · 37 comments
Closed

helper script for upgrading from 0.10 to 0.11 (ten2eleven.py) #377

orbeckst opened this issue Aug 4, 2015 · 37 comments

Comments

@orbeckst
Copy link
Member

orbeckst commented Aug 4, 2015

Background

The 0.11.0 release will break the API in many places (see also the Release Notes for 0.11.0):

Issue API Changes
#250 & #252 Summary of Timestep API changes
#310 frame numbers start at 0 instead of 1
#308 time is now a property of Timestep and not of the trajectory reader
#287 Summary of reorganization of modules (MDAnalysis.lib)
#372 Many AtomGroup methods were changed to properties (and some were renamed)
#376 AtomGroup.numberOfAtoms() (and similar methods for residues and segments) is now a property AtomGroup.n_atoms; likewise, Universe.trajectory.numatoms and .numframes are now .n_atoms and .n_frames, respectively.
#389 AtomGroup.selectAtoms() to AtomGroup.select_atoms() (and same for Universe.selectAtoms()
#373 calculating of bond, angle and dihedral values from 2,3, or 4-member AtomGroups will change
#385 AtomGroup.resids (and segids) will return an array the size of the AtomGroup; the old behavior will be obtained with AtomGroup.residues.resids; likewise ResidueGroup.segids will return an array the size of the ResidueGroup
#406 AtomGroup.set_<property> are now pluralized to match the property names, where they were once singular. e.g. ag.set_resnum is now ag.set_resnums, ag.set_radius is now ag.set_radii.

ten2eleven.py script

In order to ease the burden on users we would like to have a script ten2eleven.py that aids the user in the translation of Python code to the new syntax. That might not always be possible but some (tedious) changes could likely be automated.

This should be somewhat similar to the py2to3 script.

Specifications

  • The script should take a python source file as input and write out an altered version of the file.

    • Optionally, it should run recursively on a directory and change all py files in the directory.
    • Optionally, it should write all changed files to a new target directory and replicate the original directory structure in the target directory.
  • Automatically do substitutions and renamings like the following:

    ag.residues() → ag.residues
    atom.number → atom.index
    import MDAnalysis.core.qcprot → import MDAnalysis.lib.qcprot
    

    (We need a complete list of substitutions – please add to this issue report!)

  • It would also be desirable to flag pieces of code that should be manually inspected, e.g. anything related to ts.frame. Optionally, it should insert comment lines in places that should be inspected.

  • It should keep a backup file of the original file and it should never overwrite the backup files.

@orbeckst
Copy link
Member Author

orbeckst commented Aug 5, 2015

(Btw, looking at the code for lib2to3 I should add that my statement "ten2eleven.py should be similar to py2to3" needs to be taken with a grain of salt... 2to3 contains a lot of voodoo, including parsing of the AST to decide where to fix. But maybe one could re-used lib2to3 for our purposes...)

@orbeckst
Copy link
Member Author

orbeckst commented Aug 8, 2015

Instead of lib2to3 we could also look into rope, a python-based refactoring tool.

@orbeckst
Copy link
Member Author

orbeckst commented Aug 8, 2015

And more on using lib2to3:
Stack Overflow answer on lib2to3 Architecture Documentation points to Lennart Regebro's Porting to Python 3 book, in particular the chapter Extending 2to3 with your own fixers.

@orbeckst
Copy link
Member Author

orbeckst commented Aug 9, 2015

Add #389 (selectAtoms() to select_atoms()) to table.

@richardjgowers
Copy link
Member

So the camelCase/single_underscore is a nicety, but it's not going to break any scripts, and will issue a nice message.

I think it's a lot more realistic for the script to read a script then give a line of line numbers and possible changes that need making, so something like:

ten2eleven.py my_awesome_analysis.py

line 10: 'print atom.number' - 'atom.number' has changed to 'atom.index'
line 11: 'idx = atomgroup.indices()' - 'atomgroup.indices is now a property not a method'

But if someone can make it replace things in place then that's better. Either way we'll need a list of replacements.

@orbeckst
Copy link
Member Author

We definitely need the list of changes; I hope that @tylerjereddy 's page MDAnalysis 0.11.0 unifying release user guide will have this list.

I think we can write fixes for lib2to3 that do some of the replacements in a smarter way than sed. There are also recipes for rewriting import statements. However, this week I won't have the time to look at this in more detail but Extending 2to3 with your own fixers looked promising.

(rope seems to be designed to do replacements but as part of an interactive frame work, e.g. hooked into vim or emacs. The problem that I saw was that you need to know the offset in the file of the term that you want to refactor, i.e. you put the cursor mark there and then there's a function that guesses the word, passes it through the static code analyzer, and eventually makes the replacement.)

@tylerjereddy
Copy link
Member

Let me take a look. If my progress is too slow we can always re-assign to someone else.

@tylerjereddy tylerjereddy self-assigned this Aug 12, 2015
@orbeckst
Copy link
Member Author

Sure, thanks.

I would definitely try to use lib2to3. The docs on writing fixes seem a decent starting point. (And it actually sounds like a fun exercise to me.)

Oliver Beckstein
email: orbeckst@gmail.com
skype: orbeckst

Am Aug 12, 2015 um 5:58 schrieb Tyler Reddy notifications@github.com:

Let me take a look. If my progress is too slow we can always re-assign to someone else.


Reply to this email directly or view it on GitHub.

@tylerjereddy
Copy link
Member

I've produced a crude fixer (https://github.com/tylerjereddy/ten2eleven) that seems to work on a simple test conversion for selectAtoms to select_atoms. Currently it seems to be necessary to copy the fixer to lib2to3/fixes/ and then you can generate the diff with:

2to3 -f selectatoms test_dummy_old_MDA_code.py

For the modules that now have different import paths, the fixers will be more complicated than that one! The example page Oli linked shows one way to deal with changing import statements (have to deal with from and various import styles, which can get nasty). Writing tests will be important to ensure we have the correct behaviour, and it can be tricky to think of the corner cases.

@orbeckst
Copy link
Member Author

@tylerjereddy – progress! Do you see a way that we can make a self-contained package so that users can pip install our upgrade script? I am thinking of making it a separate repo and package. It would be nice if we can get away with just copying & modifying the original 2to3 script, using import lib2to3everywhere and just providing our fixers.

@tylerjereddy
Copy link
Member

Not sure about that just yet. I've added a new fixer for converting atomgroup methods to properties. That one was much harder, but seems to work properly. The best bet beyond the reference you provided is just to hack patterns together based on the fixers already in the standard library. Presumably we can deal with the packaging retroactively after various fixers are drafted.

@tylerjereddy
Copy link
Member

Likewise working for atomgroup count methods camelcase --> new properties.

@orbeckst
Copy link
Member Author

I think I would try out the suggestion to use the find_pattern.py script, preferrably on the older unit tests: Perhaps a good test is to use 0.10.0 unit tests, run, check failures, convert, run tests and hopefully end up with zero failures.

@tylerjereddy
Copy link
Member

Ok, it is now possible to run the current full suite of fixers on the command line like this:

python ten2eleven.py test_dummy_old_MDA_code.py

or a specific fixer:

python ten2eleven.py -f agcountmethods test_dummy_old_MDA_code.py

Now, as far as installing with pip and using on other systems, I may need a little help, but there are only three lines of code in ten2eleven.py, so I imagine those familiar with packaging details can handle that.

We also have a new fixer for the import statement migrations to MDAnalysis.lib, etc. It seems to work well for import and from x import, but yeah will need thorough testing beyond my crude dummy module, probably as you suggest with the unit tests eventually. I'm also thinking that if people do import MDAnalysis as MDA or mda or any arbitrary thing, that might be challenging to deal with downstream calls to the migrated modules. Still, I think this is moving in the right direction.

@richardjgowers
Copy link
Member

Fixes required because of #373

# These methods are now properties returning an object
AtomGroup.bond() -> AtomGroup.bond.value()
AtomGroup.angle() -> AtomGroup.angle.value()
AtomGroup.torsion() -> AtomGroup.dihedral.value()
AtomGroup.improper() -> AtomGroup.improper.value()

# Torsions now called dihedrals throughout package
# (class names)
Torsion -> Dihedral
Improper_Torsion -> ImproperDihedral

AtomGroup.torsions -> AtomGroup.dihedrals
Atom.torsions -> Atom.dihedrals
Universe.torsions -> Universe.dihedrals

# From MDAnalysis.lib.distances
calc_torsions() -> calc_dihedrals()

@tylerjereddy
Copy link
Member

I've added two new fixers to deal with some of those. More to come, hopefully, as time permits.

@richardjgowers
Copy link
Member

These are the camelCase changes. Should just be simple find and replaces

# from core.AtomGroup
totalMass -> total_mass
totalCharge -> total_charge
centerOfGeometry -> center_of_geometry
radiusOfGyration -> radius_of_gyration
shapeParameter -> shape_parameter
momentOfInertia -> moment_of_inertia
principalAxes -> principal_axes
packIntoBox -> pack_into_box
asUniverse -> as_universe

# from lib.distances
applyPBC -> apply_PBC

EDIT 2015-8-30 (@orbeckst): fixed radiusOfGyration

@tylerjereddy
Copy link
Member

Ok, I wrote a new fixer to cover those.

@tylerjereddy
Copy link
Member

So, I've started testing ten2eleven.py conversion of 0.10 unit tests to 0.11 friendly code after making several more fixers. The basic procedure:

  1. need to checkout old version (0.10) of testsuite/MDAnalysisTests: git checkout 06c93bc0f6e66423 -- testsuite/MDAnalysisTests
    • have to change __version__ in __init__.py to allow running tests with different version of library
  2. ensure the above tests fail with 0.11-dev MDA code
    • I get: FAILED (SKIP=3, errors=189, failures=118)
  3. try converting 0.10 unit test code to 0.11 unit test code with ten2eleven.py: python ten2eleven.py -W *py
  4. re-run the converted unit tests:
    • I get: FAILED (SKIP=3, errors=73, failures=144)

The substantial reduction in the number of errors relative to a modest increase in the number of failures is probably a decent sign for the first attempt. I see that there are a lot of issues with failures involving frame numbers and trajectory slicing, which is not surprising given the fact that the module doesn't handle this particular issue at the moment.

Consider for example this test:

self.trajectory.rewind()
assert_equal(self.ts.frame, 1, "rewinding to frame 1")

One can easily imagine various similar cases where a user or a unit test would have any arbitrary frame number based assumption in their code. I'm skeptical that we could account for all of these cases, though I'd be happy to be proven wrong! Adjusting all instances of trajectory[index] to trajectory[index-1] would be risky I think (i.e., trajectory[0] is the first frame in both 0.10 and 0.11, right? it is just that the frame number is different). We could try warning the user if the keywords trajectory or frame are in a line I suppose, or maybe could cook up some "smart defaults" with warnings. Thoughts?

@richardjgowers
Copy link
Member

trajectory[index] didn't actually change, but rather the frame of the returned Timestep changed

# Previously
ts = u.trajectory[i]
ts.frame == i + 1

# Now
ts = u.trajectory[i]
ts.frame == i

So you might be able to look for .frame and issue warnings.

@tylerjereddy
Copy link
Member

Yes, that is what I was arguing in my comment--difficult to do this kind of conversion automatically for any arbitrary usage of a frame number.

@orbeckst
Copy link
Member Author

Hi @tylerjereddy ,

you're doing a terrific job here!

I agree with what was said before: Do the safe conversions and point out potential problems as much as possible. Perhaps even by inserting comments into the code along the lines of

# ten2eleven.py detected a possible incompatibility between this code and MDAnalysis >= 0.11.0
# NAME / DESCRIPTION / DATE
# Please manually review the following line (and remove these comments afterwards):
ts.frame == i + 1

I also opened a stub page on the wiki for migrating MDAnalysis code with ten2eleven.py. Can you please add a howto description there that tells users what to do and possibly what potential pitfalls are?

Cheers!

@orbeckst
Copy link
Member Author

We also have to decide how to make ten2elevn.py available. I see two options

  1. Add it to the MDAnalysis 0.11.0 distribution.
    • install as an executable script
    • advantage:
      • batteries included, user has everything
      • tight coupling between fixes and release
    • disadvantage:
      • MDAnalysis never installed scripts before, so this is dirty
      • what about users that skipped 0.11.0? How do they get the script to eventually upgrade?
  2. Make it a separate package (also on PyPi), which gets its own repository.
    • installs itself as a script with zero dependencies
    • advantages:
      • clean, does not pollute the library package itself
      • can be uninstalled after use
      • can be separately maintained and we can use the same framework for other upgrades if need be (it's so neat that I think we should....)
    • disadvantages:
      • user needs additional package (but can be automated by specifying it as a dependency of 0.11.0)

I vastly favour (2) but perhaps other people have different opinions, in particular @richardjgowers as release manager will have a deciding say in this.

@dotsdl
Copy link
Member

dotsdl commented Aug 30, 2015

I favor 1. I think it's okay to continue providing the script as a convenience. Ultimately with an API break there will still be issues that require manual intervention, but the hope is that this eases the pain and codifies the changes that one needs to make.

@richardjgowers
Copy link
Member

Is it possible to make it work as

import MDAnalysis as mda

mda.ten2eleven('myfile.py')  # spits out a file in same dir with fixes

If I've understood this correctly, this is option 1 lite.

@dotsdl
Copy link
Member

dotsdl commented Aug 31, 2015

@tylerjereddy I updated Oliver's table at the top with the changes to setters (singular to plural)

@tylerjereddy
Copy link
Member

@dotsdl Ok, I've added a new fixer that handles the following:

ag.set_mass(new) --> ag.set_masses(new)
ag.set_charge(new) --> ag.set_charges(new)
ag.set_name(new) --> ag.set_names(new)
ag.set_type(new) --> ag.set_types(new)
ag.set_radius(new) --> ag.set_radii(new)
ag.set_bfactor(new) --> ag.set_bfactors(new)
ag.set_altloc(new) --> ag.set_altlocs(new)
ag.set_serial(new) --> ag.set_serials(new)
ag.set_resid(new) --> ag.set_resids(new)
ag.set_resname(new) --> ag.set_resnames(new)
ag.set_resnum(new) --> ag.set_resnums(new)
ag.set_segid(new) --> ag.set_segids(new)

I see that you've added a sample conversion to the unifying release user guide, so that's good.

@tylerjereddy
Copy link
Member

I'm still investigating the other things discussed here. At the moment I'm focused on the fixer for the ts.frame numbering issue. I don't know what the status will be by the intended release date (Friday?), but I'll do what I can.

@richardjgowers
Copy link
Member

With the .frame fix, because we're only issuing a warning, it could just be a quick search for the string ".frame"?

richardjgowers added a commit that referenced this issue Sep 3, 2015
Option for Issue #377

Allows usage as:
```
import MDAnalysis as mda

mda.ten2eleven('myscript.py')
```
@richardjgowers
Copy link
Member

I pushed a branch that takes @tylerjereddy 's tool and includes it in MDAnalysis. It then allows usage as:

import MDAnalysis as mda

mda.ten2eleven('script.py')

It's kind of @orbeckst 's option 1, except it isn't intrusively installing a script, and it will be present in the library for as long as we need it. If we put another change in 0.12.0, we'd just add another fixer and "ten2eleven" would actually work there too. Maybe "old2new" / "makemyscriptworkagainplease" or something might be a more future proof name.

@kain88-de
Copy link
Member

Maybe "old2new" / "makemyscriptworkagainplease" or something might be a more future proof name.

Sure about this? For me such a name would imply that I can use this kind of script for any future version upgrade (0.12 -> 0.13, 0.10 -> 1.00, ... ). Then the converting script needs to be updated and maintained for a long time for all different update combinations. I would figure that this is a huge amount of extra work that could be spend better otherwise. For a library like MDAnalysis deprecating old API's for 1-2 releases and then removing them sounds like a better approach in the long run.

Offering a conversion script once for a huge (meaning a log of) API break is ok imho but as far as I understood the next versions of MDA are not planned to have huge API breaks.

@richardjgowers
Copy link
Member

I think we can make it work for future changes (if any). I don't think different combinations will be a problem, things we change aren't getting changed again.

It would be more annoying if we set this in stone and had to make a new script for 0.12 -> 0.13 (for example). I think the big features for 1.0 is vaguely looking like:

@tylerjereddy
Copy link
Member

I'm still working to deal with .frame issues. It is amazingly complicated. For example, inserting the comment before an indented line containing .frame abrogates the indentation information in the prefix and dedents the affected line. I'll have to see if I can dig the indentation information out of the nodes and still get the comment lines in gracefully.

@richardjgowers: is there a way we can eventually merge my repo in so I get a few commit credits to the dev branch (I rarely get to commit code changes these days it seems!). Obviously not that important as I've got my name on the fixer docstrings.

@richardjgowers
Copy link
Member

I think I've got it working here: https://github.com/MDAnalysis/mdanalysis/commits/feature-merged-ten2eleven

If you've got nothing else locally, then you can merge that into develop here and work on this repo, otherwise I can redo this

(I used this: MDA was "A" ten2eleven was "B"

http://blog.caplin.com/2013/09/18/merging-two-git-repositories/
)

Edit:
Oh wait, I still need to delete the "old" versions of the files, but this should get the commits in anyway.

@orbeckst
Copy link
Member Author

orbeckst commented Sep 4, 2015

If we are going down the module road then perhaps make a sub package migration:

from MDAnalysis.migration import ten2eleven

At least keeps the name space cleaner.

Oliver Beckstein
email: orbeckst@gmail.com
skype: orbeckst

Am Sep 3, 2015 um 7:21 schrieb Richard Gowers notifications@github.com:

import MDAnalysis as mda

mda.ten2eleven('script.py')
It's kind of @orbeckst 's option 1, except it isn't intrusively installing a script, and it will be present in the library for as long as we need it. If we put another change in 0.12.0, we'd just add another fixer and "ten2eleven" would actually work there too. Maybe "old2new" / "makemyscriptworkagainplease" or something might be a more future proof name.

@tylerjereddy
Copy link
Member

Ok, I assume I can continue working on that feature branch (I've already pushed to it).

I don't think I'm going to solve the .frame issue today. This is an example problem case:

#frame warning with indentation complexity:                                                      
class Dummy(object):                                                                    
    assert_almost_equal(ts.frame, 544) 

My fixers just notice .frame outside of the class or function they might be defined in [although list comprehensions and a few other things are recognized], so they end up nuking the indentation information. The reason this isn't a problem with my other fixers is that we aren't adding comments, so we don't mess with non-code components of the parse tree for the most part. I only noticed this because it causes major issues with conversion of unit tests (plenty of classes with .frame usage).

You can see some brief discussion of this type of issue in the standard library fix_metaclass.py fixer, which deals with class Dummy: pass differently than classes with code blocks as the former doesn't have a suite with indentation information in the parse tree.

@richardjgowers
Copy link
Member

I'm happy to leave .frame as a known issue. It's not going to break any scripts

lohani2280 pushed a commit to lohani2280/mdanalysis that referenced this issue Feb 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants