New method for representing galaxy in ephemeris #115

Open
rhpvorderman opened this issue Oct 23, 2018 · 13 comments

Comments

@rhpvorderman
Contributor

At the suggestion of @mvdbeek: testing against the Docker image is overkill in most use cases and leads to very slow testing.
Testing against a simple Galaxy instance that does not have a full stack of postgres, nginx, etc. is better. This is already implemented in planemo. Maybe @jmchilton can give us some tips on how to go about doing this? We could make it an independent library that all Galaxy projects can use.

@rhpvorderman
Contributor Author

As a note: git clone --depth=1 https://github.com/galaxyproject/galaxy.git downloads only around 15 MB or so. That is much better than the Galaxy Docker image in that respect.

@jmchilton
Member

Planemo has a few different ways to run a bootstrapped Galaxy for workflow testing. The Docker image is typically the fastest because it already has all the wheels and Conda installed. The code in Planemo could certainly be optimized, I imagine, with a new flag for the one-off install in CI tests to make it faster, but I like testing against the Docker image myself.

Merging more of test.sh into the pytest suite, or at least merging the two separate tox jobs into one, might speed things up a bit? The whole suite doesn't seem to take too long to run compared to other projects like Planemo or bioblend. Another idea is to move some testing into a separate job that doesn't run against PRs but that we'd still like to track - I have a whole set of extended tests for Planemo that run daily but would just be too much testing for each PR (and it happens to be too unstable) - https://github.com/jmchilton/planemo-extended-tests. Another possibility is to use Jenkins - it would have the images cached after the first execution and then everything would be faster, I think.

@mvdbeek
Member

mvdbeek commented Oct 23, 2018

The main issues are the startup time of the fat container (which easily exceeds 60 seconds) and needing docker, which requires sudo: true and makes for a much longer wait before tests start.
(Also, I don't like the docker container because it's huge; I always run out of space on my laptop.)
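A minimal sketch of why the startup time matters for a test harness: the tests cannot begin until Galaxy answers, so something has to poll for readiness with a timeout. The names galaxy_is_up and wait_until are hypothetical and are not part of ephemeris or planemo.

```python
import time
import urllib.error
import urllib.request


def galaxy_is_up(url, timeout=5.0):
    """Return True if an HTTP GET on `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts and HTTP errors all mean "not ready yet".
        return False


def wait_until(probe, max_wait=120.0, interval=2.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call `probe()` until it returns True or `max_wait` seconds have passed.

    Returns the elapsed seconds on success, raises TimeoutError otherwise.
    `clock` and `sleep` are injectable so the loop can be tested quickly.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= max_wait:
            raise TimeoutError(f"service not ready after {max_wait} seconds")
        sleep(interval)
```

A test setup could call wait_until(lambda: galaxy_is_up("http://localhost:8080/api/version")) before running anything; the host, port and endpoint here are assumptions about the local setup.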

I was mentioning this on gitter:

I think something neat for the various projects that run tests against some sort of Galaxy would be a pytest plugin. This could be used in Galaxy itself, planemo, bioblend, ephemeris, etc.
planemo already does most of this, it can clone galaxy, run a local galaxy, run galaxy in docker etc... but we wouldn't want to include planemo in ephemeris' dependencies, as planemo depends on ephemeris.

If that's something we can agree on I can spend some time on this.

@jmchilton
Member

I worry about competing against ourselves here. That the container is too fat for testing is a real problem, and one can imagine some easy fixes, like building a variant of it that does a shallow clone of Galaxy and such. This way we wouldn't be competing with the Ansible scripts. There is a lot of Python code around doing those things - if we could reuse or decompose Planemo instead of duplicating the effort, that would be great.

Are we sure it depending on ephemeris is a problem? Won't tox install it as a development dependency and then, when it goes to install ephemeris, just replace the dependent version? bioblend would, I suppose, have the same problem - and would also have the problem that 🦅👀 wants to support versions of Galaxy that Planemo probably can't currently bootstrap (maybe I'm wrong though and Planemo can still bootstrap old Galaxies). I'm open to decomposing Planemo if reusing it is not an option: move the common library stuff into galaxy-lib and create a galaxy-bootstrap Python library or something (to compete with the Java one https://github.com/jmchilton/galaxy-bootstrap)?

I do find the sudo: false thing a bit un-compelling. There are test solutions that allow multi-container testing configurations (e.g. Jenkinsfile) and I think this is the future. We should be using Docker for testing when it makes sense, even if there are some practical downsides with the particular CI/laptops we're using currently.

All that said - I'd love to see this pytest plugin and some Python library formalism around all the ways Planemo can bootstrap Galaxy (e.g. https://github.com/galaxyproject/planemo/blob/master/scripts/run_galaxy_workflow_tests.sh#L31). Work toward that would be most appreciated.
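As a rough sketch of what a "Python library formalism" around the bootstrap options could look like, the block below describes two strategies from this thread (shallow clone and Docker image) as data. All names (BootstrapPlan, shallow_clone_plan, docker_plan, choose_plan) are hypothetical, this is not Planemo's API, and the start command and port mapping are assumptions.

```python
from dataclasses import dataclass
from typing import List

# Repository URL and image name as mentioned in this thread.
GALAXY_REPO = "https://github.com/galaxyproject/galaxy.git"
DOCKER_IMAGE = "bgruening/galaxy-stable"


@dataclass(frozen=True)
class BootstrapPlan:
    """One way of getting a Galaxy to test against, as a list of commands."""
    name: str
    commands: List[List[str]]
    needs_docker: bool


def shallow_clone_plan(target_dir="galaxy"):
    """Clone only the latest commit and start Galaxy from the checkout."""
    return BootstrapPlan(
        name="shallow-clone",
        commands=[
            ["git", "clone", "--depth=1", GALAXY_REPO, target_dir],
            # Assumption: Galaxy is started with the run.sh in the checkout.
            ["sh", f"{target_dir}/run.sh"],
        ],
        needs_docker=False,
    )


def docker_plan(image=DOCKER_IMAGE, host_port=8080):
    """Run a prebuilt image; mapping to container port 80 is an assumption."""
    return BootstrapPlan(
        name="docker",
        commands=[["docker", "run", "-d", "-p", f"{host_port}:80", image]],
        needs_docker=True,
    )


def choose_plan(docker_available):
    """Prefer the Docker image when Docker is usable, else a shallow clone."""
    return docker_plan() if docker_available else shallow_clone_plan()
```

Keeping the plans as plain data means a pytest plugin, planemo, or a CI script could each decide how to execute them, for example with subprocess.run.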

@bgruening
Member

There used to be a small version of the container, e.g. https://github.com/bgruening/docker-galaxy-stable/tree/slim, and we could, of course, maintain those for testing purposes as well.

@mvdbeek
Member

mvdbeek commented Oct 23, 2018

Are we sure it depending on ephemeris is a problem? Won't tox install it as a development dependency and then, when it goes to install ephemeris, just replace the dependent version? bioblend would, I suppose, have the same problem - and would also have the problem that 🦅👀 wants to support versions of Galaxy that Planemo probably can't currently bootstrap (maybe I'm wrong though and Planemo can still bootstrap old Galaxies). I'm open to decomposing Planemo if reusing it is not an option

That might work, I could give this a try.

@rhpvorderman
Contributor Author

There used to be a small version of the container, e.g. https://github.com/bgruening/docker-galaxy-stable/tree/slim, and we could, of course, maintain those for testing purposes as well.

A small container is an interesting proposition: a container without nginx and postgres that just starts Galaxy with all the installed dependencies. It should start quite fast. I fully agree with @jmchilton that docker is not really the problem here; the problem is how we use it in our testing.
A dedicated testing image seems like a very good way to go, since we can keep using the current test code in ephemeris. The test code concerning the docker image is small and was not much effort to create (I wrote it in the span of one morning, including testing).

I could do some work on the small testing image. I have some experience with docker. But that also depends on how far along @mvdbeek is with implementing the planemo testing. I do not want to duplicate effort.

@jmchilton
Member

jmchilton commented Oct 23, 2018

It isn't going to be stable for testing without postgres. It should have postgres and a pre-migrated database, I believe. I doubt nginx is slowing things down much or bulking things up much, so I'd rather it have that - but if it doesn't, I am fine with that also.

@mvdbeek
Member

mvdbeek commented Oct 23, 2018

I think a smaller image is orthogonal to what I'd do, so this would surely not be a waste of time.

@mvdbeek
Member

mvdbeek commented Oct 23, 2018

Another angle is to have a container that can be committed with docker commit, so you can have a fresh instance very rapidly. We don't care about storing data in that case.
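A minimal sketch of that idea, assuming a Galaxy container that has already finished starting up: snapshot it once with docker commit, then start throwaway instances from the snapshot. The function names, the image tag and the port mapping are hypothetical; only the docker subcommands and flags are real.

```python
def snapshot_command(container, image_tag):
    """Build the command that saves a started container as a new image."""
    return ["docker", "commit", container, image_tag]


def fresh_instance_command(image_tag, host_port=8080, container_port=80):
    """Build the command that starts a disposable instance from the snapshot.

    --rm removes the container when it stops, which fits the case where
    stored data does not matter.
    """
    return [
        "docker", "run", "--rm", "-d",
        "-p", f"{host_port}:{container_port}",
        image_tag,
    ]


def commands_for_test_run(container, image_tag="galaxy-test-snapshot"):
    """Return the snapshot step followed by the start step, in order."""
    return [
        snapshot_command(container, image_tag),
        fresh_instance_command(image_tag),
    ]
```

Each command list can be passed to subprocess.run(cmd, check=True). Note that docker commit does not include data stored in mounted volumes, so anything the image keeps in a volume is not part of the snapshot.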

@rhpvorderman
Contributor Author

Noted. I will try something.

@rhpvorderman
Contributor Author

A while ago I made a small container based on 19.05: https://github.com/LUMC/docker-galaxy-uwsgi
Would this be useful for this purpose? The docker script is set up in such a way that an empty sqlite database is already incorporated. This container therefore has a very low startup time.
I could update it to work with the latest version of galaxy. Maybe set up some travis to build from the dev version every week?
Would that be a solution here?

@rhpvorderman
Contributor Author

Hi all, I made a branch https://github.com/galaxyproject/ephemeris/tree/simplifytesting (don't be fooled by the name: tox.ini is greatly simplified but the docker setup is not).

I made a nice minimal setup with a galaxy container, a postgres container and an nginx reverse proxy. I almost got it working using the galaxy-k8s container. The advantage of the galaxy-k8s container is that it is light and starts very quickly. This setup has a lot less overhead compared to the bgruening/galaxy-stable container, while still including nginx and postgres.

Unfortunately the galaxy-k8s container has disabled conda_auto_init so simple shed-tools testing does not work. There is also galaxy-min, but that has also disabled conda_auto_init 😞 . There is no way to get this working without some conda support in the container.

Is there an official galaxy container that has a 21.01 tag, only starts a uwsgi process and just works out of the box with conda?

If not, I think there is a solution:

  • this issue needs to be solved: Install miniconda and make virtualenv from conda ansible-galaxy#110
  • a 'galaxy/galaxy-simple' container needs to be made. It will use the same setup as galaxy-k8s (using ansible-galaxy) but have conda preinstalled. It will also follow the Filesystem Hierarchy Standard and have all mutable data in /var/lib/galaxy so it becomes much easier to use.

I am volunteering to put the work in because such a galaxy-simple image would fill a niche that is not yet filled. (Simple docker-compose deployments).

4 participants