Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: update sequenza #601

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

DivyaratanPopli
Copy link

Modified the environment, wrapper and Snakefile for cnv calling step to be able to install and run a new version of sequenza (@07116cc).

@DivyaratanPopli DivyaratanPopli linked an issue Jan 30, 2025 that may be closed by this pull request
Copy link

  • Please format your Python code with ruff: make fmt
  • Please check your Python code with ruff: make check
  • Please format your Snakemake code with snakefmt: make snakefmt

You can trigger all lints locally by running make lint

@ericblanc20 ericblanc20 changed the title 599 sequenza version is outdated fix: sequenza version is outdated Jan 30, 2025
@coveralls
Copy link

Coverage Status

coverage: 85.808%. remained the same
when pulling de8b4df on 599-sequenza-version-is-outdated
into f49fafb on main.

@ericblanc20 ericblanc20 self-requested a review January 30, 2025 15:04
Copy link
Contributor

@ericblanc20 ericblanc20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good, what is now left:

  • Moving the packages description to the wrapper side of snappy, and
  • Creating a model for sequenza.extract & sequenza.fit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, the definition of the package(s) should be put near the wrappers, not in the Snakefile.

I suggest to create a R_environment.json file in the snappy_wrappers/wrappers/sequenza/install folder, such as:

[
    {
        "name": "Runuran",
        "repository": "cran"
    },
    {
        "name": "sequenza",
        "repository": "bitbucket",
        "url": "sequenzatools/sequenza@07116cc"
    }
]

Copy link
Contributor

@ericblanc20 ericblanc20 Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would re-write the function, making use of the R_environment.json information. As the package name is provided by the user, there is much simplification to be gained, for example:

def install_R_package(
    dest: str, name: str, repository: str = "cran", url: str | None = None
) -> subprocess.CompletedProcess:
    assert dest, "Missing R package destination folder"
    os.makedirs(os.path.dirname(dest), mode=0o750, exist_ok=True)

    match repository:
        case "cran":
            if not url:
                url = "https://cloud.r-project.org"
            install_cmd = f"install.packages('{name}', lib='{dest}', repos='{url}', update=FALSE, ask=FALSE)"
        case "bioconductor":
            install_cmd = f"BiocManager::install('{name}', lib='{dest}', update=FALSE, ask=FALSE)"
        case "github":
            assert url, f"Can't install R package '{name}' from github, URL is missing"
            install_cmd = f"remotes::install_github('{url}', lib='{dest}', upgrade='never')"
        case "bitbucket":
            assert url, f"Can't install R package '{name}' from bitbucket, URL is missing"
            install_cmd = f"remotes::install_bitbucket('{url}', lib='{dest}', upgrade='never')"
        case "local":
            assert url, f"Can't install local R package '{name}', missing path"
            assert os.path.exists(url), f"Can't find local R package '{name}' at location '{url}'"
            install_cmd = f"install.packages('{url}', repos=NULL, lib='{dest}', update=FALSE, ask=FALSE)"
        case _:
            raise ValueError("Unknown repository '{repository}'"
    R_script = [
        f".libPaths(c(.libPaths(), '{dest}'))",
        install_cmd,
        f"status <- try(find.package('{name}', lib.loc='{dest}', quiet=FALSE, verbose=TRUE))",
        "status <- ifelse(is(status, 'try-error'), 1, 0)",
        "quit(save='no', status=status, runLast=FALSE)",
    ]
    cmd = ["R", "--vanilla", "-e", "; ".join(R_script)]
    return subprocess.run(cmd, text=True, check=True)

(Please check my code, I haven't tested it...)

Copy link
Contributor

@ericblanc20 ericblanc20 Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I would perhaps add another function, for example install_R_packages, which would read the json file & loop over all packages for installation. For example:

import json

def install_R_packages(dest: str, filename: str):
    with open(filename, "rt") as f:
        packages = json.load(filename)
    for package in packages:
        status = install_R_package(dest, name=package["name"], repository=package["repository"], url=package.get("url", None))
        status.check_returncode()

Then, the wrapper snappy_wrappers/wrappers/sequenza/install/wrapper.py might just read:

# -*- coding: utf-8 -*-
"""Installation of sequenza non-standard packages"""

import os
import sys

# The following is required for being able to import snappy_wrappers modules
# inside wrappers.  These run in an "inner" snakemake process which uses its
# own conda environment which cannot see the snappy_pipeline installation.
base_dir = os.path.abspath(os.path.dirname(__file__))
while os.path.basename(base_dir) != "snappy_wrappers":
    base_dir = os.path.dirname(base_dir)
sys.path.insert(0, os.path.dirname(base_dir))

from snappy_wrappers.utils import install_R_packages

__author__ = "Eric Blanc <eric.blanc@bih-charite.de>"

dest = os.path.dirname(str(snakemake.output.done))
install_R_packages(dest, os.path.join(os.path.dirname(__file__), "R_environment.json"))

These are just suggestions. I believe that they could facilitate maintaining the R packages, but perhaps there are unforeseen problems with them, or you have a simpler solution (for example, I don't like json file, I would prefer yaml, but the latter would require adding to the environment.yaml some yaml python library, which I think will only make the environment more difficult to maintain).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the arguments of R function sequenza.extract & sequenza.fit, & build a model from them.

The man pages for the functions can be viewed from R using:

library(sequenza)
?sequenza.extract
?sequenza.fit

But I have a suspicion that the man pages are not yet updated for the new code.
Typing: sequenza.extract will list the function's complete code. You can check the argument list, and verify if some of them should not be included in the list (verbosity, input file(s) selected by the pipeline, plots size & format, ...)

@tedil tedil changed the title fix: sequenza version is outdated fix: update sequenza Jan 31, 2025
@tedil
Copy link
Member

tedil commented Jan 31, 2025

@DivyaratanPopli we have a pre-commit config which runs formatting (on python and snakemake files) on each commit automatically, see https://github.com/bihealth/snappy-pipeline/blob/main/docs/installation.rst#installing-pre-commit-hooks which can be very convenient ;)

@tedil
Copy link
Member

tedil commented Feb 5, 2025

Another possibility, though I am not sure how well this works with wrappers:
Snakemake provides so called post-deployment scripts, which allow you to install additional things into the freshly created conda environment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sequenza version is outdated
4 participants