fix: update sequenza #601

Open
wants to merge 7 commits into
base: main
As discussed, the definition of the package(s) should be put near the wrappers, not in the Snakefile.

I suggest creating an R_environment.json file in the snappy_wrappers/wrappers/sequenza/install folder, for example:

[
    {
        "name": "Runuran",
        "repository": "cran"
    },
    {
        "name": "sequenza",
        "repository": "bitbucket",
        "url": "sequenzatools/sequenza@07116cc"
    }
]
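
A file like this can be checked before any installation is attempted. The following is only a minimal sketch: the function name `check_packages` is hypothetical, and the rule that github, bitbucket and local packages need a "url" is an assumption drawn from the example above and from the repository names handled by install_R_package in this PR.

```python
import json

# Repositories that need an explicit "url" entry (assumption: only packages
# that do not come from CRAN or Bioconductor carry one).
NEEDS_URL = {"github", "bitbucket", "local"}
KNOWN = {"cran", "bioconductor"} | NEEDS_URL


def check_packages(text: str) -> list[dict]:
    """Parse the JSON text and fail early on malformed entries."""
    packages = json.loads(text)
    for package in packages:
        for key in ("name", "repository"):
            if key not in package:
                raise ValueError(f"Missing '{key}' in {package}")
        repository = package["repository"]
        if repository not in KNOWN:
            raise ValueError(f"Unknown repository '{repository}'")
        if repository in NEEDS_URL and not package.get("url"):
            raise ValueError(f"Package '{package['name']}' needs a 'url'")
    return packages


EXAMPLE = """[
    {"name": "Runuran", "repository": "cran"},
    {"name": "sequenza", "repository": "bitbucket",
     "url": "sequenzatools/sequenza@07116cc"}
]"""

packages = check_packages(EXAMPLE)
```

Failing on a malformed entry before R is started keeps the error message close to the file that needs fixing.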

@@ -241,11 +241,6 @@ rule somatic_targeted_seq_cnv_calling_cnvetti_off_target_postprocess:
rule somatic_targeted_seq_cnv_calling_sequenza_install:
output:
**wf.get_output_files("sequenza", "install"),
params:
packages=[
{"name": "aroneklund/copynumber", "repo": "github"},
{"name": "sequenzatools/sequenza", "repo": "bitbucket"},
],
threads: wf.get_resource("sequenza", "install", "threads")
resources:
time=wf.get_resource("sequenza", "install", "time"),
93 changes: 38 additions & 55 deletions snappy_wrappers/utils.py
@ericblanc20 ericblanc20 Jan 30, 2025

I would rewrite the function to make use of the R_environment.json information. Since the package name is now provided by the user, it no longer has to be guessed from the path, which simplifies the function considerably. For example:

def install_R_package(
    dest: str, name: str, repository: str = "cran", url: str | None = None
) -> subprocess.CompletedProcess:
    assert dest, "Missing R package destination folder"
    os.makedirs(os.path.dirname(dest), mode=0o750, exist_ok=True)

    match repository:
        case "cran":
            if not url:
                url = "https://cloud.r-project.org"
            install_cmd = f"install.packages('{name}', lib='{dest}', repos='{url}', update=FALSE, ask=FALSE)"
        case "bioconductor":
            install_cmd = f"BiocManager::install('{name}', lib='{dest}', update=FALSE, ask=FALSE)"
        case "github":
            assert url, f"Can't install R package '{name}' from github, URL is missing"
            install_cmd = f"remotes::install_github('{url}', lib='{dest}', upgrade='never')"
        case "bitbucket":
            assert url, f"Can't install R package '{name}' from bitbucket, URL is missing"
            install_cmd = f"remotes::install_bitbucket('{url}', lib='{dest}', upgrade='never')"
        case "local":
            assert url, f"Can't install local R package '{name}', missing path"
            assert os.path.exists(url), f"Can't find local R package '{name}' at location '{url}'"
            install_cmd = f"install.packages('{url}', repos=NULL, lib='{dest}', update=FALSE, ask=FALSE)"
        case _:
            raise ValueError(f"Unknown repository '{repository}'")
    R_script = [
        f".libPaths(c(.libPaths(), '{dest}'))",
        install_cmd,
        f"status <- try(find.package('{name}', lib.loc='{dest}', quiet=FALSE, verbose=TRUE))",
        "status <- ifelse(is(status, 'try-error'), 1, 0)",
        "quit(save='no', status=status, runLast=FALSE)",
    ]
    cmd = ["R", "--vanilla", "-e", "; ".join(R_script)]
    return subprocess.run(cmd, text=True, check=True)

(Please check my code, I haven't tested it...)
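
One way to check such a function without an R installation is to separate building the command from running it, and to replace subprocess.run with a mock. This is a sketch only: `build_r_command` and `run_r_command` are hypothetical names, not part of snappy_wrappers, and the paths are placeholders.

```python
import subprocess
from unittest import mock


def build_r_command(dest: str, name: str, install_cmd: str) -> list[str]:
    """Assemble the R invocation that installs and then verifies one package."""
    r_script = [
        f".libPaths(c(.libPaths(), '{dest}'))",
        install_cmd,
        f"status <- try(find.package('{name}', lib.loc='{dest}', quiet=FALSE, verbose=TRUE))",
        "status <- ifelse(is(status, 'try-error'), 1, 0)",
        "quit(save='no', status=status, runLast=FALSE)",
    ]
    return ["R", "--vanilla", "-e", "; ".join(r_script)]


def run_r_command(cmd: list[str]) -> subprocess.CompletedProcess:
    # check=True raises subprocess.CalledProcessError on a non-zero exit status.
    return subprocess.run(cmd, text=True, check=True)


cmd = build_r_command(
    dest="/path/to/lib",
    name="Runuran",
    install_cmd="install.packages('Runuran', lib='/path/to/lib', repos='https://cloud.r-project.org')",
)

# Replace subprocess.run with a mock, so that no R process is started.
with mock.patch("subprocess.run") as fake_run:
    fake_run.return_value = subprocess.CompletedProcess(cmd, returncode=0)
    result = run_r_command(cmd)
```

Because the command is an argument list rather than a shell string, no shell quoting is involved; the package name and paths are still interpolated into R code, so they should not contain single quotes.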

@ericblanc20 ericblanc20 Jan 30, 2025

Also, I would perhaps add another function, for example install_R_packages, which would read the json file & loop over all packages for installation. For example:

import json

def install_R_packages(dest: str, filename: str):
    with open(filename, "rt") as f:
        packages = json.load(f)
    for package in packages:
        status = install_R_package(dest, name=package["name"], repository=package["repository"], url=package.get("url", None))
        status.check_returncode()

Then, the wrapper snappy_wrappers/wrappers/sequenza/install/wrapper.py might just read:

# -*- coding: utf-8 -*-
"""Installation of sequenza non-standard packages"""

import os
import sys

# The following is required for being able to import snappy_wrappers modules
# inside wrappers.  These run in an "inner" snakemake process which uses its
# own conda environment which cannot see the snappy_pipeline installation.
base_dir = os.path.abspath(os.path.dirname(__file__))
while os.path.basename(base_dir) != "snappy_wrappers":
    base_dir = os.path.dirname(base_dir)
sys.path.insert(0, os.path.dirname(base_dir))

from snappy_wrappers.utils import install_R_packages

__author__ = "Eric Blanc <eric.blanc@bih-charite.de>"

dest = os.path.dirname(str(snakemake.output.done))
install_R_packages(dest, os.path.join(os.path.dirname(__file__), "R_environment.json"))

These are just suggestions. I believe they would make the R packages easier to maintain, but there may be unforeseen problems with them, or you may have a simpler solution. (For example, I don't like JSON files and would prefer YAML, but the latter would require adding a YAML Python library to environment.yaml, which I think would only make the environment more difficult to maintain.)
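
The loop over the JSON file can be tried out without R by passing the installer in as an argument. This is a sketch under assumptions: `install_from_file` and `record` are hypothetical names, and the temporary file stands in for the real R_environment.json.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def install_from_file(dest: str, filename: str, installer: Callable[..., object]) -> list[str]:
    """Read the package list and hand each entry to `installer`, in file order."""
    with open(filename, "rt") as f:
        packages = json.load(f)
    installed = []
    for package in packages:
        installer(
            dest,
            name=package["name"],
            repository=package["repository"],
            url=package.get("url"),
        )
        installed.append(package["name"])
    return installed


calls = []


def record(dest, name, repository, url):
    # Stand-in for install_R_package: only records what would be installed.
    calls.append((name, repository, url))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "R_environment.json"
    path.write_text(
        json.dumps(
            [
                {"name": "Runuran", "repository": "cran"},
                {
                    "name": "sequenza",
                    "repository": "bitbucket",
                    "url": "sequenzatools/sequenza@07116cc",
                },
            ]
        )
    )
    names = install_from_file("/path/to/lib", str(path), record)
```

Entries are processed in file order, so a package's dependencies can be listed before it. Note also that when the installer calls subprocess.run with check=True, a failed installation already raises subprocess.CalledProcessError, so a later check_returncode() call on the returned object can never fail.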

@@ -2,8 +2,8 @@
"""Utility code for snappy_wrappers"""

import os
import re
import subprocess
import json

__author__ = "Manuel Holtgrewe <manuel.holtgrewe@bih-charite.de>"

@@ -57,67 +57,50 @@ def patched(*args, **kwargs):
return patched


def install_R_package(dest: str, name: str, repo: str):
"""Installs R package <name> in directory <dest>

<repo> can be on of "cran", "bioconductor", "github", "bitbucket" or "local".

Github & bitbucket packages can be further defined in the <name>,
choosing a commit, a tag or a pull request (see the remotes package reference).
For local packages, the <name> must be the path to the package source.

In these cases, the routine tries to be clever and guess the package name from
the <name>. It implements the following recipes:

- basename of the path without (tar/zip) extension for local packages
- remove username, subdirectory and reference/release/pull request for github & bitbucket

When installation isn't successful, a subprocess.CalledProcessError is raised
"""

assert repo in (
"cran",
"bioconductor",
"github",
"bitbucket",
"local",
), f"Unknown/unimplemented repository {repo}"

def install_R_package(
dest: str, name: str, repository: str = "cran", url: str | None = None
) -> subprocess.CompletedProcess:
assert dest, "Missing R package destination folder"
os.makedirs(os.path.dirname(dest), mode=0o750, exist_ok=True)

if repo == "cran":
install_cmd = f"install.packages('{name}', lib='{dest}', update=FALSE, ask=FALSE)"
elif repo == "bioconductor":
install_cmd = f"BiocManager::install('{name}', lib='{dest}', update=FALSE, ask=FALSE)"
elif repo == "github":
path = name
pattern = re.compile("^([^/]+)/([^/]+)(/[^@|]+)?([@|].+)?$")
m = pattern.match(path)
assert m, f"Cannot extract package name from github path {path}"
name = m.groups()[1]
install_cmd = f"remotes::install_github('{path}', lib='{dest}', upgrade='never')"
elif repo == "bitbucket":
path = name
pattern = re.compile("^([^/]+)/([^/]+)([/@].+)?$")
m = pattern.match(path)
assert m, f"Cannot extract package name from bitbucket path {path}"
name = m.groups()[1]
install_cmd = f"remotes::install_bitbucket('{path}', lib='{dest}', upgrade='never')"
elif repo == "local":
path = name
pattern = re.compile("^(.+?)(\\.(zip|tar(\\.(gz|bz2))?|tgz2?|tbz))?$")
m = pattern.match(os.path.basename(path))
assert m, f"Cannot extract package name from local filesystem path {path}"
name = m.groups()[0]
install_cmd = f"remotes::install_local('{path}', lib='{dest}', upgrade='never')"
else:
install_cmd = None

match repository:
case "cran":
install_cmd = f"install.packages('{name}', lib='{dest}', repos='https://cloud.r-project.org', update=FALSE, ask=FALSE)"
case "bioconductor":
install_cmd = f"BiocManager::install('{name}', lib='{dest}', update=FALSE, ask=FALSE)"
case "github":
assert url, f"Can't install R package '{name}' from github, URL is missing"
install_cmd = f"remotes::install_github('{url}', lib='{dest}', upgrade='never')"
case "bitbucket":
assert url, f"Can't install R package '{name}' from bitbucket, URL is missing"
install_cmd = f"remotes::install_bitbucket('{url}', lib='{dest}', upgrade='never')"
case "local":
assert url, f"Can't install local R package '{name}', missing path"
assert os.path.exists(url), f"Can't find local R package '{name}' at location '{url}'"
install_cmd = (
f"install.packages('{url}', repos=NULL, lib='{dest}', update=FALSE, ask=FALSE)"
)
case _:
raise ValueError("Unknown repository '{repository}'")
R_script = [
f".libPaths(c(.libPaths(), '{dest}'))",
install_cmd,
f"status <- try(find.package('{name}', lib.loc='{dest}', quiet=FALSE, verbose=TRUE))",
"status <- ifelse(is(status, 'try-error'), 1, 0)",
"quit(save='no', status=status, runLast=FALSE)",
]
cmd = ["R", "--vanilla", "-e", "; ".join(R_script)]
return subprocess.run(cmd, text=True, check=True)


def install_R_packages(dest: str, filename: str):
with open(filename, "rt") as f:
packages = json.load(f)
for package in packages:
status = install_R_package(
dest,
name=package["name"],
repository=package["repository"],
url=package.get("url", None),
)
status.check_returncode()
1 change: 1 addition & 0 deletions snappy_wrappers/wrappers/conftest.py
16 changes: 16 additions & 0 deletions snappy_wrappers/wrappers/sequenza/install/R_environment.json
@@ -0,0 +1,16 @@
[
{
"name": "RcppArmadillo",
"repository": "cran"
},
{
"name": "Runuran",
"repository": "cran"
},
{
"name": "sequenza",
"repository": "bitbucket",
"url": "sequenzatools/sequenza@07116cc"
}
]

20 changes: 9 additions & 11 deletions snappy_wrappers/wrappers/sequenza/install/environment.yaml
@@ -3,17 +3,15 @@ channels:
- bioconda
- nodefaults
dependencies:
- python==3.9.19
- r-base==4.3.3
- r-base==4.4.2
- r-remotes==2.5.0
# sequenza requirements (+copynumber)
- r-pbapply==1.7_2
- r-squash==1.0.9
- r-iotools==0.3_2 # See https://groups.google.com/g/sequenza-user-group/c/UA5bBNBeN18?pli=1
- r-readr==2.1.5
- r-tidyverse==2.0.0
- r-rcpp==1.0.14
- r-rcppprogress==0.4.2
- r-iotools==0.3.5
- r-pbapply==1.7.2
- r-seqminer==9.4
- r-data.table==1.15.2
# copynumber requirements
- bioconductor-s4vectors==0.40.2
- bioconductor-iranges==2.36.0
- bioconductor-genomicranges==1.54.1
- r-ks==1.14.3
- r-squash==1.0.9

20 changes: 12 additions & 8 deletions snappy_wrappers/wrappers/sequenza/install/wrapper.py
@@ -2,21 +2,25 @@
"""Installation of sequenza non-standard packages"""

import os
from pathlib import Path
import sys

from pathlib import Path

# The following is required for being able to import snappy_wrappers modules
# inside wrappers. These run in an "inner" snakemake process which uses its
# own conda environment which cannot see the snappy_pipeline installation.
base_dir = os.path.normpath(os.path.join(os.path.dirname(__file__), "..", "..", "..", ".."))
sys.path.insert(0, base_dir)

from snappy_wrappers.utils import install_R_package # noqa: E402


base_dir = os.path.abspath(os.path.dirname(__file__))
while os.path.basename(base_dir) != "snappy_wrappers":
base_dir = os.path.dirname(base_dir)
sys.path.insert(0, os.path.dirname(base_dir))

from snappy_wrappers.utils import install_R_packages

__author__ = "Eric Blanc <eric.blanc@bih-charite.de>"

dest = os.path.dirname(str(snakemake.output.done))
for package in snakemake.params.packages:
install_R_package(dest, package["name"], package["repo"])

Path(str(snakemake.output.done)).touch()
install_R_packages(dest, os.path.join(os.path.dirname(__file__), "R_environment.json"))
Path(snakemake.output.done).touch()
4 changes: 2 additions & 2 deletions snappy_wrappers/wrappers/sequenza/run/wrapper.py
Please check the arguments of the R functions sequenza.extract & sequenza.fit, & build a model from them.

The man pages for the functions can be viewed from R using:

library(sequenza)
?sequenza.extract
?sequenza.fit

But I suspect that the man pages have not yet been updated for the new code.
Typing sequenza.extract (without parentheses) prints the function's complete code. You can check the argument list there, and decide which arguments should not be included in the model (verbosity, input file(s) selected by the pipeline, plot size & format, ...).
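
Once the argument list is known, the arguments set by the pipeline can be filtered out before the user's values are rendered as an R list. The sketch below is an illustration only: `PIPELINE_MANAGED`, `to_r_value` and `to_r_list` are hypothetical names, and the argument names in the example are placeholders that must be checked against the installed sequenza version.

```python
# Hypothetical set of arguments that the pipeline sets itself and that a user
# should therefore not be able to override.
PIPELINE_MANAGED = {"file", "assembly", "chromosome.list", "verbose"}


def to_r_value(value) -> str:
    """Render a Python scalar as an R literal."""
    if isinstance(value, bool):  # test bool before numbers: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    raise TypeError(f"Unsupported value {value!r}")


def to_r_list(args: dict) -> str:
    """Render user arguments as an R list(), dropping pipeline-managed ones."""
    kept = {k: v for k, v in args.items() if k not in PIPELINE_MANAGED}
    return "list(" + ", ".join(f"{k}={to_r_value(v)}" for k, v in kept.items()) + ")"


# Placeholder user configuration; "verbose" is dropped because the pipeline owns it.
user_args = {"window": 1000000, "verbose": True, "normalization.method": "median"}
r_list = to_r_list(user_args)
```

Keeping the excluded names in one place makes it easy to update them when the function signatures change between sequenza versions.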

@@ -76,11 +76,11 @@ def config_to_r(x):

# Follow sequenza documentation https://bitbucket.org/sequenzatools/sequenza/src/master/
args <- list(file="{snakemake.input.seqz}", assembly="{config[assembly]}", chromosome.list={contigs})
args <- c(args, {args_extract})
#args <- c(args, {args_extract})
seqz <- do.call(sequenza.extract, args=args)

args <- list(sequenza.extract=seqz, chromosome.list={contigs}, mc.cores=1)
args <- c(args, {args_fit})
#args <- c(args, {args_fit})
CP <- do.call(sequenza.fit, args=args)

sequenza.results(sequenza.extract=seqz, cp.table=CP, sample.id="{snakemake.wildcards[library_name]}", out.dir=dirname("{snakemake.output.done}"))
49 changes: 40 additions & 9 deletions tests/snappy_wrappers/wrappers/test_install_R_package.py
@@ -1,17 +1,44 @@
# -*- coding: utf-8 -*-
"""Code for testing the R package installation helper function"""

from snappy_wrappers.utils import install_R_package
from snappy_wrappers.utils import install_R_package, install_R_packages

R_packages_json = r"""[
{
"name": "cran",
"repository": "cran"
},
{
"name": "bioc",
"repository": "bioconductor"
},
{
"name": "package",
"repository": "github",
"url": "username/package/subdir@*rel.ea.se"
},
{
"name": "package",
"repository": "bitbucket",
"url": "username/package@ref"
},
{
"name": "package",
"repository": "local",
"url": "/path/to/package.tar.gz"
}
]"""

def test_install_R_package(mocker, fp):
def test_install_R_packages(mocker, fp, fs):
"""Tests install_R_package"""
mocker.patch("snappy_wrappers.utils.os.makedirs", return_value=True)
fs.create_file("/path/to/package.tar.gz")
fs.create_file("/path/to/json", contents=R_packages_json)
packages = [
{
"name": "cran",
"repo": "cran",
"install": "install.packages('{}', lib='/path/to/lib', update=FALSE, ask=FALSE)",
"install": "install.packages('{}', lib='/path/to/lib', repos='https://cloud.r-project.org', update=FALSE, ask=FALSE)",
"check": "find.package('cran', lib.loc='/path/to/lib', quiet=FALSE, verbose=TRUE)",
},
{
@@ -21,32 +48,36 @@ def test_install_R_package(mocker, fp):
"check": "find.package('bioc', lib.loc='/path/to/lib', quiet=FALSE, verbose=TRUE)",
},
{
"name": "username/package/subdir@*rel.ea.se",
"name": "package",
"url": "username/package/subdir@*rel.ea.se",
"repo": "github",
"install": "remotes::install_github('{}', lib='/path/to/lib', upgrade='never')",
"check": "find.package('package', lib.loc='/path/to/lib', quiet=FALSE, verbose=TRUE)",
},
{
"name": "username/package/subdir@ref",
"name": "package",
"url": "username/package@ref",
"repo": "bitbucket",
"install": "remotes::install_bitbucket('{}', lib='/path/to/lib', upgrade='never')",
"check": "find.package('package', lib.loc='/path/to/lib', quiet=FALSE, verbose=TRUE)",
},
{
"name": "/path/to/package.tar.gz",
"name": "package",
"url": "/path/to/package.tar.gz",
"repo": "local",
"install": "remotes::install_local('{}', lib='/path/to/lib', upgrade='never')",
"install": "install.packages('{}', repos=NULL, lib='/path/to/lib', update=FALSE, ask=FALSE)",
"check": "find.package('package', lib.loc='/path/to/lib', quiet=FALSE, verbose=TRUE)",
},
]
for package in packages:
script = "; ".join(
[
package["install"].format(package["name"]),
".libPaths(c(.libPaths(), '/path/to/lib'))",
package["install"].format(package.get("url", package["name"])),
"status <- try({})".format(package["check"]),
"status <- ifelse(is(status, 'try-error'), 1, 0)",
"quit(save='no', status=status, runLast=FALSE)",
]
)
fp.register_subprocess(["R", "--vanilla", "-e", script], stdout="")
install_R_package(dest="/path/to/lib", name=package["name"], repo=package["repo"])
install_R_packages(dest="/path/to/lib", filename="/path/to/json")