-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: update sequenza #601
base: main
Are you sure you want to change the base?
fix: update sequenza #601
Conversation
…on, and removed github
…ithub and bitbucket
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good, what is now left:
- Moving the packages description to the wrapper side of snappy, and
- Creating a model for
sequenza.extract
&sequenza.fit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, the definition of the package(s) should be put near the wrappers, not in the Snakefile.
I suggest to create a R_environment.json
file in the snappy_wrappers/wrappers/sequenza/install
folder, such as:
[
{
"name": "Runuran",
"repository": "cran"
},
{
"name": "sequenza",
"repository": "bitbucket",
"url": "sequenzatools/sequenza@07116cc"
}
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would re-write the function, making use of the R_environment.json
information. As the package name is provided by the user, there is much simplification to be gained, for example:
def install_R_package(
dest: str, name: str, repository: str = "cran", url: str | None = None
) -> subprocess.CompletedProcess:
assert dest, "Missing R package destination folder"
os.makedirs(os.path.dirname(dest), mode=0o750, exist_ok=True)
match repository:
case "cran":
if not url:
url = "https://cloud.r-project.org"
install_cmd = f"install.packages('{name}', lib='{dest}', repos='{url}', update=FALSE, ask=FALSE)"
case "bioconductor":
install_cmd = f"BiocManager::install('{name}', lib='{dest}', update=FALSE, ask=FALSE)"
case "github":
assert url, f"Can't install R package '{name}' from github, URL is missing"
install_cmd = f"remotes::install_github('{url}', lib='{dest}', upgrade='never')"
case "bitbucket":
assert url, f"Can't install R package '{name}' from bitbucket, URL is missing"
install_cmd = f"remotes::install_bitbucket('{url}', lib='{dest}', upgrade='never')"
case "local":
assert url, f"Can't install local R package '{name}', missing path"
assert os.path.exists(url), f"Can't find local R package '{name}' at location '{url}'"
install_cmd = f"install.packages('{url}', repos=NULL, lib='{dest}', update=FALSE, ask=FALSE)"
case _:
raise ValueError("Unknown repository '{repository}'"
R_script = [
f".libPaths(c(.libPaths(), '{dest}'))",
install_cmd,
f"status <- try(find.package('{name}', lib.loc='{dest}', quiet=FALSE, verbose=TRUE))",
"status <- ifelse(is(status, 'try-error'), 1, 0)",
"quit(save='no', status=status, runLast=FALSE)",
]
cmd = ["R", "--vanilla", "-e", "; ".join(R_script)]
return subprocess.run(cmd, text=True, check=True)
(Please check my code, I haven't tested it...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I would perhaps add another function, for example install_R_packages
, which would read the json file & loop over all packages for installation. For example:
import json
def install_R_packages(dest: str, filename: str):
with open(filename, "rt") as f:
packages = json.load(filename)
for package in packages:
status = install_R_package(dest, name=package["name"], repository=package["repository"], url=package.get("url", None))
status.check_returncode()
Then, the wrapper snappy_wrappers/wrappers/sequenza/install/wrapper.py
might just read:
# -*- coding: utf-8 -*-
"""Installation of sequenza non-standard packages"""
import os
import sys
# The following is required for being able to import snappy_wrappers modules
# inside wrappers. These run in an "inner" snakemake process which uses its
# own conda environment which cannot see the snappy_pipeline installation.
base_dir = os.path.abspath(os.path.dirname(__file__))
while os.path.basename(base_dir) != "snappy_wrappers":
base_dir = os.path.dirname(base_dir)
sys.path.insert(0, os.path.dirname(base_dir))
from snappy_wrappers.utils import install_R_packages
__author__ = "Eric Blanc <eric.blanc@bih-charite.de>"
dest = os.path.dirname(str(snakemake.output.done))
install_R_packages(dest, os.path.join(os.path.dirname(__file__), "R_environment.json"))
These are just suggestions. I believe that they could facilitate maintaining the R packages, but perhaps there are unforeseen problems with them, or you have a simpler solution (for example, I don't like json file, I would prefer yaml, but the latter would require adding to the environment.yaml
some yaml python library, which I think will only make the environment more difficult to maintain).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check the arguments of R function sequenza.extract
& sequenza.fit
, & build a model from them.
The man pages for the functions can be viewed from R using:
library(sequenza)
?sequenza.extract
?sequenza.fit
But I have a suspicion that the man pages are not yet updated for the new code.
Typing: sequenza.extract
will list the function's complete code. You can check the argument list, and verify if some of them should not be included in the list (verbosity, input file(s) selected by the pipeline, plots size & format, ...)
@DivyaratanPopli we have a pre-commit config which runs formatting (on python and snakemake files) on each commit automatically, see https://github.com/bihealth/snappy-pipeline/blob/main/docs/installation.rst#installing-pre-commit-hooks which can be very convenient ;) |
Another possibility, though I am not sure how well this works with wrappers: |
Modified the environment, wrapper and Snakefile for cnv calling step to be able to install and run a new version of sequenza (@07116cc).