Commit f24c762

Merge branch 'dev' into nf-core-template-merge-3.2.0

JudithBernett authored Jan 29, 2025
2 parents b28687f + 3f276ec
Showing 64 changed files with 2,724 additions and 142 deletions.
3 changes: 2 additions & 1 deletion .github/workflows/template_version_comment.yml
@@ -2,7 +2,8 @@ name: nf-core template version comment
# This workflow is triggered on PRs to check if the pipeline template version matches the latest nf-core version.
# It posts a comment to the PR, even if it comes from a fork.

on: pull_request_target
on:
  pull_request:

jobs:
template_version:
1 change: 1 addition & 0 deletions .gitignore
@@ -7,3 +7,4 @@ testing/
testing*
*.pyc
null/
.idea/
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -9,8 +9,17 @@ Initial release of nf-core/drugresponseeval, created with the [nf-core](https://

### `Added`

- Updated to the new template
- Added tests that run with docker, singularity, apptainer, and conda
- Added the Docker container and the conda `env.yml` in `nextflow.config`. A single container suffices for all
  processes because this pipeline automates the PyPI package drevalpy.
- Added usage and output documentation.

### `Fixed`

- Fixed linting issues
- Fixed bugs with `path_data`: it can now be given as an absolute or a relative path

### `Dependencies`

### `Deprecated`
24 changes: 24 additions & 0 deletions CITATIONS.md
@@ -1,5 +1,9 @@
# nf-core/drugresponseeval: Citations

## [DrugResponseEval](https://github.com/nf-core/drugresponseeval/)

> Bernett, J, Iversen, P, Picciani, M, Wilhelm, M, Baum, K, List, M. Will be published soon.
## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.
@@ -10,6 +14,26 @@
## Pipeline tools

- [DrEvalPy](https://github.com/daisybio/drevalpy): The pipeline mostly automates the individual steps of the DrEvalPy PyPI package.

> Bernett, J, Iversen, P, Picciani, M, Wilhelm, M, Baum, K, List, M. Will be published soon.
- [DIPK](https://doi.org/10.1093/bib/bbae153): Implemented model in the pipeline.

> Li P, Jiang Z, Liu T, Liu X, Qiao H, Yao X. Improving drug response prediction via integrating gene relationships with deep learning. Briefings in Bioinformatics. 2024 May;25(3):bbae153.
- [MOLI](https://doi.org/10.1093/bioinformatics/btz318): Implemented model in the pipeline.

> Sharifi-Noghabi H, Zolotareva O, Collins CC, Ester M. MOLI: multi-omics late integration with deep neural networks for drug response prediction. Bioinformatics. 2019 Jul;35(14):i501-9.
- [SRMF](https://doi.org/10.1186/s12885-017-3500-5): Implemented model in the pipeline.

> Wang L, Li X, Zhang L, Gao Q. Improved anticancer drug response prediction in cell lines using matrix factorization with similarity regularization. BMC cancer. 2017 Dec;17:1-2.
- [SuperFELT](https://doi.org/10.1186/s12859-021-04146-z): Implemented model in the pipeline.

> Park S, Soh J, Lee H. Super.FELT: supervised feature extraction learning using triplet loss for drug response prediction with multi-omics data. BMC bioinformatics. 2021 May 25;22(1):269.
## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
76 changes: 41 additions & 35 deletions README.md
@@ -7,6 +7,7 @@

[![GitHub Actions CI Status](https://github.com/nf-core/drugresponseeval/actions/workflows/ci.yml/badge.svg)](https://github.com/nf-core/drugresponseeval/actions/workflows/ci.yml)
[![GitHub Actions Linting Status](https://github.com/nf-core/drugresponseeval/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/drugresponseeval/actions/workflows/linting.yml)
[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/drugresponseeval/results)
[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)

[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)

[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.04.2-23aa62.svg)](https://www.nextflow.io/)
@@ -15,52 +16,52 @@
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/nf-core/drugresponseeval)

[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23drugresponseeval-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/drugresponseeval)[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core)[![Follow on Mastodon](https://img.shields.io/badge/mastodon-nf__core-6364ff?labelColor=FFFFFF&logo=mastodon)](https://mstdn.science/@nf_core)[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)

## Introduction
[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core)
[![Follow on Mastodon](https://img.shields.io/badge/mastodon-nf__core-6364ff?labelColor=FFFFFF&logo=mastodon)](https://mstdn.science/@nf_core)
[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)

**nf-core/drugresponseeval** is a bioinformatics pipeline that ...
# ![drevalpy_summary](assets/drevalpy-2-qr.svg)

<!-- TODO nf-core:
Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
major pipeline sections and the types of output it produces. You're giving an overview to someone new
to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
-->
## Introduction

<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
**DrEval** is a bioinformatics framework which includes a PyPI package (drevalpy) and a Nextflow
pipeline (this repo). DrEval ensures that evaluations are statistically sound, biologically
meaningful, and reproducible. DrEval simplifies the implementation of drug response prediction
models, allowing researchers to focus on advancing their modeling innovations by automating
standardized evaluation protocols and preprocessing workflows. With DrEval, hyperparameter
tuning is fair and consistent. With its flexible model interface, DrEval supports any model type,
ranging from statistical models to complex neural networks. By contributing your model to the
DrEval catalog, you can increase your work's exposure, reusability, and transferability.

# ![Pipeline diagram showing the major steps of nf-core/drugresponseeval](assets/drugresponseeval_pipeline_simplified.png)

1. The response data is loaded
2. All models are trained and evaluated in a cross-validation setting
3. For each CV split, the best hyperparameters are determined using a grid search per model
4. The model is trained on the full training set (train & validation) with the best
hyperparameters to predict the test set
5. If randomization tests are enabled, the model is trained on the full training set with the best
hyperparameters to predict the randomized test set
6. If robustness tests are enabled, the model is trained N times on the full training set with the
best hyperparameters
7. Plots are created summarizing the results

For baseline models, no randomization or robustness tests are performed.
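The steps above can be sketched, in heavily simplified form, as a nested loop over CV splits and hyperparameter combinations. All names here (`grid_search`, `evaluate_model`, `fit_score`, `fit_predict`) are hypothetical illustrations of the protocol, not the drevalpy API:

```python
from itertools import product


def grid_search(train, val, param_grid, fit_score):
    """Pick the hyperparameter combination scoring best on the validation split (step 3)."""
    best_params, best_score = None, float("-inf")
    for combo in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), combo))
        score = fit_score(train, val, params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params


def evaluate_model(cv_splits, param_grid, fit_score, fit_predict):
    """For each CV split: tune on train/validation, then refit on both
    to predict the held-out test set (steps 2-4)."""
    predictions = []
    for split in cv_splits:
        best = grid_search(split["train"], split["validation"], param_grid, fit_score)
        full_train = split["train"] + split["validation"]
        predictions.append(fit_predict(full_train, split["test"], best))
    return predictions
```

The randomization and robustness tests (steps 5-6) reuse the tuned hyperparameters from this loop rather than repeating the grid search.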

## Usage

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
Explain what rows and columns represent. For instance (please edit as appropriate):
First, prepare a samplesheet with your input data that looks as follows:
`samplesheet.csv`:
```csv
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
```
Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
-->

Now, you can run the pipeline using:

<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->

```bash
nextflow run nf-core/drugresponseeval \
  -profile <docker/singularity/.../institute> \
  --input samplesheet.csv \
  --outdir <OUTDIR> \
  --models <model1,model2,...> \
  --baselines <baseline1,baseline2,...> \
  --dataset_name <dataset_name> \
  --path_data <path_data>
```

> [!WARNING]
@@ -76,14 +77,19 @@ For more details about the output files and reports, please refer to the

## Credits

nf-core/drugresponseeval was originally written by Judith Bernett.
nf-core/drugresponseeval was originally written by Judith Bernett (TUM) and Pascal Iversen (FU Berlin).

We thank the following people for their extensive assistance in the development of this pipeline:

<!-- TODO nf-core: If applicable, make list of people who have also contributed -->

## Contributions and Support

Contributors to nf-core/drugresponseeval and the drevalpy PyPI package:

- [Judith Bernett](https://github.com/JudithBernett) (TUM)
- [Pascal Iversen](https://github.com/PascalIversen) (FU Berlin)
- [Mario Picciani](https://github.com/picciama) (TUM)

If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).

For further information or help, don't hesitate to get in touch on the [Slack `#drugresponseeval` channel](https://nfcore.slack.com/channels/drugresponseeval) (you can join with [this invite](https://nf-co.re/join/slack)).
1 change: 1 addition & 0 deletions assets/drevalpy-2-qr.svg
Binary file added assets/drugresponseeval_pipeline_simplified.png
13 changes: 13 additions & 0 deletions bin/check_params.py
@@ -0,0 +1,13 @@
#!/usr/bin/env python
import sys

from drevalpy.utils import get_parser, check_arguments


def main(argv=None):
    """Coordinate argument parsing and program execution."""
    args = get_parser().parse_args(argv)
    check_arguments(args)


if __name__ == "__main__":
    sys.exit(main())
59 changes: 59 additions & 0 deletions bin/collect_results.py
@@ -0,0 +1,59 @@
#!/usr/bin/env python
import argparse

import pandas as pd

from drevalpy.visualization.utils import prep_results, write_results


def get_parser():
    parser = argparse.ArgumentParser(description="Collect results and write to single files.")
    parser.add_argument("--outfiles", type=str, nargs="+", required=True, help="Output files.")
    return parser


def parse_results(args):
    # get all files with the pattern f'{model_name}_evaluation_results.csv' from args.outfiles
    result_files = [file for file in args.outfiles if "evaluation_results.csv" in file]
    # get all files with the pattern f'{model_name}_evaluation_results_per_drug.csv' from args.outfiles
    result_per_drug_files = [file for file in args.outfiles if "evaluation_results_per_drug.csv" in file]
    # get all files with the pattern f'{model_name}_evaluation_results_per_cl.csv' from args.outfiles
    result_per_cl_files = [file for file in args.outfiles if "evaluation_results_per_cl.csv" in file]
    # get all files with the pattern f'{model_name}_true_vs_pred.csv' from args.outfiles
    t_vs_pred_files = [file for file in args.outfiles if "true_vs_pred.csv" in file]
    return result_files, result_per_drug_files, result_per_cl_files, t_vs_pred_files


def collapse_file(files):
    out_df = None
    for file in files:
        if out_df is None:
            out_df = pd.read_csv(file, index_col=0)
        else:
            out_df = pd.concat([out_df, pd.read_csv(file, index_col=0)])
    return out_df


if __name__ == "__main__":
    args = get_parser().parse_args()
    # parse the results from args.outfiles
    eval_result_files, eval_result_per_drug_files, eval_result_per_cl_files, true_vs_pred_files = parse_results(args)

    # collapse the results into single dataframes
    eval_results = collapse_file(eval_result_files)
    eval_results_per_drug = collapse_file(eval_result_per_drug_files)
    eval_results_per_cell_line = collapse_file(eval_result_per_cl_files)
    t_vs_p = collapse_file(true_vs_pred_files)

    # prepare the results through introducing new columns algorithm, rand_setting, LPO_LCO_LDO, split, CV_split
    eval_results, eval_results_per_drug, eval_results_per_cell_line, t_vs_p = prep_results(
        eval_results, eval_results_per_drug, eval_results_per_cell_line, t_vs_p
    )

    # save the results to csv files
    write_results(
        path_out="",
        eval_results=eval_results,
        eval_results_per_drug=eval_results_per_drug,
        eval_results_per_cl=eval_results_per_cell_line,
        t_vs_p=t_vs_p,
    )
50 changes: 50 additions & 0 deletions bin/consolidate_results.py
@@ -0,0 +1,50 @@
#!/usr/bin/env python

import os
import argparse
from drevalpy.models import MODEL_FACTORY
from drevalpy.experiment import consolidate_single_drug_model_predictions


def get_parser():
    parser = argparse.ArgumentParser(description="Consolidate results for SingleDrugModels")
    parser.add_argument("--run_id", type=str, required=True, help="Run ID")
    parser.add_argument("--test_mode", type=str, required=True, help="Test mode (LPO, LCO, LDO)")
    parser.add_argument("--model_name", type=str, required=True, help="All model names")
    parser.add_argument("--outdir_path", type=str, required=True, help="Output directory path")
    parser.add_argument("--n_cv_splits", type=int, required=True, help="Number of CV splits")
    parser.add_argument("--cross_study_datasets", type=str, nargs="+", help="All cross-study datasets")
    parser.add_argument("--randomization_modes", type=str, required=True, help="All randomizations")
    parser.add_argument("--n_trials_robustness", type=int, required=True, help="Number of trials")
    return parser


def main():
    parser = get_parser()
    args = parser.parse_args()
    results_path = os.path.join(
        args.outdir_path,
        args.run_id,
        args.test_mode,
    )
    # --randomization_modes arrives as a bracketed list string, e.g. '[SVRC, SVRD]'
    randomizations = args.randomization_modes.split('[')[1].split(']')[0].split(', ')
    model = MODEL_FACTORY[args.model_name]
    if args.cross_study_datasets is None:
        args.cross_study_datasets = []
    consolidate_single_drug_model_predictions(
        models=[model],
        n_cv_splits=args.n_cv_splits,
        results_path=results_path,
        cross_study_datasets=args.cross_study_datasets,
        randomization_mode=randomizations,
        n_trials_robustness=args.n_trials_robustness,
        out_path="",
    )


if __name__ == "__main__":
    main()
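The string surgery on `--randomization_modes` in the script above assumes the value arrives as a bracketed, comma-separated string (the way Nextflow renders a Groovy list, e.g. `[SVRC, SVRD]`). A standalone sketch of that parsing; `parse_bracketed_list` is a hypothetical helper name, not part of drevalpy:

```python
def parse_bracketed_list(value: str) -> list:
    """Parse a Nextflow-style list string like '[SVRC, SVRD]' into Python strings.

    Assumes exactly one '[' ... ']' pair and ', ' as the separator,
    mirroring the split chain used in consolidate_results.py.
    """
    return value.split("[")[1].split("]")[0].split(", ")
```

Note the fragility this hedges around: an element containing `', '`, `[`, or `]` would break the parse, which is acceptable here because randomization mode names are simple identifiers.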
36 changes: 36 additions & 0 deletions bin/cv_split.py
@@ -0,0 +1,36 @@
#!/usr/bin/env python

import argparse
import pickle


def get_parser():
    parser = argparse.ArgumentParser(description="Split data into CV splits")
    parser.add_argument("--response", type=str, required=True, help="Path to response data")
    parser.add_argument("--n_cv_splits", type=int, required=True, help="Number of CV splits")
    parser.add_argument("--test_mode", type=str, default="LPO", help="Test mode (LPO, LCO, LDO)")
    return parser


def main():
    parser = get_parser()
    args = parser.parse_args()
    with open(args.response, "rb") as f:
        response_data = pickle.load(f)
    response_data.remove_nan_responses()
    response_data.split_dataset(
        n_cv_splits=args.n_cv_splits,
        mode=args.test_mode,
        split_validation=True,
        split_early_stopping=True,
        validation_ratio=0.1,
        random_state=42,
    )
    for split_index, split in enumerate(response_data.cv_splits):
        with open(f"split_{split_index}.pkl", "wb") as f:
            pickle.dump(split, f)


if __name__ == "__main__":
    main()
29 changes: 29 additions & 0 deletions bin/draw_cd.py
@@ -0,0 +1,29 @@
#!/usr/bin/env python
import argparse
import pandas as pd

from drevalpy.visualization.critical_difference_plot import CriticalDifferencePlot


def get_parser():
    parser = argparse.ArgumentParser(description="Draw critical difference plots.")
    parser.add_argument("--name", type=str, required=True, help="Name/Setting of plot.")
    parser.add_argument("--data", type=str, required=True, help="Path to data.")
    return parser


def draw_cd(path_to_df: str, setting: str):
    df = pd.read_csv(path_to_df, index_col=0)
    df = df[(df["LPO_LCO_LDO"] == setting) & (df["rand_setting"] == "predictions")]
    cd_plot = CriticalDifferencePlot(eval_results_preds=df, metric="MSE")
    cd_plot.draw_and_save(out_prefix="", out_suffix=setting)


if __name__ == "__main__":
    args = get_parser().parse_args()
    draw_cd(path_to_df=args.data, setting=args.name)
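The boolean-mask filter in `draw_cd` keeps only the rows for the requested test setting whose predictions are un-randomized. A small self-contained sketch with made-up evaluation rows (column names follow the script above; the values are invented):

```python
import pandas as pd

# Made-up evaluation results in the shape draw_cd expects
df = pd.DataFrame(
    {
        "LPO_LCO_LDO": ["LPO", "LPO", "LCO"],
        "rand_setting": ["predictions", "SVRC", "predictions"],
        "MSE": [0.4, 0.9, 0.5],
    }
)

# & combines the two boolean Series element-wise; each comparison
# must be parenthesized because & binds tighter than ==
subset = df[(df["LPO_LCO_LDO"] == "LPO") & (df["rand_setting"] == "predictions")]
```

Only the first row survives: it matches the setting and carries real (non-randomized) predictions.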