feat: complete, reproducible example workflow
Showing 18 changed files with 614 additions and 8 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,3 +3,4 @@ resources/** | |
logs/** | ||
.snakemake | ||
.snakemake/** | ||
.test/results/* |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
samplesheet: "config/samples.tsv" | ||
|
||
get_genome: | ||
  database: "ncbi" | ||
  assembly: "GCF_000006785.2" | ||
  fasta: Null | ||
  gff: Null | ||
  gff_source_type: | ||
    [ | ||
      "RefSeq": "gene", | ||
      "RefSeq": "pseudogene", | ||
      "RefSeq": "CDS", | ||
      "Protein Homology": "CDS", | ||
    ] | ||
|
||
simulate_reads: | ||
  read_length: 100 | ||
  read_number: 100000 | ||
  random_freq: 0.01 | ||
|
||
cutadapt: | ||
  threep_adapter: "-a ATCGTAGATCGG" | ||
  fivep_adapter: "-A GATGGCGATAGG" | ||
  default: ["-q 10 ", "-m 25 ", "-M 100", "--overlap=5"] | ||
|
||
multiqc: | ||
  config: "config/multiqc_config.yml" | ||
|
||
report: | ||
  export_figures: True | ||
  export_dir: "figures/" | ||
  figure_width: 875 | ||
  figure_height: 500 | ||
  figure_resolution: 125 |
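The `cutadapt` block stores the adapter options with their flags already attached (`-a ...`, `-A ...`) and the remaining options as a list, some of them with trailing spaces. How the rules consume these values is not visible in this part of the diff, so the following is only a minimal sketch of how they could be joined into one argument string. `build_cutadapt_args` is a hypothetical helper, not part of the commit.

```python
# Hypothetical helper; the dictionary mirrors the `cutadapt` block of the config above.
cutadapt_cfg = {
    "threep_adapter": "-a ATCGTAGATCGG",
    "fivep_adapter": "-A GATGGCGATAGG",
    "default": ["-q 10 ", "-m 25 ", "-M 100", "--overlap=5"],
}


def build_cutadapt_args(cfg):
    """Join adapter flags and default options into one argument string."""
    parts = [cfg.get("threep_adapter"), cfg.get("fivep_adapter"), *cfg.get("default", [])]
    # Skip unset (None or empty) entries and strip stray trailing spaces.
    return " ".join(part.strip() for part in parts if part)


cutadapt_args = build_cutadapt_args(cutadapt_cfg)
print(cutadapt_args)
```

Stripping each entry before joining avoids double spaces from values such as `"-q 10 "`, and skipping empty entries lets an adapter be left unset without changing the helper.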
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
remove_sections: | ||
- samtools-stats |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
sample condition replicate read1 read2 | ||
sample1 wild_type 1 sample1.bwa.read1.fastq.gz sample1.bwa.read2.fastq.gz | ||
sample2 wild_type 2 sample2.bwa.read1.fastq.gz sample2.bwa.read2.fastq.gz |
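The sample sheet pairs each sample with its `read1` and `read2` files. As an illustration only, the sketch below parses the same rows with the standard library and lists the read files per sample. It assumes the columns are tab-separated (the workflow reads the sheet with `sep="\t"`); `load_samples` and `fastq_files` are hypothetical names, not functions from the commit.

```python
import csv
import io

# Sample sheet content as shown above, assumed to be tab-separated.
SHEET = (
    "sample\tcondition\treplicate\tread1\tread2\n"
    "sample1\twild_type\t1\tsample1.bwa.read1.fastq.gz\tsample1.bwa.read2.fastq.gz\n"
    "sample2\twild_type\t2\tsample2.bwa.read1.fastq.gz\tsample2.bwa.read2.fastq.gz\n"
)


def load_samples(handle):
    """Return {sample: row} from a tab-separated sample sheet."""
    return {row["sample"]: row for row in csv.DictReader(handle, delimiter="\t")}


def fastq_files(row):
    """List the read files of one sample; read2 may be empty for single-end data."""
    return [row[key] for key in ("read1", "read2") if row.get(key)]


samples = load_samples(io.StringIO(SHEET))
print(fastq_files(samples["sample1"]))
```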
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,21 +1,109 @@ | ||
# Snakemake workflow: `<name>` | ||
|
||
[![Snakemake](https://img.shields.io/badge/snakemake-≥6.3.0-brightgreen.svg)](https://snakemake.github.io) | ||
[![GitHub actions status](https://github.com/<owner>/<repo>/workflows/Tests/badge.svg?branch=main)](https://github.com/<owner>/<repo>/actions?query=branch%3Amain+workflow%3ATests) | ||
|
||
[![Snakemake](https://img.shields.io/badge/snakemake-≥8.0.0-brightgreen.svg)](https://snakemake.github.io) | ||
[![GitHub actions status](https://github.com/MPUSP/snakemake-workflow-template/actions/workflows/main.yml/badge.svg?branch=main)](https://github.com/MPUSP/snakemake-workflow-template/actions/workflows/main.yml) | ||
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/) | ||
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1D355C.svg?labelColor=000000)](https://sylabs.io/docs/) | ||
[![workflow catalog](https://img.shields.io/badge/Snakemake%20workflow%20catalog-darkgreen)](https://snakemake.github.io/snakemake-workflow-catalog) | ||
|
||
A Snakemake workflow for `<description>` | ||
|
||
- [Snakemake workflow: `<name>`](#snakemake-workflow-name) | ||
- [Usage](#usage) | ||
- [Workflow overview](#workflow-overview) | ||
- [Running the workflow](#running-the-workflow) | ||
- [Input data](#input-data) | ||
- [Execution](#execution) | ||
- [Parameters](#parameters) | ||
- [Authors](#authors) | ||
- [References](#references) | ||
- [TODO](#todo) | ||
|
||
## Usage | ||
|
||
The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/?usage=<owner>%2F<repo>). | ||
|
||
If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this (original) <repo>sitory and its DOI (see above). | ||
If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this repository or its DOI. | ||
|
||
## Workflow overview | ||
|
||
This workflow is a best-practice workflow for `<detailed description>`. | ||
The workflow is built using [snakemake](https://snakemake.readthedocs.io/en/stable/) and consists of the following steps: | ||
|
||
1. Parse sample sheet containing sample meta data (`python`) | ||
2. Simulate short read sequencing data on the fly (`dwgsim`) | ||
3. Check quality of input read data (`FastQC`) | ||
4. Trim adapters from input data (`cutadapt`) | ||
5. Collect statistics from tool output (`MultiQC`) | ||
|
||
## Running the workflow | ||
|
||
### Input data | ||
|
||
This template workflow contains artificial sequencing data in `*.fastq.gz` format. | ||
The test data is located in `.test/data`. Input files are supplied with a mandatory table, whose location is indicated in the `config.yml` file (default: `.test/samples.tsv`). The sample sheet has the following layout: | ||
|
||
| sample | condition | replicate | data_folder | fq1 | | ||
| -------- | --------- | --------- | ----------- | ------------------------ | | ||
| RPF-RTP1 | RPF-RTP | 1 | data | RPF-RTP1_R1_001.fastq.gz | | ||
| RPF-RTP2 | RPF-RTP | 2 | data | RPF-RTP2_R1_001.fastq.gz | | ||
|
||
### Execution | ||
|
||
To run the workflow from the command line, change to the workflow directory. | ||
|
||
```bash | ||
cd path/to/snakemake-workflow-name | ||
``` | ||
|
||
Adjust options in the default config file `config/config.yml`. | ||
Before running the entire workflow, you can perform a dry run using: | ||
|
||
```bash | ||
snakemake --dry-run | ||
``` | ||
|
||
To run the complete workflow with test files using **conda**, execute the following command. Specifying the number of compute cores is mandatory. | ||
|
||
```bash | ||
snakemake --cores 10 --sdm conda --directory .test | ||
``` | ||
|
||
To run the workflow with **singularity** / **apptainer**, use: | ||
|
||
```bash | ||
snakemake --cores 10 --sdm conda apptainer --directory .test | ||
``` | ||
|
||
### Parameters | ||
|
||
This table lists all parameters that can be used to run the workflow. | ||
|
||
| parameter | type | details | default | | ||
| ---------------------- | ---- | ------------------------------------------- | -------------------------------------------- | | ||
| **samplesheet** | | | | | ||
| path | str | path to samplesheet, mandatory | "config/samples.tsv" | | ||
| **cutadapt** | | | | | ||
| fivep_adapter | str | sequence of the 5' adapter | Null | | ||
| threep_adapter | str | sequence of the 3' adapter | `ATCGTAGATCGGAAGAGCACACGTCTGAA` | | ||
| default | str | additional options passed to `cutadapt` | [`-q 10 `, `-m 22 `, `-M 52`, `--overlap=3`] | | ||
|
||
## Authors | ||
|
||
- Firstname Lastname | ||
- Affiliation | ||
- ORCID profile | ||
- home page | ||
|
||
## References | ||
|
||
> Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., & Nahnsen, S. *Sustainable data analysis with Snakemake*. F1000Research, 10:33, 10, 33, **2021**. https://doi.org/10.12688/f1000research.29032.2. | ||
# TODO | ||
## TODO | ||
|
||
* Replace `<owner>` and `<repo>` everywhere in the template (also under .github/workflows) with the correct `<repo>` name and owning user or organization. | ||
* Replace `<name>` with the workflow name (can be the same as `<repo>`). | ||
* Replace `<description>` with a description of what the workflow does. | ||
* Update the workflow description, parameters, running options, authors and references in the `README.md` | ||
* Update the `README.md` badges. Add or remove badges for `conda`/`singularity`/`apptainer` usage depending on the workflow's capability | ||
* The workflow will appear in the snakemake-workflow-catalog once it has been made public. Then the link under "Usage" will point to the usage instructions if `<owner>` and `<repo>` were correctly set. |
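The TODO items above mostly amount to substituting the template placeholders. A minimal sketch of that substitution follows; the replacement values and the helper name `fill_placeholders` are invented for illustration and would need to be adapted before applying this to `README.md` or the files under `.github/workflows`.

```python
# Hypothetical replacement values; substitute your own.
PLACEHOLDERS = {
    "<owner>": "my-org",
    "<repo>": "my-workflow",
    "<name>": "my-workflow",
    "<description>": "trimming and QC of short reads",
}


def fill_placeholders(text, mapping=PLACEHOLDERS):
    """Replace each template placeholder in `text` with its value."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


line = "https://github.com/<owner>/<repo>/actions"
filled = fill_placeholders(line)
print(filled)
```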
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
samplesheet: ".test/config/samples.tsv" | ||
|
||
get_genome: | ||
  database: "ncbi" | ||
  assembly: "GCF_000006785.2" | ||
  fasta: Null | ||
  gff: Null | ||
  gff_source_type: | ||
    [ | ||
      "RefSeq": "gene", | ||
      "RefSeq": "pseudogene", | ||
      "RefSeq": "CDS", | ||
      "Protein Homology": "CDS", | ||
    ] | ||
|
||
simulate_reads: | ||
  read_length: 100 | ||
  read_number: 100000 | ||
  random_freq: 0.01 | ||
|
||
cutadapt: | ||
  threep_adapter: "-a ATCGTAGATCGG" | ||
  fivep_adapter: "-A GATGGCGATAGG" | ||
  default: ["-q 10 ", "-m 25 ", "-M 100", "--overlap=5"] | ||
|
||
multiqc: | ||
  config: "config/multiqc_config.yml" | ||
|
||
report: | ||
  export_figures: True | ||
  export_dir: "figures/" | ||
  figure_width: 875 | ||
  figure_height: 500 | ||
  figure_resolution: 125 |
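In YAML, a flow sequence whose items are `key: value` pairs is read as a list of single-pair mappings, which is what allows `"RefSeq"` to appear three times under `gff_source_type`. How the workflow uses this list is not shown in this diff. The sketch below assumes it acts as a whitelist of (source, type) pairs for GFF records; `keep_feature` is a hypothetical name.

```python
# Assumed result of loading `gff_source_type` from the config: a list of one-pair dicts.
gff_source_type = [
    {"RefSeq": "gene"},
    {"RefSeq": "pseudogene"},
    {"RefSeq": "CDS"},
    {"Protein Homology": "CDS"},
]

# Flatten into a set of (source, type) tuples for fast membership tests.
allowed = {(src, typ) for pair in gff_source_type for src, typ in pair.items()}


def keep_feature(source, feature_type):
    """Return True if a GFF record's source/type pair is listed in the config."""
    return (source, feature_type) in allowed


print(keep_feature("RefSeq", "gene"))
print(keep_feature("RefSeq", "tRNA"))
```

A plain mapping would not work here, because repeated `"RefSeq"` keys would collapse into one entry.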
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
remove_sections: | ||
- samtools-stats |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
$schema: "http://json-schema.org/draft-07/schema#" | ||
description: configuration of the workflow | ||
properties: | ||
  samplesheet: | ||
    type: string | ||
    description: path to the sample sheet | ||
|
||
  get_genome: | ||
    properties: | ||
      database: | ||
        type: ["string", "null"] | ||
      assembly: | ||
        type: ["string", "null"] | ||
      fasta: | ||
        type: ["string", "null"] | ||
      gff: | ||
        type: ["string", "null"] | ||
      gff_source_type: | ||
        type: array | ||
|
||
  simulate_reads: | ||
    properties: | ||
      read_length: | ||
        type: number | ||
      read_number: | ||
        type: number | ||
      random_freq: | ||
        type: number | ||
|
||
  cutadapt: | ||
    properties: | ||
      threep_adapter: | ||
        type: string | ||
      fivep_adapter: | ||
        type: string | ||
      default: | ||
        type: array | ||
|
||
  multiqc: | ||
    properties: | ||
      config: | ||
        type: string | ||
|
||
required: ["samplesheet", "get_genome", "simulate_reads", "cutadapt", "multiqc"] |
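The schema combines two kinds of checks: union types such as `["string", "null"]` and a list of required top-level keys. The `report` section of the config is neither described nor required, so its keys pass unchecked. The sketch below mimics both checks in plain Python to show what they mean. It is a simplified stand-in, not how `snakemake.utils.validate` is implemented, and `matches` and `missing_keys` are hypothetical names.

```python
# Map JSON Schema type names to Python types (only those used in the schema above).
# Simplification: Python treats bool as a subclass of int, which JSON Schema does not.
TYPES = {"string": str, "number": (int, float), "array": list, "null": type(None)}


def matches(value, schema_type):
    """Check a value against a schema `type`, given as one name or a list of names."""
    names = schema_type if isinstance(schema_type, list) else [schema_type]
    return any(isinstance(value, TYPES[name]) for name in names)


def missing_keys(config, required):
    """Return the required top-level keys that are absent from the config."""
    return [key for key in required if key not in config]


required = ["samplesheet", "get_genome", "simulate_reads", "cutadapt", "multiqc"]
config = {"samplesheet": "config/samples.tsv", "get_genome": {"fasta": None}}

print(matches(config["get_genome"]["fasta"], ["string", "null"]))
print(missing_keys(config, required))
```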
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
$schema: "http://json-schema.org/draft-07/schema#" | ||
description: an entry in the sample sheet | ||
properties: | ||
  sample: | ||
    type: string | ||
    description: sample name/identifier | ||
  condition: | ||
    type: string | ||
    description: sample condition that will be compared during differential analysis | ||
  replicate: | ||
    type: number | ||
    default: 1 | ||
    description: consecutive numbers representing multiple replicates of one condition | ||
  read1: | ||
    type: string | ||
    description: names of fastq.gz files, read 1 | ||
  read2: | ||
    type: string | ||
    description: names of fastq.gz files, read 2 (optional) | ||
|
||
required: | ||
  - sample | ||
  - condition | ||
  - replicate | ||
  - read1 |
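The sample schema declares a `default` for `replicate` while also listing it as required, and leaves `read2` optional. The sketch below illustrates one way these two rules can interact, with the default filled in before the required check. It uses plain dictionaries rather than the JSON Schema machinery behind `snakemake.utils.validate`, and `apply_defaults` and `check_required` are hypothetical names.

```python
# Subset of the sample schema above, written as a Python dict.
SCHEMA = {
    "properties": {"replicate": {"type": "number", "default": 1}},
    "required": ["sample", "condition", "replicate", "read1"],
}


def apply_defaults(row, schema=SCHEMA):
    """Return a copy of `row` with schema defaults filled in for absent keys."""
    filled = dict(row)
    for key, spec in schema["properties"].items():
        if "default" in spec:
            filled.setdefault(key, spec["default"])
    return filled


def check_required(row, schema=SCHEMA):
    """Raise ValueError if any required field is missing from the row."""
    missing = [key for key in schema["required"] if key not in row]
    if missing:
        raise ValueError(f"missing required fields: {missing}")


row = apply_defaults({"sample": "sample1", "condition": "wild_type", "read1": "s1.fastq.gz"})
check_required(row)
print(row["replicate"])
```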
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,45 @@ | ||
# Main entrypoint of the workflow. | ||
# Please follow the best practices: | ||
# Main entrypoint of the workflow. | ||
# Please follow the best practices: | ||
# https://snakemake.readthedocs.io/en/stable/snakefiles/best_practices.html, | ||
# in particular regarding the standardized folder structure mentioned there. | ||
# in particular regarding the standardized folder structure mentioned there. | ||
|
||
|
||
import os | ||
import pandas as pd | ||
|
||
|
||
# load configuration | ||
# ----------------------------------------------------- | ||
configfile: "config/config.yml" | ||
|
||
|
||
# container definition: uncomment to include a singularity image, e.g. from github's container registry | ||
# container: "oras://ghcr.io/<user>/<repository>:<version>" | ||
|
||
|
||
# load rules | ||
# ----------------------------------------------------- | ||
include: "rules/common.smk" | ||
include: "rules/process_reads.smk" | ||
|
||
|
||
# optional messages, log and error handling | ||
# ----------------------------------------------------- | ||
onstart: | ||
    print("\n--- Analysis started ---\n") | ||
|
||
|
||
onsuccess: | ||
    print("--- Workflow finished! ---") | ||
|
||
|
||
onerror: | ||
    print("--- An error occurred! ---") | ||
|
||
|
||
# target rules | ||
# ----------------------------------------------------- | ||
rule all: | ||
    input: | ||
        "results/multiqc/multiqc_report.html", | ||
    default_target: True |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
name: cutadapt | ||
channels: | ||
- conda-forge | ||
- bioconda | ||
dependencies: | ||
- cutadapt=4.9 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
name: fastqc | ||
channels: | ||
- conda-forge | ||
- bioconda | ||
dependencies: | ||
- fastqc=0.12.1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
name: get_genome | ||
channels: | ||
- conda-forge | ||
- bioconda | ||
dependencies: | ||
- unzip=6.0 | ||
- ncbi-datasets-cli=16.23.0 | ||
- bcbio-gff=0.7.1 | ||
- samtools=1.20 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
name: multiqc | ||
channels: | ||
- conda-forge | ||
- bioconda | ||
dependencies: | ||
- python=3.9 | ||
- multiqc=1.14 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
name: get_genome | ||
channels: | ||
- conda-forge | ||
- bioconda | ||
dependencies: | ||
- dwgsim=1.1.14 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# import basic packages | ||
import pandas as pd | ||
from snakemake.utils import validate | ||
from os import path | ||
|
||
|
||
# read sample sheet | ||
samples = ( | ||
pd.read_csv(config["samplesheet"], sep="\t", dtype={"sample": str}) | ||
.set_index("sample", drop=False) | ||
.sort_index() | ||
) | ||
|
||
|
||
# validate sample sheet and config file | ||
validate(samples, schema="../../config/schemas/samples.schema.yml") | ||
validate(config, schema="../../config/schemas/config.schema.yml") |
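The schema paths passed to `validate` start with `../../`. As far as I know, Snakemake resolves a relative schema path against the directory of the file that calls `validate`, not against the working directory. The sketch below shows where the path would then point, assuming the rule file lives at `workflow/rules/common.smk`. That location follows the standard Snakemake layout but is not shown in this diff, and `resolve_schema` is a hypothetical helper.

```python
import posixpath

# Assumed location of the rule file inside the repository (not shown in the diff).
RULE_FILE = "workflow/rules/common.smk"


def resolve_schema(rule_file, schema):
    """Resolve a schema path relative to the directory of the including rule file."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(rule_file), schema))


resolved = resolve_schema(RULE_FILE, "../../config/schemas/samples.schema.yml")
print(resolved)
```

Under that assumption both schemas sit in `config/schemas/` at the repository root, so they are found regardless of the `--directory` option used at run time.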