Updates for QC, tranches and optional EXIT-RIF gvcf dataset #92

Merged 33 commits on Mar 29, 2022
951fde0
Add the notes for tranches
abhi18av Mar 21, 2022
ceb4ab1
Merge branch 'master' into develop
abhi18av Mar 21, 2022
aa6ed01
accommodate the optional reference exit rif gvcf dataset
abhi18av Mar 21, 2022
fe06d05
derive the location of GVCF file
abhi18av Mar 21, 2022
fdb959b
Allow users to run only the QC_CHECK workflow to confirm sample quality
abhi18av Mar 21, 2022
66cc26d
Update the readme to point to the website
abhi18av Mar 21, 2022
2ae9c00
update conda command and put the test workflow in the end
abhi18av Mar 21, 2022
f5a8da8
use lfs for exit-rif dataset
abhi18av Mar 21, 2022
641dfad
add exported and normal env files
abhi18av Mar 21, 2022
03fa87e
create a minimal conda env script
abhi18av Mar 21, 2022
bb9a61c
update the docker file
abhi18av Mar 21, 2022
34aa675
update the container version
abhi18av Mar 21, 2022
9b76c2a
updated the Dockerfiles
abhi18av Mar 21, 2022
ad02075
update the docker container version
abhi18av Mar 21, 2022
fc5ac6f
fix docker file for container2
abhi18av Mar 21, 2022
ddc8bff
fix conditional check
abhi18av Mar 21, 2022
6c807f2
add log to check execution path
abhi18av Mar 21, 2022
95d450e
adapt to the new quanttb_qc script
abhi18av Mar 21, 2022
4c0d292
fix typo in output channel
abhi18av Mar 21, 2022
bf85a14
Remove the extra view check
abhi18av Mar 21, 2022
82cd641
Remove the tbload library process
abhi18av Mar 21, 2022
b1b13c8
tweak the conda env files for general conda-forge channel
abhi18av Mar 21, 2022
73db15f
update the optimized config
abhi18av Mar 21, 2022
90bfc9a
source the exit-rif gvcf from HTTP
abhi18av Mar 21, 2022
9f4d47a
revert to project-local setups
abhi18av Mar 21, 2022
fb37a78
update for private repo
abhi18av Mar 21, 2022
f98b67c
update the description and readme
abhi18av Mar 21, 2022
c001859
Updated results_dirs
TimHHH Mar 28, 2022
7f62d81
Update variants_to_table.nf
TimHHH Mar 28, 2022
01af16e
Update haplotype_caller.nf
TimHHH Mar 28, 2022
ade591a
Update haplotype_caller__minor_variants.nf
TimHHH Mar 28, 2022
157e323
Update the parameters and mechanism to use optional input files (#94)
abhi18av Mar 28, 2022
e6c34f9
update the file pattern
biosharp-ou Mar 28, 2022
2 changes: 2 additions & 0 deletions .gitattributes
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
resources/exit_rif/EXIT-RIF.g.vcf.gz filter=lfs diff=lfs merge=lfs -text
resources/exit_rif/EXIT-RIF.g.vcf.gz.tbi filter=lfs diff=lfs merge=lfs -text
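
Because the two EXIT-RIF files are tracked through Git LFS, a clone made without LFS support may contain small pointer stubs instead of the real GVCF. The following is a minimal sketch of how that could be detected; `is_lfs_pointer` is a hypothetical helper and not part of this repository.

```shell
#!/usr/bin/env bash
# Sketch: report whether a file is still a Git LFS pointer stub.
# `is_lfs_pointer` is a hypothetical helper name, not part of XBS-nf.
is_lfs_pointer() {
    local file="$1"
    # Pointer stubs are small text files whose first line names the LFS spec.
    head -c 200 "$file" 2>/dev/null | head -n 1 \
        | grep -q '^version https://git-lfs.github.com/spec/v1'
}

gvcf="resources/exit_rif/EXIT-RIF.g.vcf.gz"
if [ -f "$gvcf" ] && is_lfs_pointer "$gvcf"; then
    echo "$gvcf is an LFS pointer; run: git lfs pull --include=\"$gvcf\""
fi
```

If the message is printed, fetching the real object with `git lfs pull` before running the pipeline should resolve it.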
4 changes: 4 additions & 0 deletions .gitignore
@@ -17,3 +17,7 @@ results*
*.nextflow*

conda_envs/xbs-nf-env*
containers/**/*yml

samplesheet.csv
xbs-nf.sh
57 changes: 12 additions & 45 deletions README.md
@@ -1,64 +1,31 @@
# XBS-nf

# Benefits of the Nextflow wrapper
XBS-nf (compleX Bacterial Samples) is a pipeline for comprehensive genomic analyses of Mycobacterium tuberculosis with a focus on clinical decision making as well as research.

# Salient features of the implementation

- Fine-grained control over resource allocation (CPU/Memory/Storage)
- Reliance on bioconda and biocontainers for installing packages for reproducibility
- Ease of use on a range of infrastructure (cloud/on-prem clusters/local machine)
- Reliance on bioconda for installing packages for reproducibility
- Ease of use on a range of infrastructure (cloud/on-prem HPC clusters/servers or local machines)
- Resumability for failed processes
- Centralized locations for specifying analysis parameters and hardware requirements
- XBS-nf parameters (`conf/global_parameters.config`)
- XBS-nf parameters (`default_parameters.config`)
- Hardware requirements (`conf/standard.config`)
- Software requirements (`conf/docker.config` or `conf/conda.config`)

# Quickstart for a server/laptop

**NOTE**: The instructions for a cluster system like SLURM/PBS are slightly different!

The simplest use case is to analyze a few genomes on a single machine. Almost all aspects are customizable, but for the sake of brevity, a bare-bones guide for a beginner is shown below.

- [ ] Clone the project

```shell
git clone https://github.com/abhi18av/xbs-nf
cd xbs-nf
```

- [ ] Move your genomes (`fastq.gz` files) to a specific folder, for example `xbs-nf/data/full_data`

- [ ] Prepare a samplesheet using `xbs-nf/resources/reference_set/xbs-nf.test.csv` as a reference for the format.
- Execution (software) requirements (`conf/docker.config` or `conf/conda.config`)
- A GVCF reference dataset for ~600 samples

You can optionally put your samplesheet in the `xbs-nf/resources/reference_set/` folder.
# Usage and Tutorial

- [ ] Update the `xbs-nf/conf/server.config` file to point to the reference sheet
TODO: For usage and tutorials, please refer to the XBS-nf website

- [ ] To run the pipeline, make sure you have `conda` installed. Moreover, if you don't already have `nextflow` installed, you can use the following commands to install it

```shell
conda create -n xbs-nf-env -c bioconda -c conda-forge nextflow mamba openjdk=11
```


You can confirm the setup by activating that environment and using the `nextflow info` command

```
conda activate xbs-nf-env

nextflow info
```

- [ ] Then simply issue the following command on the command line

```
nextflow run main.nf -profile conda,server
```
# Citation

TODO: Update this section and add a citation.cff file

# Contributions

Contributions are warmly accepted!


# License

TODO
8 changes: 4 additions & 4 deletions conda_envs/setup_conda_envs.sh
@@ -2,10 +2,10 @@

set -xue

# NOTE: These environments must be in /path/to/xbs-nf/conda_envs folder
# NOTE: Please replace `conda` with `mamba` if it is installed for faster installs.

# NOTE: If there are problems in `mamba`, replace it with `conda`
# NOTE: The conda environments are expected by the `conda_local` profile to be created within `xbs-nf/conda_envs` directory

mamba create -p ./xbs-nf-env-1 bioconda::gatk4=4.2.4.1 conda-forge::R=4.1 conda-forge::r-ggplot2=3.3.5 bioconda::datamash=1.1.0 bioconda::delly=0.8.7 bioconda::lofreq=2.1.5 bioconda::delly=0.8.7 bioconda::lofreq=2.1.5 bioconda::tb-profiler=4.1.1 bioconda::multiqc=1.11 bioconda::fastqc=0.11.8
conda env create -p xbs-nf-env-1 --file xbs-nf-env-1.yml

mamba create -p ./xbs-nf-env-2 jemunro::quanttb=1.01 bioconda::bwa=0.7.17 bioconda::samtools=1.9 bioconda::iqtree=2.1.2 bioconda::snp-dists=0.8.2 bioconda::snp-sites=2.4.0 bioconda::bcftools=1.9 bioconda::snpeff=4.3.1t bioconda::clusterpicker=1.2.3
conda env create -p xbs-nf-env-2 --file xbs-nf-env-2.yml
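
The note in this script suggests swapping `conda` for `mamba` when it is installed. A minimal sketch of making that choice automatically is shown here; `pick_conda_frontend` is a hypothetical helper, and the commands are only printed so the sketch has no side effects.

```shell
#!/usr/bin/env bash
# Sketch: prefer mamba when it is on PATH, otherwise fall back to conda.
# `pick_conda_frontend` is a hypothetical helper name, not part of XBS-nf.
pick_conda_frontend() {
    if command -v mamba >/dev/null 2>&1; then
        echo "mamba"
    else
        echo "conda"
    fi
}

frontend="$(pick_conda_frontend)"
for env in xbs-nf-env-1 xbs-nf-env-2; do
    # Printed rather than executed; remove the echo to actually create the env.
    echo "$frontend env create -p $env --file $env.yml"
done
```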
16 changes: 16 additions & 0 deletions conda_envs/xbs-nf-env-1.yml
@@ -0,0 +1,16 @@
name: xbs-nf-env-1
channels:
- conda-forge
- bioconda
dependencies:
- bioconda::gatk4=4.2.4.1
- conda-forge::R=4.1
- conda-forge::r-ggplot2=3.3.5
- bioconda::datamash=1.1.0
- bioconda::delly=0.8.7
- bioconda::lofreq=2.1.5
- bioconda::delly=0.8.7
- bioconda::lofreq=2.1.5
- bioconda::tb-profiler=4.1.1
- bioconda::multiqc=1.11
- bioconda::fastqc=0.11.8
15 changes: 15 additions & 0 deletions conda_envs/xbs-nf-env-2.yml
@@ -0,0 +1,15 @@
name: xbs-nf-env-2
channels:
- conda-forge
- bioconda
dependencies:
- jemunro::quanttb=1.01
- bioconda::bwa=0.7.17
- bioconda::samtools=1.9
- bioconda::iqtree=2.1.2
- bioconda::snp-dists=0.8.2
- bioconda::snp-sites=2.4.0
- bioconda::bcftools=1.9
- bioconda::snpeff=4.3.1t
- bioconda::clusterpicker=1.2.3

4 changes: 2 additions & 2 deletions conf/docker.config
@@ -6,12 +6,12 @@ process {

withName:
'GATK.*|LOFREQ.*|DELLY.*|TBPROFILER.*|MULTIQC.*|FASTQC.*|UTILS.*' {
container = "rg.nl-ams.scw.cloud/xbs-nf-containers/xbs-nf-container-1:0.3.0"
container = "rg.nl-ams.scw.cloud/xbs-nf-containers/xbs-nf-container-1:0.5.0"
}

withName:
'QUANTTB.*|BWA.*|IQTREE.*|SNPDISTS.*|SNPSITES.*|BCFTOOLS.*|BGZIP.*|SAMTOOLS.*|SNPEFF.*|CLUSTERPICKER.*' {
container = "rg.nl-ams.scw.cloud/xbs-nf-containers/xbs-nf-container-2:0.3.0"
container = "rg.nl-ams.scw.cloud/xbs-nf-containers/xbs-nf-container-2:0.5.0"
}

/*
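
The two `withName` selectors above route each process to one of two containers by regular expression. The sketch below mimics that routing in shell so a process name can be checked by hand. It assumes a selector has to match the whole process name, which is an assumption about Nextflow's behaviour rather than something this diff states; `container_for`, `BWA_EXAMPLE` and `OTHER_PROCESS` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: mimic the withName routing from conf/docker.config.
# ASSUMPTION: a selector must match the entire process name (grep -x).
# `container_for` is a hypothetical helper name, not part of XBS-nf.
container_for() {
    local name="$1"
    local c1='GATK.*|LOFREQ.*|DELLY.*|TBPROFILER.*|MULTIQC.*|FASTQC.*|UTILS.*'
    local c2='QUANTTB.*|BWA.*|IQTREE.*|SNPDISTS.*|SNPSITES.*|BCFTOOLS.*|BGZIP.*|SAMTOOLS.*|SNPEFF.*|CLUSTERPICKER.*'
    if printf '%s\n' "$name" | grep -Eqx "$c1"; then
        echo "xbs-nf-container-1:0.5.0"
    elif printf '%s\n' "$name" | grep -Eqx "$c2"; then
        echo "xbs-nf-container-2:0.5.0"
    else
        echo "none"
    fi
}

container_for GATK_VARIANT_RECALIBRATOR__SNP
container_for BWA_EXAMPLE   # hypothetical process name
```

Under that whole-name assumption, a process whose name starts with `UTILS_QUANTTB_` would land in container 1 through the `UTILS.*` alternative, not in container 2.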
9 changes: 4 additions & 5 deletions conf/optimized_processes.config
@@ -14,11 +14,6 @@ process {
memory = 2.GB
}

withName: 'TBPROFILER_LOAD_LIBRARY' {
cpus = 2
memory = 2.GB
}

withName: 'UTILS_QUANTTB_.*' {
cpus = 1
memory = 2.GB
@@ -98,4 +93,8 @@ process {
cpus = 1
memory = 2.GB
}

withName: 'TBPROFILER_VCF_PROFILE__LOFREQ.*' {
maxForks = 1
}
}
57 changes: 57 additions & 0 deletions conf/test.config
@@ -0,0 +1,57 @@
manifest {

name = "SLURM testing with 5 samples with optimization without EXIT-RIF GVCF"
}

params {
outdir = "${projectDir}/results"
optimize_variant_recalibration = false
compute_minor_variants = true
dataset_is_not_contaminated = true

use_ref_exit_rif_gvcf = false

//The path to resistance database to use for resistance calling
resistance_db = "NONE"


save_mode = 'symlink'


//NOTE: This is a customized version for dev-time testing (remove gaussian param)
GATK_VARIANT_RECALIBRATOR__SNP {
results_dir = "${params.outdir}/gatk/variant_recalibrator__snp"

arguments = " --use-allele-specific-annotations \
-AS \
--target-titv 1.7 \
--truth-sensitivity-tranche 100.0 \
--truth-sensitivity-tranche 99.9 \
--truth-sensitivity-tranche 99.8 \
--truth-sensitivity-tranche 99.7 \
--truth-sensitivity-tranche 99.6 \
--truth-sensitivity-tranche 99.5 \
--truth-sensitivity-tranche 99.4 \
--truth-sensitivity-tranche 99.3 \
--truth-sensitivity-tranche 99.2 \
--truth-sensitivity-tranche 99.1 \
--truth-sensitivity-tranche 99.0 \
--max-gaussians 1 \
-mq-cap 60"
}


}

executor {
// queueSize = 1
pollInterval = '5sec'
}

process {

executor = "slurm"
errorStrategy = { task.attempt < 3 ? 'retry' : 'ignore' }

time = '1h'
}
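
The `arguments` string in this config repeats `--truth-sensitivity-tranche` for every 0.1 step from 100.0 down to 99.0. As a rough sketch, the same eleven flags could be generated instead of typed by hand; `tranche_flags` is a hypothetical helper, and how its output would be spliced into the config is left open.

```shell
#!/usr/bin/env bash
# Sketch: emit the eleven --truth-sensitivity-tranche flags (100.0 down to 99.0).
# `tranche_flags` is a hypothetical helper; the range mirrors conf/test.config.
tranche_flags() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk 'BEGIN {
        for (i = 0; i <= 10; i++)
            printf "--truth-sensitivity-tranche %.1f\n", 100 - i / 10
    }'
}

tranche_flags
```

Changing the loop bound would widen or narrow the tranche range without editing each line.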
5 changes: 4 additions & 1 deletion containers/build.sh
@@ -5,10 +5,13 @@ set -uex

DOCKER_NAMESPACE="rg.nl-ams.scw.cloud/xbs-nf-containers"

cp ../conda_envs/xbs-nf-env-1.yml ./xbs-nf-container-1
cp ../conda_envs/xbs-nf-env-2.yml ./xbs-nf-container-2

for container_dir in $(find * -type d); do
echo "Building $container_dir ..."
cd $container_dir
CONTAINER_TAG=0.3.0
CONTAINER_TAG=0.5.0
CONTAINER_NAME=$DOCKER_NAMESPACE/$container_dir:$CONTAINER_TAG
echo "Container Name : $CONTAINER_NAME "
docker build -t $CONTAINER_NAME .
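
The loop above derives each image name from the namespace, the directory name and the tag. A dry-run sketch of that naming step follows; `list_image_names` is a hypothetical helper that only prints names and never calls `docker`. It restricts the search to direct subdirectories with `-mindepth`/`-maxdepth` (supported by GNU and BSD `find`), whereas `find * -type d` would also descend into nested directories.

```shell
#!/usr/bin/env bash
# Sketch: list image names for each top-level container directory.
# `list_image_names` is a hypothetical helper; it prints names and never runs docker.
DOCKER_NAMESPACE="rg.nl-ams.scw.cloud/xbs-nf-containers"
CONTAINER_TAG="0.5.0"

list_image_names() {
    local root="$1"
    # Only direct subdirectories of $root are treated as container directories.
    find "$root" -mindepth 1 -maxdepth 1 -type d | sort | while read -r dir; do
        echo "$DOCKER_NAMESPACE/$(basename "$dir"):$CONTAINER_TAG"
    done
}
```

Running `list_image_names .` from the `containers` directory would show what the build loop is about to tag.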
19 changes: 3 additions & 16 deletions containers/xbs-nf-container-1/Dockerfile
@@ -1,22 +1,9 @@
FROM mambaorg/micromamba
MAINTAINER abhi18av@outlook.com

COPY --chown=$MAMBA_USER:$MAMBA_USER xbs-nf-env-1.yml /tmp/xbs-nf-env-1.yml

RUN \
micromamba install -y -n base -c conda-forge -c bioconda \
bioconda::gatk4=4.2.4.1 \
conda-forge::R=4.1 \
conda-forge::r-ggplot2=3.3.5 \
bioconda::datamash=1.1.0 \
bioconda::delly=0.8.7 \
bioconda::lofreq=2.1.5 \
bioconda::delly=0.8.7 \
bioconda::lofreq=2.1.5 \
bioconda::tb-profiler=4.1.1 \
bioconda::multiqc=1.11 \
bioconda::fastqc=0.11.8 \
&& micromamba clean -a -y

RUN micromamba install -y -f /tmp/xbs-nf-env-1.yml -n base

RUN micromamba install -y -n base conda-forge::procps-ng \
RUN micromamba install -y -n base conda-forge::procps-ng \
&& micromamba clean -a -y
16 changes: 3 additions & 13 deletions containers/xbs-nf-container-2/Dockerfile
@@ -1,19 +1,9 @@
FROM mambaorg/micromamba
MAINTAINER abhi18av@outlook.com

COPY --chown=$MAMBA_USER:$MAMBA_USER xbs-nf-env-2.yml /tmp/xbs-nf-env-2.yml

RUN \
micromamba install -y -n base -c conda-forge -c bioconda \
jemunro::quanttb=1.01 \
bioconda::bwa=0.7.17 \
bioconda::samtools=1.9 \
bioconda::iqtree=2.1.2 \
bioconda::snp-dists=0.8.2 \
bioconda::snp-sites=2.4.0 \
bioconda::bcftools=1.9 \
bioconda::snpeff=4.3.1t \
bioconda::clusterpicker=1.2.3 \
&& micromamba clean -a -y
RUN micromamba install -y -f /tmp/xbs-nf-env-2.yml -n base

RUN micromamba install -y -n base conda-forge::procps-ng=3.3.16 conda-forge::bc=v1.07.1 \
RUN micromamba install -y -n base conda-forge::procps-ng \
&& micromamba clean -a -y