Simulation pipeline

Simulation pipeline for ecDNA structures.
The pipeline generates:

ecDNA template .fasta based on a bed file definition
reads based on the ecDNA template .fastq

With the raw data is performed:

assembly
evaluation of the assembly
mapped, sv, cnv calling

Initialize index and dictionary

If you clone this repo, index and dictionary are already included. If you use different genome or change location of genome please run initialize rule first

snakemake --use-conda -s rules/genome_index.smk

Run entire pipeline using snakemake per sample

snakemake --cores 8 \
--use-conda \
--configfile configs/config.yaml \
--config input=data/raw/AnBC.bed name=AnBC outputdir=data/process/AnBC

Simulate reads for ecDNA templates using slurm

For single ecDNA template:

#!/bin/bash
#
#SBATCH --job-name=snakemake_main_job
#SBATCH --ntasks=32
#SBATCH --nodes=1
#SBATCH --time=14-00:00:00
#SBATCH --mem-per-cpu=50G
#SBATCH --output=slurm_logs/%x-%j.log

mkdir -p slurm_logs
export SBATCH_DEFAULTS=" --output=slurm_logs/%x-%j.log"

date

snakemake --cores 32 \
--use-conda \
--configfile configs/config.yaml \
--config input=data/raw/AnBC.bed name=AnBC outputdir=data/process/AnBC runmode=simulate-mapping-sv

date

For multiple templates at once:

#!/bin/bash
#
#SBATCH --job-name=snakemake_main_job
#SBATCH --ntasks=32
#SBATCH --nodes=1
#SBATCH --time=14-00:00:00
#SBATCH --mem-per-cpu=50G
#SBATCH --output=slurm_logs/%x-%j.log

mkdir -p slurm_logs
export SBATCH_DEFAULTS=" --output=slurm_logs/%x-%j.log"

date

snakemake --cores 32 \
--use-conda \
--configfile configs/config.yaml \
--config batch=<dir_batch>  outputdir=data/process/<batch_number> runmode=simulate-mapping-sv

date

Argument batch=<dir_batch> is set to run the pipeline for multiple templates at once, located under <dir_batch>.

Useful commands

Plot pipeline dag:

snakemake --dag | dot -Tpdf > dag.pdf
snakemake --rulegraph | dot -Tpdf > dag_simplified.pdf

Install Quast

conda install quast

# GRIDSS (needed for structural variants detection)
quast-download-gridss

# install gene annotation
quast-download-busco

Generate kmers for winnowmap2

Precompute the high frequency k-mers for the different assemblies.

meryl count k=15 output merylDB ref.fa
meryl print greater-than distinct=0.9998 merylDB > repetitive_k15.txt

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
configs		configs
envs		envs
rules		rules
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Snakefile		Snakefile
dag.pdf		dag.pdf
dag_simplified.pdf		dag_simplified.pdf
install.sh		install.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Simulation pipeline

Initialize index and dictionary

Run entire pipeline using snakemake per sample

Simulate reads for ecDNA templates using slurm

Useful commands

Install Quast

Generate kmers for winnowmap2

About

Releases

Packages

Languages

License

madagiurgiu25/ecDNA-simulate-validate-pipeline

Folders and files

Latest commit

History

Repository files navigation

Simulation pipeline

Initialize index and dictionary

Run entire pipeline using snakemake per sample

Simulate reads for ecDNA templates using slurm

Useful commands

Install Quast

Generate kmers for winnowmap2

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages