Skip to content

madagiurgiu25/ecDNA-simulate-validate-pipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Simulation pipeline

Simulation pipeline for ecDNA structures.
The pipeline generates:

  • ecDNA template .fasta based on a bed file definition
  • reads based on the ecDNA template .fastq

With the raw data is performed:

  • assembly
  • evaluation of the assembly
  • mapped, sv, cnv calling

Initialize index and dictionary

If you clone this repo, index and dictionary are already included. If you use different genome or change location of genome please run initialize rule first

snakemake --use-conda -s rules/genome_index.smk 

Run entire pipeline using snakemake per sample

snakemake --cores 8 \
--use-conda \
--configfile configs/config.yaml \
--config input=data/raw/AnBC.bed name=AnBC outputdir=data/process/AnBC

Simulate reads for ecDNA templates using slurm

For single ecDNA template:

#!/bin/bash
#
#SBATCH --job-name=snakemake_main_job
#SBATCH --ntasks=32
#SBATCH --nodes=1
#SBATCH --time=14-00:00:00
#SBATCH --mem-per-cpu=50G
#SBATCH --output=slurm_logs/%x-%j.log

mkdir -p slurm_logs
export SBATCH_DEFAULTS=" --output=slurm_logs/%x-%j.log"

date

snakemake --cores 32 \
--use-conda \
--configfile configs/config.yaml \
--config input=data/raw/AnBC.bed name=AnBC outputdir=data/process/AnBC runmode=simulate-mapping-sv

date

For multiple templates at once:

#!/bin/bash
#
#SBATCH --job-name=snakemake_main_job
#SBATCH --ntasks=32
#SBATCH --nodes=1
#SBATCH --time=14-00:00:00
#SBATCH --mem-per-cpu=50G
#SBATCH --output=slurm_logs/%x-%j.log

mkdir -p slurm_logs
export SBATCH_DEFAULTS=" --output=slurm_logs/%x-%j.log"

date

snakemake --cores 32 \
--use-conda \
--configfile configs/config.yaml \
--config batch=<dir_batch>  outputdir=data/process/<batch_number> runmode=simulate-mapping-sv

date

Argument batch=<dir_batch> is set to run the pipeline for multiple templates at once, located under <dir_batch>.

Useful commands

Plot pipeline dag:

snakemake --dag | dot -Tpdf > dag.pdf
snakemake --rulegraph | dot -Tpdf > dag_simplified.pdf

Install Quast

conda install quast

# GRIDSS (needed for structural variants detection)
quast-download-gridss

# install gene annotation
quast-download-busco

Generate kmers for winnowmap2

Precompute the high frequency k-mers for the different assemblies.

meryl count k=15 output merylDB ref.fa
meryl print greater-than distinct=0.9998 merylDB > repetitive_k15.txt

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published