NYU-BFX/pipeline-atac2hic

Introduction

A Snakemake pipeline that predicts Hi-C matrices from ATAC-seq fastq files using maxATAC and C.Origami.

Setup

If you are working on Ultraviolet (aka BigPurple), you need to set up an environment, which means installing snakemake and mamba (so a conda within a conda).

module load anaconda3/gpu
conda create -n snakemake -c conda-forge -c bioconda mamba snakemake snakemake-executor-plugin-slurm

To configure snakemake to work on Ultraviolet, edit the ultraviolet.yaml that comes with this repo to your liking and copy it to your .config directory as shown.

mkdir -p ~/.config/snakemake/ultraviolet/
cp ultraviolet.yaml ~/.config/snakemake/ultraviolet/config.yaml

Execute

To run the pipeline you need to:

create a samplesheet with information about your samples
set up a directory with the data needed to run C.Origami
edit the config/config.yaml file to specify the paths to the relevant directories.

Samplesheet

The samplesheet should list the sample and replicate names and the paths to the fastq files. See the config/sample_meta.csv that comes with this repo for an example.

Alternatively, you can add a column called Run with the SRR ids of the samples, and the pipeline should download them for you automatically (NOT TESTED).
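
Since the Run path is untested, it may be worth checking the column before launching. The sketch below reads a samplesheet and returns the values of the Run column, failing early on anything that does not look like an SRR id. Only the Run column name comes from this README; the other column names and the accessions in the example are hypothetical placeholders, and the pipeline's own parsing may differ.

```python
import csv
import io
import re

# An SRR accession is the prefix "SRR" followed by digits.
SRR_PATTERN = re.compile(r"^SRR\d+$")


def run_ids(samplesheet_text):
    """Return the values of the Run column, rejecting anything that is not an SRR id."""
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    if "Run" not in (reader.fieldnames or []):
        raise ValueError("samplesheet has no 'Run' column")
    ids = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        run = (row["Run"] or "").strip()
        if not SRR_PATTERN.match(run):
            raise ValueError(f"line {line_no}: {run!r} is not an SRR id")
        ids.append(run)
    return ids


# Hypothetical samplesheet: 'sample' and 'replicate' are placeholder column names.
sheet = "sample,replicate,Run\nctrl,1,SRR0000001\nctrl,2,SRR0000002\n"
print(run_ids(sheet))
```

To check a real file, pass its contents, for example `run_ids(open("config/sample_meta.csv").read())`.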

C.Origami Directory

The C.Origami directory should look like this:

<corigami_base>/
├── data
│   ├── <genome>
│   │   ├── centrotelo.bed
│   │   └── dna_sequence
│   │       ├── chr10.fa.gz
│   │       ├── chr11.fa.gz
│   │       ├── ...
│   │       ├── chrX.fa.gz
│   │       └── chrY.fa.gz
│   └── <genome>_tiles.bed
└── model_weights
    └── <corigami_model>.ckpt

Here, corigami_base, genome and corigami_model are the values specified in config/config.yaml.
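
The layout above can be checked before a run with a few lines of Python. This is a minimal sketch, not part of the pipeline: the function name is made up for illustration, and it assumes corigami_model is given without the .ckpt extension, as the tree suggests.

```python
from pathlib import Path


def missing_corigami_paths(base, genome, model):
    """Return the expected paths (relative to base) that do not exist."""
    base = Path(base)
    genome_dir = base / "data" / genome
    seq_dir = genome_dir / "dna_sequence"
    expected = [
        genome_dir / "centrotelo.bed",
        seq_dir,
        base / "data" / f"{genome}_tiles.bed",
        # Assumption: the model name excludes the .ckpt extension.
        base / "model_weights" / f"{model}.ckpt",
    ]
    missing = [str(p.relative_to(base)) for p in expected if not p.exists()]
    # The sequence directory should hold at least one per-chromosome FASTA.
    if seq_dir.is_dir() and not any(seq_dir.glob("chr*.fa.gz")):
        missing.append(str(seq_dir.relative_to(base) / "chr*.fa.gz"))
    return missing
```

An empty list means every expected entry was found; otherwise the list names what still needs to be downloaded or symlinked.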

The <corigami_base>/data directory can be:

Downloaded from here (you will need to untar it)
If you work from within Ultraviolet, symlinked from here: /gpfs/data/tsirigoslab/home/jt3545/hic_prediction/C.Origami-release/corigami_data/

To get the model weights, you can either:

Train your own model and save the checkpoint. Ask Javier Rodriguez Hernaez for details.
Download a pretrained hg38 model checkpoint created by Javier from here. If you are on Ultraviolet, you can instead symlink or copy the following path: /gpfs/home/rodrij92/PROJECTS/SHARE/epoch=53-step=64260.ckpt

config.yaml

The main parameters you may need to specify are:

genome: either hg38 or mm10
sample_meta: path to samplesheet
corigami_base: directory with C.Origami data
corigami_model: name of checkpoint file under <corigami_base>/model_weights
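
As an illustration of these four parameters, the sketch below holds them in a Python dict and checks that each is set and that genome is one of the two supported values. The checker is hypothetical and not part of the pipeline; the corigami_base path is a placeholder, and the model name assumes the pretrained checkpoint mentioned above with its .ckpt extension dropped.

```python
VALID_GENOMES = {"hg38", "mm10"}
REQUIRED_KEYS = ("genome", "sample_meta", "corigami_base", "corigami_model")


def config_problems(config):
    """Return a list of human-readable problems; an empty list means the config looks fine."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if not config.get(key)]
    genome = config.get("genome")
    if genome and genome not in VALID_GENOMES:
        problems.append(f"genome must be one of {sorted(VALID_GENOMES)}, got {genome!r}")
    return problems


example = {
    "genome": "hg38",
    "sample_meta": "config/sample_meta.csv",
    "corigami_base": "/path/to/corigami_base",  # placeholder path
    "corigami_model": "epoch=53-step=64260",  # assumes the name excludes .ckpt
}
print(config_problems(example))
```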

Run

If you have specified everything correctly, you can launch the pipeline by executing the following commands on Ultraviolet:

conda activate snakemake  # activate environment you created in Ultraviolet if you don't have snakemake
snakemake --profile ultraviolet

Pipeline Graph

[Figure: pipeline graph; see dag.png in the repo.]

Predicting Simple Translocations

The repo has the script workflow/scripts/predict_translocation.py (still under development) that is not part of the pipeline, but can be used to predict the result of simple translocations. Simple translocations are defined as a merger of 2 chromosomes at a specific position (no indels or substitutions involved). Below is a schematic of all the simple translocations:

These translocations are defined in a VCF file like this:

#CHROM   POS    ID      REF   ALT            QUAL FILTER  INFO
2     321681    bnd_W    G    G]17:198982]    6    PASS    .
2     321682    bnd_V    T    ]13:123456]T    6    PASS    .
13    123456    bnd_U    C    C[2:321682[     6    PASS    .
13    123457    bnd_X    A    [17:198983[A    6    PASS    .
17    198982    bnd_Y    A    A]2:321681]     6    PASS    .
17    198983    bnd_Z    C    [13:123457[C    6    PASS    .

These translocations can be given as arguments to predict_translocation.py as follows (wrap them in single quotes to stop bash from interpreting the symbols):

chr2:321681]chr17:198982]
]chr13:123456]chr2:321682
chr13:123456[chr2:321682[
[chr17:198983[chr13:123457
chr17:198982]chr2:321681]
[chr13:123457[chr17:198983
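
The mapping from the VCF records to these arguments can be sketched in Python. The rule below is inferred only from the six examples above (the REF base in ALT is replaced by chr<CHROM>:<POS>, and the mate position gets a chr prefix); it is not taken from predict_translocation.py, and the function name is hypothetical.

```python
import re
import shlex

# The four breakend ALT forms: t[p[  t]p]  ]p]t  [p[t
# (t = local base(s), p = mate position as chrom:pos).
BND_ALT = re.compile(
    r"^(?P<pre>[ACGTNacgtn]*)(?P<bracket>[\[\]])"
    r"(?P<mate>[^\[\]]+)(?P=bracket)(?P<post>[ACGTNacgtn]*)$"
)


def bnd_to_argument(chrom, pos, alt):
    """Convert one VCF breakend record into the argument notation shown above."""
    m = BND_ALT.match(alt)
    # Exactly one side of the brackets must carry the local base(s).
    if m is None or bool(m["pre"]) == bool(m["post"]):
        raise ValueError(f"not a simple breakend ALT: {alt!r}")
    local = f"chr{chrom}:{pos}"
    bracket = m["bracket"]
    mate = f"{bracket}chr{m['mate']}{bracket}"
    # Keep the local position on the same side as the base(s) in ALT.
    return local + mate if m["pre"] else mate + local


records = [
    ("2", 321681, "G]17:198982]"),
    ("2", 321682, "]13:123456]T"),
    ("13", 123456, "C[2:321682["),
    ("13", 123457, "[17:198983[A"),
    ("17", 198982, "A]2:321681]"),
    ("17", 198983, "[13:123457[C"),
]
for chrom, pos, alt in records:
    # shlex.quote adds the single quotes that bash needs around the brackets.
    print(shlex.quote(bnd_to_argument(chrom, pos, alt)))
```

Printing through shlex.quote gives strings that can be pasted into a bash command line as they are.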
