Synopsis

This repository contains the source code and input data used for the benchmarks for: hictk: blazing fast toolkit to work with .hic and .cool files (preprint available soon).

Input data download, subsequent analyses and benchmarks are automated using Nextflow and Singularity/Apptainer.

Docker images availability

Docker images are hosted on GHCR and can be found in the Packages page of this repository.

Images were generated using the build*dockerfile.yml GHA workflows using the Dockerfiles from the containers folder.

Nextflow workflows

Nextflow workflows under preprocessing/workflows and benchmarks/workflows were developed and tested using Nextflow v23.04.1, and should in principle work with any version supporting Nextflow DSL2.

Each workflow is paired with a config file (see preprocessing/configs and benchmarks/configs folders). As an example, preprocessing/workflows/fetch_data.nf is paired with config preprocessing/configs/fetch_data.config.

Requirements

Access to an internet connection (required to download input files and Docker images)
Nextflow v20.07.1 or newer
Apptainer/Singularity (tested with Apptainer v1.2.2)

Running workflows

Please make sure Nextflow is properly installed and configured before running any of the workflows.

The following workflows should be executed first, as they download and prepare files required by other workflows.

fetch_data.nf
preprocess_data.nf

The fetch_data.nf workflow requires internet access and can fail for various reason (e.g. connection reset by peer, service temporarily unavailable etc.). In case the workflow fails, wait few minutes, then relaunch the workflow.

cd preprocessing/

./run_fetch_data.sh
./run_preprocess_data.sh

The rest of the workflows can be run in any order:

cd benchmarks/

./run_benchmark_balance_16c.sh
./run_benchmark_balance.sh
./run_benchmark_cool2hic.sh
./run_benchmark_dump_chrom.sh
./run_benchmark_dump_gw.sh
./run_benchmark_fetch_and_sum.sh
./run_benchmark_hic2ool.sh
./run_benchmark_hicrep.sh
./run_benchmark_load.sh
./run_benchmark_zoomify.sh

Inside the common/configs folder there are the following base configs:

base_hovig.config
base_linux.config
base_saga.config
base_macos.config

These configs are passed to all workflows, and define available computation resources. You will most likely have to update one of the configs with resources available on our machine/cluster.

Feel free to use base_saga.config as starting point for a config to run benchmarks on an HPC cluster.

In order to avoid IO bottlenecks when running benchmarks for hictk's paper, input files used in benchmarks were stored on a ram-disk under /dev/shm/. In order to run benchmarks you have two options:

Create folder /dev/shm/2023-hictk-paper/ and copy the following files inside that folder:

ENCFF301CUL_cis.bedpe.gz
ENCFF301CUL_trans.bedpe.gz
ENCFF447ERX.hic8
ENCFF447ERX.hic9
ENCFF447ERX.mcool

Edit the config files and update variables hic_file, hic8_file, hic9_file and mcool_file to point to files stored under data/ (these files are created by the preprocessing workflows).

Example:

hic8_file = "${data_dir}/ENCFF447ERX.hic8"
hic9_file = "${data_dir}/ENCFF447ERX.hic9"
hic_file = "${data_dir}/ENCFF447ERX.hic9"
mcool_file = "${data_dir}/ENCFF447ERX.mcool"

Name		Name	Last commit message	Last commit date
Latest commit History 167 Commits
.github		.github
benchmarks		benchmarks
bin		bin
cmake		cmake
common		common
containers		containers
data		data
external		external
preprocessing		preprocessing
utils		utils
.clang-format		.clang-format
.cmake-format.yaml		.cmake-format.yaml
.dockerignore		.dockerignore
.editorconfig		.editorconfig
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md
conanfile.txt		conanfile.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Synopsis

Docker images availability

Nextflow workflows

Requirements

Running workflows

About

Releases 3

Packages

Contributors 2

Languages

License

paulsengroup/2023-hictk-paper

Folders and files

Latest commit

History

Repository files navigation

Synopsis

Docker images availability

Nextflow workflows

Requirements

Running workflows

About

Resources

License

Stars

Watchers

Forks

Releases 3

Packages 0

Contributors 2

Languages

Packages