From dc02eb76cfa53fef1bd876276ca5709b81b62889 Mon Sep 17 00:00:00 2001
From: Roberto Rossini <71787608+robomics@users.noreply.github.com>
Date: Sun, 1 Oct 2023 19:41:14 +0200
Subject: [PATCH] Update README and docs (#71)
---
README.md | 506 +++------------------------------
docs/cli_reference.rst | 6 +-
docs/generate_cli_reference.sh | 6 +-
3 files changed, 42 insertions(+), 476 deletions(-)
diff --git a/README.md b/README.md
index c50385d1..e26fc1f4 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,7 @@ SPDX-License-Identifier: MIT
# hictk
[![License](https://img.shields.io/badge/license-MIT-green)](./LICENSE)
+[![docs](https://readthedocs.org/projects/hictk/badge/?version=latest)](https://hictk.readthedocs.io/en/latest/?badge=latest)
[![Ubuntu CI](https://github.com/paulsengroup/hictk/actions/workflows/ubuntu-ci.yml/badge.svg)](https://github.com/paulsengroup/hictk/actions/workflows/ubuntu-ci.yml)
[![MacOS CI](https://github.com/paulsengroup/hictk/actions/workflows/macos-ci.yml/badge.svg)](https://github.com/paulsengroup/hictk/actions/workflows/macos-ci.yml)
[![Windows CI](https://github.com/paulsengroup/hictk/actions/workflows/windows-ci.yml/badge.svg)](https://github.com/paulsengroup/hictk/actions/workflows/windows-ci.yml)
@@ -14,7 +15,8 @@ SPDX-License-Identifier: MIT
[![Fuzzy testing](https://github.com/paulsengroup/hictk/actions/workflows/fuzzy-testing.yml/badge.svg)](https://github.com/paulsengroup/hictk/actions/workflows/fuzzy-testing.yml)
[![Download from Bioconda](https://img.shields.io/conda/vn/bioconda/hictk?label=bioconda&logo=Anaconda)](https://anaconda.org/bioconda/hictk)
-
+[![Zenodo DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.8214220.svg)](https://doi.org/10.5281/zenodo.8214220)
+
---
hictk is a blazing fast toolkit to work with .hic and .cool files.
@@ -29,493 +31,49 @@ hictk is capable of reading files in `.cool`, `.mcool`, `.scool` and `.hic` form
hictk is developed on Linux and tested on Linux, MacOS and Windows.
-hictk can be installed or compiled with one of the following methods.
-
-### Docker or Singularity/Apptainer
-
-First, make sure you follow the instructions on how to install Docker or Singularity/Apptainer on your OS.
-
-
-Installing Docker
-
-The following instructions assume you have root/admin permissions.
-
-- [Linux](https://docs.docker.com/desktop/install/linux-install/#generic-installation-steps/)
-- [MacOS](https://docs.docker.com/desktop/install/mac-install/)
-- [Windows](https://docs.docker.com/desktop/install/windows-install/)
-
-On some Linux distributions just installing Docker is not enough.
-You also need to start (and optionally enable) the appropriate service(s).
-This is usually done with one of the following:
-
-```bash
-sudo systemctl start docker
-sudo systemctl start docker.service
-```
-
-Refer to [Docker](https://docs.docker.com/engine/install/) or your distribution documentation for more details.
-
-
-
-
-Installing Singularity/Apptainer
-
-The following instructions assume you have root/admin permissions.
-
-Apptainer can be easily installed using your system package manager.
-
-[Here](https://apptainer.org/docs/admin/main/installation.html#install-from-pre-built-packages) you can find instructions for common Linux distributions such as Ubuntu.
+hictk can be installed using containers, bioconda or directly from source. Refer to [Installation](https://hictk.readthedocs.io/en/latest/installation.html) for more information.
-Even if your distribution is not listed in the above documentation, your system package manager likely includes a package for Singularity or Apptainer. If this is not the case, then you must install Apptainer from source (instructions available [here](https://github.com/apptainer/apptainer/blob/release-1.1/INSTALL.md)).
-
-
-
-#### Pulling hictk Docker image
-
-hictk Docker images are available on [ghcr.io](https://github.com/paulsengroup/hictk/pkgs/container/hictk)
-and [dockerhub](https://hub.docker.com/repository/docker/paulsengroup/hictk).
-
-Downloading and running the latest stable release can be done as follows:
-
-```console
-# Using Docker, may require sudo
-user@dev:/tmp$ docker run ghcr.io/paulsengroup/hictk:0.0.2 --help
-
-# Using Singularity/Apptainer
-user@dev:/tmp$ singularity run ghcr.io/paulsengroup/hictk:0.0.2 --help
-
-Blazing fast tools to work with .hic and .cool files.
-Usage: /usr/local/bin/hictk [OPTIONS] SUBCOMMAND
-
-Options:
- -h,--help Print this help message and exit
- -V,--version Display program version information and exit
-
-Subcommands:
- convert Convert HiC matrices to a different format.
- dump Dump data from .hic and Cooler files to stdout.
- load Build .cool files from interactions in various text formats.
- merge Merge coolers.
- validate Validate .hic and Cooler files.
- zoomify Convert single-resolution Cooler file to multi-resolution by coarsening.
-```
-
-The above will print hictk's help message, and is equivalent to running `hictk --help` on the command line (assuming hictk is available on your machine).
+## Running hictk
+hictk provides the following subcommands:
-### Conda (bioconda)
+| subcommand | description |
+|----------------|----------------------------------------------------------------------------------------|
+| __balance__ | Balance HiC matrices using ICE. |
+| __convert__ | Convert matrices between .hic and Cooler formats. |
+| __dump__ | Write interactions from .hic or Cooler files to the terminal. |
+| __fix-mcool__ | Fix corrupted .mcool files. |
+| __load__ | Generate a Cooler file from pixels or pairs of interactions in text format. |
+| __merge__ | Merge multiple Cooler files using the same reference assembly. |
+| __validate__ | Validate Cooler and .hic files. |
+| __zoomify__ | Convert single-resolution cooler files to multi-resolution cooler files by coarsening. |
-hictk package for Linux, MacOS and Windows is available on [bioconda](https://anaconda.org/bioconda/hictk) and can be installed as follows:
+Refer to [Quickstart (CLI)](https://hictk.readthedocs.io/en/latest/quickstart_cli.html) and [CLI Reference](https://hictk.readthedocs.io/en/latest/cli_reference.html) for more details.
-```console
-user@dev:/tmp$ conda create -n hictk -c conda-forge -c bioconda hictk
+## Using libhictk
-(hictk) user@dev:/tmp$ conda activate hictk
+libhictk can be installed in various ways, including with Conan and CMake FetchContent. Section [Quickstart (API)](https://hictk.readthedocs.io/en/latest/quickstart_api.html) of hictk documentation contains further details on how this can be accomplished.
-(hictk) user@dev:/tmp$ whereis hictk
-hictk: /home/user/.miniconda3/envs/hictk/bin/hictk
+[Quickstart (API)](https://hictk.readthedocs.io/en/latest/quickstart_api.html) also showcases the basic functionality offered by libhictk. For more complex examples refer to the sample programs under the [examples/](./examples/) folder as well as to the [source code](./src/hictk/) of hictk.
-(hictk) user@dev:/tmp$ hictk --version
-hictk-v0.0.2-bioconda
-```
+The public C++ API of hictk is documented in the [C++ API Reference](https://hictk.readthedocs.io/en/latest/cpp_api/index.html) section of hictk documentation.
-### Installing from source
+## Citing
+If you use hictk in your research, please cite the following publication:
-hictk can be compiled on most UNIX-like systems, including many Linux distributions, MacOS (10.15+) and Windows.
+Preprint available soon.
-Build instructions
-
-## Build instructions
-
-Instructions assume hictk is being built on a UNIX environment.
-
-Building on Windows follows the same logic but some of the commands may be slightly different.
-
-### Build requirements
-
-Compiling hictk requires a compiler toolchain supporting C++17, such as:
-
-- GCC 8+
-- Clang 8+
-- Apple-Clang 10.0+
-
-Furthermore, the following tools are required:
-- CMake 3.25+
-- Conan 2+
-- git 2.7+
-- make or ninja
-- Python3.6+ (including `pip`, required to install Conan)
+BibTeX
-
-We recommend installing CMake and Conan in a Python [virtualenv](https://virtualenvwrapper.readthedocs.io/en/stable/), but you are of course free to install the build dependencies in any way you want.
-
-```bash
-python3 -m venv /tmp/venv
-/tmp/venv/bin/python3 -m pip install pip setuptools --upgrade
-/tmp/venv/bin/python3 -m pip install 'cmake>=3.25' 'conan>=2' ninja
-
-# NOTE: It's important to activate the venv after installing CMake
-. /tmp/venv/bin/activate
-
-whereis cmake # cmake: /tmp/venv/bin/cmake
-whereis conan # conan: /tmp/venv/bin/conan
-whereis ninja # ninja: /tmp/venv/bin/ninja
-
-cmake --version
-conan --version
-
-# Detect compiler toolchain. It is usually a good idea to explicitly set CC and CXX
-CC=gcc CXX=g++ conan profile detect --force
-```
-
-#### Getting the source code
-
-Download from the [Release](https://github.com/paulsengroup/hictk/releases) page (recommended).
-```bash
-mkdir /tmp/hictk
-curl -L 'https://github.com/paulsengroup/hictk/archive/refs/tags/v0.0.2.tar.gz' | tar --strip-components=1 -C /tmp/hictk -xzf -
-```
-
-Using git.
-```bash
-git clone https://github.com/paulsengroup/hictk.git /tmp/hictk
-
-cd /tmp/hictk
-git checkout v0.0.2 # Skip this step if you want to build the latest commit from main
-```
-
-#### Compiling hictk
-
-```bash
-# Activate venv
-. /tmp/venv/bin/activate
-
-# Set these variables to the number of CPU cores available on your machine
-# You can check this with e.g.
-# python -c 'import multiprocessing as mp; print(mp.cpu_count())')
-export CONAN_CPU_COUNT=8
-export CMAKE_BUILD_PARALLEL_LEVEL=8
-
-# Install/build dependencies with Conan
-conan install --build=missing \
- -pr default \
- -s build_type=Release \
- -s compiler.cppstd=17 \
- --output-folder=./build/ \
- .
-
-# This may take a while, as CMake will run Conan to build hictk dependencies.
-# Do not pass -G Ninja if you want CMake to use make instead of ninja
-cmake -DCMAKE_BUILD_TYPE=Release \
- -DCMAKE_PREFIX_PATH="$PWD/build" \
- -DHICTK_ENABLE_TESTING=ON \
- -DHICTK_BUILD_TOOLS=ON \
- -G Ninja \
- -S /tmp/hictk \
- -B /tmp/hictk/build
-
-cmake --build /tmp/hictk/build
+```bibtex
+@misc{hictk,
+author = {Roberto Rossini},
+year = {2023},
+note = {https://github.com/paulsengroup/hictk},
+title = {hictk: blazing fast toolkit to work with .hic and .cool files}
+}
```
-To override the default compiler used by CMake, pass the following arguments to the first CMake command: `-DCMAKE_C_COMPILER=path/to/cc -DCMAKE_CXX_COMPILER=path/to/c++`
-
-We highly recommend using the same compiler when running Conan and CMake.
-
-## Running automated tests
-
-The steps outlined in this section are optional but highly recommended.
-
-#### Unit tests
-
-```bash
-# Activate venv
-. /tmp/venv/bin/activate
-
-cd /tmp/hictk
-ctest --test-dir build/ \
- --schedule-random \
- --output-on-failure \
- --no-tests=error \
- --timeout 120 \
- -j8 # Change this to the number of available CPU cores
-```
-
-A successful run of the test suite will produce an output like the following:
-```console
-user@dev:/tmp/hictk$ ctest --test-dir build/ ...
-...
-63/70 Test #21: Cooler: init files - SHORT ....................................... Passed 0.02 sec
-64/70 Test #57: HiC: pixel selector fetch (observed NONE BP 10000) - LONG ........ Passed 1.53 sec
-65/70 Test #5: Cooler: index validation - SHORT ................................. Passed 3.83 sec
-66/70 Test #17: Cooler: index validation - SHORT ................................. Passed 3.62 sec
-67/70 Test #37: Cooler: utils merge - LONG ....................................... Passed 4.35 sec
-68/70 Test #67: Transformers (cooler) - SHORT .................................... Passed 4.11 sec
-69/70 Test #36: Cooler: dataset random iteration - MEDIUM ........................ Passed 5.50 sec
-70/70 Test #40: Cooler: dataset large read/write - LONG .......................... Passed 11.47 sec
-
-100% tests passed, 0 tests failed out of 70
-
-Total Test time (real) = 12.03 sec
-```
-
-__All tests are expected to pass. Do not ignore test failures!__
-
-
- Troubleshooting test failures
-If one or more tests fail, try the following troubleshooting steps before reaching out for help.
-
-1. Make sure you are running `ctest` from the root of the source tree (`/tmp/hictk` if you are following the instructions).
-2. Make sure you are passing the correct build folder to `--test-dir`. Pass the absolute path if necessary (i.e. `--test-dir=/tmp/hictk/build/` if you are following the instructions).
-3. Re-run `ctest` with `-j1`. This can be necessary on machines with very little memory (e.g. less than 2GB).
-4. Before running `ctest`, create a temporary folder where your user has read-write permissions and where there are at least 100-200MB of space available.
- Then set variable `TMPDIR` to that folder and re-run `ctest`.
-5. Checksum the test dataset located under `test/data/` by running `sha256sum -c checksums.sha256`.
- If the checksumming fails or the folder doesn't exist, download and extract the `.tar.xz` file listed in file `cmake/FetchTestDataset.cmake`. Make sure you run `tar -xf` from the root of the repository (`/tmp/hictk` if you are following the instructions).
-
-Example:
-```bash
-# Activate venv
-. /tmp/venv/bin/activate
-
-cd /tmp/hictk
-
-# Make sure this is the URL listed in file cmake/FetchTestDataset.cmake
-curl -L 'https://zenodo.org/record/8143316/files/hictk_test_data.tar.xz?download=1' | tar -xJf -
-
-# This should print "OK" if the check is successful
-(cd test/data && sha256sum --quiet -c checksums.sha256 && 2>&1 echo OK)
-
-mkdir ~/hictk-test-dir # Remember to delete this folder
-
-TMPDIR="$HOME/hictk-test-dir" \
-ctest --test-dir=/tmp/hictk/build/ \
- --schedule-random \
- --output-on-failure \
- --no-tests=error \
- --timeout 600 \
- -j1
-
-# rm -r ~/hictk-test-dir
-```
-
-If after trying the above steps the tests are still failing, feel free to start [discussion](https://github.com/paulsengroup/hictk/discussions) asking for help.
-
-
-
-
-#### Integration tests
-
-The integration test scripts depend on the following tools:
-
-- cooler>=0.9
-- java
-- [juicer_tools](https://github.com/aidenlab/Juicebox/releases/latest) or [hic_tools](https://github.com/aidenlab/HiCTools/releases/latest)
-- xz
-- common UNIX shell commands
-
-cooler can be installed using pip:
-```bash
-/tmp/venv/bin/pip3 install 'cooler>=0.9'
-```
-
-juicer_tools and hic_tools do not need to be installed, downloading the JAR file is enough:
-```bash
-curl -L 'https://github.com/aidenlab/HiCTools/releases/download/v3.30.00/hic_tools.3.30.00.jar' -o /tmp/hictk/hic_tools.jar
-```
-
-If not already installed, `xz` can usually be installed with your system package manager (on some Linux distributions the relevant package is called `xz-utils`).
-
-```bash
-# Activate venv
-. /tmp/venv/bin/activate
-
-cd /tmp/hictk
-
-# hictk convert
-test/scripts/hictk_convert_cool2hic.sh build/src/hictk/hictk juicer_tools.jar
-test/scripts/hictk_convert_hic2cool.sh build/src/hictk/hictk
-
-# hictk dump
-test/scripts/hictk_dump_balanced.sh build/src/hictk/hictk
-test/scripts/hictk_dump_bins.sh build/src/hictk/hictk
-test/scripts/hictk_dump_chroms.sh build/src/hictk/hictk
-test/scripts/hictk_dump_cis.sh build/src/hictk/hictk
-test/scripts/hictk_dump_gw.sh build/src/hictk/hictk
-test/scripts/hictk_dump_trans.sh build/src/hictk/hictk
-
-# hictk load (sorted)
-test/scripts/hictk_load_4dn.sh build/src/hictk/hictk sorted
-test/scripts/hictk_load_bg2.sh build/src/hictk/hictk sorted
-test/scripts/hictk_load_coo.sh build/src/hictk/hictk sorted
-
-# hictk load (unsorted)
-test/scripts/hictk_load_4dn.sh build/src/hictk/hictk unsorted
-test/scripts/hictk_load_bg2.sh build/src/hictk/hictk unsorted
-test/scripts/hictk_load_coo.sh build/src/hictk/hictk unsorted
-
-# hictk merge
-test/scripts/hictk_merge.sh build/src/hictk/hictk
-
-# hictk validate
-test/scripts/hictk_validate.sh build/src/hictk/hictk
-
-# hictk zoomify
-test/scripts/hictk_zoomify.sh build/src/hictk/hictk
-```
-
-## Installation
-
-Once all tests have passed, `hictk` can be installed as follows:
-
-```console
-# Activate venv
-user@dev:/tmp$ . /tmp/venv/bin/activate
-
-# Install system-wide (requires root/admin rights)
-user@dev:/tmp$ cmake --install /tmp/hictk/build
--- Install configuration: "Release"
--- Installing: /usr/local/bin/hictk
--- Set runtime path of "/usr/local/bin/hictk" to ""
--- Up-to-date: /usr/local/share/licenses/hictk/LICENSE
-...
-
-# Alternatively, install to custom path
-user@dev:/tmp$ cmake --install /tmp/hictk/build --prefix "$HOME/.local/"
--- Install configuration: "Release"
--- Installing: /home/user/.local/bin/hictk
--- Set runtime path of "/home/user/.local/bin/hictk" to ""
--- Up-to-date: /home/user/.local/share/licenses/hictk/LICENSE
-...
-```
-
-## Cleaning build artifacts
-
-After successfully compiling hictk the following folders safely be removed:
-- Python virtualenv: `/tmp/venv`
-- hictk source tree: `/tmp/hictk`
-
-If you are not using Conan in any other project feel free to also delete Conan's folder `~/.conan2/`
-
-
-
-### Running hictk
-
-hictk provides the following subcommands:
-
-| subcommand | description |
-|--------------|----------------------------------------------------------------------------------------|
-| __convert__ | Convert matrices between .hic and Cooler formats. |
-| __dump__ | Write interactions from .hic or Cooler files to the terminal. |
-| __load__ | Generate a Cooler file from pixels or pairs of interactions in text format. |
-| __merge__ | Merge multiple Cooler files using the same reference assembly. |
-| __validate__ | Validate Cooler and .hic files. |
-| __zoomify__ | Convert single-resolution cooler files to multi-resolution cooler files by coarsening. |
-
-#### Examples
-
-Converting hic->cooler:
-
-```bash
-# Create a .mcool file using all resolutions available in interactions.hic
-hictk convert interactions.hic interactions.mcool
-
-# Create a .cool file at 10kb resolution
-hictk convert interactions.hic interactions.cool --resolutions 10000
-
-# Create a .mcool file using a subset of the resolutions available in interactions.hic
-hictk convert interactions.hic interactions.mcool --resolutions 10000 20000 50000
-```
-
-Converting cool->hic:
-
-```bash
-# Create a .hic file using interactions.cool as base resolution
-hictk convert interactions.cool interactions.hic --juicer-tools-jar /tmp/hic_tools.jar
-
-# Create a .hic file with the resolutions found in interactions.mcool
-hictk convert interactions.mcool interactions.cool --juicer-tools-jar /tmp/hic_tools.jar
-```
-
-Dumping interactions:
-
-```shell
-user@dev:/tmp$ hictk dump interactions.cool
-0 0 1745
-0 1 2844
-0 2 409
-...
-
-user@dev:/tmp$ hictk dump interactions.cool --join
-chr2L 0 10000 chr2L 0 10000 1745
-chr2L 0 10000 chr2L 10000 20000 2844
-chr2L 0 10000 chr2L 20000 30000 409
-...
-
-user@dev:/tmp$ hictk dump interactions.mcool::/resolutions/10000 --join
-chr2L 0 10000 chr2L 0 10000 1745
-chr2L 0 10000 chr2L 10000 20000 2844
-chr2L 0 10000 chr2L 20000 30000 409
-...
-
-user@dev:/tmp$ hictk dump interactions.hic --join --resolution 10000 --matrix-type expected
-chr2L 0 10000 chr2L 0 10000 2351.23291015625
-chr2L 0 10000 chr2L 10000 20000 1447.001708984375
-chr2L 0 10000 chr2L 20000 30000 613.9473876953125
-...
-
-user@dev:/tmp$ hictk dump interactions.hic --join --resolution 10000 --normalization VC
-chr2L 0 10000 chr2L 0 10000 3575.918701171875
-chr2L 0 10000 chr2L 10000 20000 2654.79052734375
-chr2L 0 10000 chr2L 20000 30000 387.9197082519531
-...
-
-user@dev:/tmp$ hictk dump interactions.hic --join --resolution 10000 --range chr3L:20,000,000-25,000,000
-chr3L 20000000 20010000 chr3L 20000000 20010000 5400
-chr3L 20000000 20010000 chr3L 20010000 20020000 3766
-chr3L 20000000 20010000 chr3L 20020000 20030000 2015
-
-user@dev:/tmp$ hictk dump interactions.hic --join --resolution 10000 --range chr3L:20,000,000-25,000,000 --range2 chrX
-chr3L 20000000 20010000 chrX 50000 60000 2
-chr3L 20000000 20010000 chrX 140000 150000 1
-chr3L 20000000 20010000 chrX 150000 160000 1
-...
-```
-
-Loading interactions in a Cooler file:
-
-```bash
-# Create a 10kbp .cool file using hg38 as reference
-hictk load --format 4dn --assembly hg38 hg38.chrom.sizes 10000 out.cool < interactions.txt
-
-# Same as above but using gzip-compressed interactions
-zcat interactions.txt.gz | hictk load --format 4dn --assembly hg38 hg38.chrom.sizes 10000 out.cool
-
-# Using interactions in bedgraph2 format (see --help for the list of supported formats)
-hictk load --format bg2 --assembly hg38 hg38.chrom.sizes 10000 out.cool < interactions.txt
-```
-
-Merging multiple coolers:
-
-```bash
-hictk merge interactions1.cool interactions2.cool -o merged.cool
-```
-
-Checking file integrity (especially useful to detect corrupted .mcool from 4DNucleome, see [here](https://github.com/robomics/20221129_4dnucleome_bug_report) and [here](https://github.com/open2c/cooler/issues/319)):
-
-```bash
-hictk validate interactions.cool --validate-index
-
-hictk validate interactions.hic
-```
-
-Creating .mcool files from .cool files:
-
-```bash
-hictk zoomify interactions.cool interactions.mcool --resolutions 1000 5000 10000 ...
-
-# Coarsen a single resolution
-hictk zoomify interactions.cool interactions.ccool --no-copy-base-resolution --resolutions 10000
-```
diff --git a/docs/cli_reference.rst b/docs/cli_reference.rst
index 1c3a1f7a..58235475 100644
--- a/docs/cli_reference.rst
+++ b/docs/cli_reference.rst
@@ -5,8 +5,12 @@
CLI Reference
#############
-.. code-block text
+For an up-to-date list of subcommands and CLI options refer to ``hictk --help``.
+Subcommands
+-----------
+
+.. code-block:: text
Blazing fast tools to work with .hic and .cool files.
Usage: hictk [OPTIONS] SUBCOMMAND
diff --git a/docs/generate_cli_reference.sh b/docs/generate_cli_reference.sh
index 7814cce9..cd83a473 100755
--- a/docs/generate_cli_reference.sh
+++ b/docs/generate_cli_reference.sh
@@ -34,8 +34,12 @@ cat << EOT
CLI Reference
#############
-.. code-block text
+For an up-to-date list of subcommands and CLI options refer to ``hictk --help``.
+Subcommands
+-----------
+
+.. code-block:: text
EOT