Add docs (#69)
robomics authored Oct 1, 2023
1 parent 0ec9edc commit 6e3a0b2
Showing 41 changed files with 3,092 additions and 12 deletions.
2 changes: 2 additions & 0 deletions .github/.codecov.yml
@@ -2,4 +2,6 @@
# SPDX-License-Identifier: MIT

ignore:
- "benchmarks/"
- "examples/"
- "test/"
4 changes: 2 additions & 2 deletions .github/workflows/codecov.yml
@@ -201,13 +201,13 @@ jobs:
test/scripts/hictk_dump_normalizations.sh build/src/hictk/hictk
test/scripts/hictk_dump_cells.sh build/src/hictk/hictk
test/scripts/hictk_fix_mcool.sh build/src/hictk/hictk
test/scripts/hictk_dump_gw.sh build/src/hictk/hictk
test/scripts/hictk_dump_cis.sh build/src/hictk/hictk
test/scripts/hictk_dump_trans.sh build/src/hictk/hictk
test/scripts/hictk_dump_balanced.sh build/src/hictk/hictk
test/scripts/hictk_fix_mcool.sh build/src/hictk/hictk
test/scripts/hictk_load_coo.sh build/src/hictk/hictk sorted
test/scripts/hictk_load_coo.sh build/src/hictk/hictk unsorted
test/scripts/hictk_load_bg2.sh build/src/hictk/hictk sorted
1 change: 1 addition & 0 deletions .gitignore
@@ -12,6 +12,7 @@ benchmark/data
build/
cmake-build*/
conan-envs/
docs/_build/
external/
scratch/
test/data
22 changes: 22 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,22 @@
# Copyright (C) 2023 Roberto Rossini <roberros@uio.no>
#
# SPDX-License-Identifier: MIT

version: 2

build:
  os: ubuntu-22.04
  apt_packages:
    - librsvg2-bin
  tools:
    python: "3.11"

sphinx:
  configuration: docs/conf.py

python:
  install:
    - requirements: docs/requirements.txt

formats:
  - pdf
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -152,7 +152,7 @@ endif()

if(HICTK_BUILD_EXAMPLES)
message(STATUS "Building examples.")
# add_subdirectory(examples)
add_subdirectory(examples)
endif()

if(HICTK_BUILD_BENCHMARKS)
6 changes: 3 additions & 3 deletions README.md
@@ -149,7 +149,7 @@ Furthermore, the following tools are required:
- Python3.6+ (including `pip`, required to install Conan)


We recommend to install CMake and Conan in a Python [virtualenv](https://virtualenvwrapper.readthedocs.io/en/stable/), but you are of course free to install the build dependencies in any way you want.
We recommend installing CMake and Conan in a Python [virtualenv](https://virtualenvwrapper.readthedocs.io/en/stable/), but you are of course free to install the build dependencies in any way you want.

```bash
python3 -m venv /tmp/venv
@@ -225,7 +225,7 @@ We highly recommend using the same compiler when running Conan and CMake.

## Running automated tests

Steps outlined in this section are optional but highly recommended.
The steps outlined in this section are optional but highly recommended.

#### Unit tests

@@ -264,7 +264,7 @@ __All tests are expected to pass. Do not ignore test failures!__

<details>
<summary> Troubleshooting test failures </summary>
If one or more test fail, try the following troubleshooting steps before reaching out for help.
If one or more tests fail, try the following troubleshooting steps before reaching out for help.

1. Make sure you are running `ctest` from the root of the source tree (`/tmp/hictk` if you are following the instructions).
2. Make sure you are passing the correct build folder to `--test-dir`. Pass the absolute path if necessary (i.e. `--test-dir=/tmp/hictk/build/` if you are following the instructions).
21 changes: 21 additions & 0 deletions docs/Makefile
@@ -0,0 +1,21 @@
# Copyright (C) 2023 Roberto Rossini <roberros@uio.no>
#
# SPDX-License-Identifier: MIT

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
Binary file added docs/assets/4dnucleome_bug_notice.avif
Binary file not shown.
12 changes: 12 additions & 0 deletions docs/assets/corrupted_mcool_example.tsv
@@ -0,0 +1,12 @@
chrom1 start1 end1 chrom2 start2 end2 count balanced
chr1 10828000 10830000 chr1 11002000 11004000 1 0.000208987
chr1 10828000 10830000 chr1 11002000 11004000 1 0.000208987
chr1 10828000 10830000 chr1 11006000 11008000 1 0.000199523
chr1 10828000 10830000 chr1 11006000 11008000 3 0.000598569
chr1 10828000 10830000 chr1 11010000 11012000 4 0.000695946
chr1 10828000 10830000 chr1 11010000 11012000 2 0.000347973
chr1 10828000 10830000 chr1 11020000 11022000 1 0.000219669
chr1 10828000 10830000 chr1 11020000 11022000 1 0.000219669
chr1 10828000 10830000 chr1 11030000 11032000 3 0.000499071
chr1 10828000 10830000 chr1 11030000 11032000 2 0.000332714
... ... ... ... ... ... ... ...
50 changes: 50 additions & 0 deletions docs/balancing_matrices.rst
@@ -0,0 +1,50 @@
..
   Copyright (C) 2023 Roberto Rossini <roberros@uio.no>
   SPDX-License-Identifier: MIT

Balancing Hi-C matrices
#######################

``hictk`` supports balancing .hic, .cool and .mcool files using ICE (iterative correction and eigenvector decomposition).

.. code-block:: console

  user@dev:/tmp$ hictk balance 4DNFIZ1ZVXC8.mcool::/resolutions/1000
  [2023-10-01 13:18:02.119] [info]: Running hictk v0.0.2-f83f93e
  [2023-10-01 13:18:02.130] [info]: Writing interactions to temporary file /tmp/4DNFIZ1ZVXC8.tmp0...
  [2023-10-01 13:18:05.098] [info]: Initializing bias vector...
  [2023-10-01 13:18:05.099] [info]: Masking rows with fewer than 10 nnz entries...
  [2023-10-01 13:18:06.298] [info]: Masking rows using mad_max=5...
  [2023-10-01 13:18:06.971] [info]: Iteration 1: 36874560.192587376
  [2023-10-01 13:18:07.634] [info]: Iteration 2: 21347543.04950776
  [2023-10-01 13:18:08.307] [info]: Iteration 3: 7819314.542541969
  ...
  [2023-10-01 13:19:20.365] [info]: Iteration 105: 2.1397932757529552e-05
  [2023-10-01 13:19:21.146] [info]: Iteration 106: 1.6604770462001875e-05
  [2023-10-01 13:19:21.870] [info]: Iteration 107: 1.2885285040054778e-05
  [2023-10-01 13:19:22.608] [info]: Iteration 108: 9.99900768769869e-06
  [2023-10-01 13:19:22.619] [info]: Writing weights to 4DNFIZ1ZVXC8.mcool::/resolutions/1000/bins/weight...

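
The log above outlines the steps of ICE balancing: rows with too few non-zero entries are masked, outlier rows are masked using a MAD-based filter, and the matrix is then rescaled iteratively until its marginals converge. The following is a minimal sketch of these steps on a small dense matrix, loosely modeled on the ``cooler`` implementation of ICE; it is not ``hictk``'s code. The MAD filter is applied genome-wide for simplicity, and the function name ``ice_balance``, the ``tol``/``max_iters`` parameters and the convergence criterion (variance of the normalized marginals) are assumptions made for illustration.

```python
import math
import statistics


def ice_balance(matrix, min_nnz=10, mad_max=5, tol=1e-5, max_iters=200):
    """Minimal ICE sketch for a small, dense, symmetric matrix (list of lists).

    Hypothetical helper, not part of hictk. Returns one weight per bin
    (None for masked bins) such that matrix[i][j] * w[i] * w[j] has
    approximately equal row sums.
    """
    n = len(matrix)
    m = [[float(v) for v in row] for row in matrix]  # work on a float copy

    # Filter 1: mask rows with fewer than `min_nnz` non-zero entries.
    bad = [sum(1 for v in row if v != 0) < min_nnz for row in m]

    # Filter 2 (simplified, genome-wide): mask rows whose log-marginal is more
    # than `mad_max` median absolute deviations below the median log-marginal.
    marg = [sum(row) for row in m]
    logs = [math.log(x) for x, b in zip(marg, bad) if not b and x > 0]
    if logs:
        med = statistics.median(logs)
        mad = statistics.median(abs(x - med) for x in logs)
        if mad > 0:
            cutoff = math.exp(med - mad_max * mad)
            bad = [b or x < cutoff for x, b in zip(marg, bad)]

    # Zero out masked rows/columns; rows left empty are masked too.
    for i in range(n):
        for j in range(n):
            if bad[i] or bad[j]:
                m[i][j] = 0.0
    bad = [b or sum(row) == 0 for b, row in zip(bad, m)]
    if all(bad):
        return [None] * n

    bias = [1.0] * n
    for _ in range(max_iters):
        marg = [sum(row) for row in m]
        mean = statistics.mean(x for x in marg if x > 0)
        # Normalize marginals to mean 1; empty rows get a neutral factor of 1.
        scale = [x / mean if x > 0 else 1.0 for x in marg]
        for i in range(n):
            for j in range(n):
                m[i][j] /= scale[i] * scale[j]
        bias = [bi * s for bi, s in zip(bias, scale)]
        # Convergence check on the marginals measured before this correction.
        if statistics.pvariance([s for s, b in zip(scale, bad) if not b]) < tol:
            break

    # Weights are the inverse of the accumulated bias.
    return [None if b else 1.0 / bi for bi, b in zip(bias, bad)]
```

Balanced values for unmasked bins are then ``matrix[i][j] * w[i] * w[j]``. Dividing by both row and column scale factors at every iteration keeps the matrix symmetric, which is why a single weight per bin is sufficient.
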
When balancing .mcool or .hic files, all resolutions are balanced.

By default, balancing coefficients are stored in the input file under the name ``weight``.

This can be changed by passing the desired name through the ``--name`` option.
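
For .cool and .mcool files, the weights stored under ``weight`` (or under the name passed to ``--name``) follow the ``cooler`` convention: balanced counts are obtained by multiplying each raw count by the weights of both of its bins, and masked bins carry a NaN weight. The minimal sketch below applies hypothetical weights to a few pixels in COO format (``bin1_id``, ``bin2_id``, ``count``); the function ``balance_pixels`` is illustrative only. Note that .hic normalization vectors use the divisive convention instead (raw count divided by the product of the two vector entries), so this sketch does not apply to them as written.

```python
import math


def balance_pixels(pixels, weights):
    """Apply multiplicative (cooler-style) balancing weights to COO pixels.

    Hypothetical helper for illustration.
    pixels:  iterable of (bin1_id, bin2_id, count) tuples
    weights: per-bin weights indexed by bin id; NaN marks masked bins
    Yields (bin1_id, bin2_id, count, balanced).
    """
    for bin1, bin2, count in pixels:
        # NaN weights propagate, so pixels touching masked bins become NaN.
        yield bin1, bin2, count, count * weights[bin1] * weights[bin2]


weights = [0.5, 2.0, math.nan]  # hypothetical weights for three bins
pixels = [(0, 1, 3), (1, 1, 4), (0, 2, 1)]
for row in balance_pixels(pixels, weights):
    print(row)
# (0, 1, 3, 3.0)
# (1, 1, 4, 16.0)
# (0, 2, 1, nan)
```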

``hictk`` supports three balancing methods:

* Using all (genome-wide) interactions (default)
* Using trans interactions only
* Using cis interactions only

The balancing method can be changed using the ``--mode`` option (e.g. ``--mode=gw`` or ``--mode=cis``).
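
Conceptually, the three methods differ in which interactions contribute to the marginals: genome-wide balancing uses every pixel, cis balancing only pixels whose two bins lie on the same chromosome, and trans balancing only pixels between different chromosomes. The minimal sketch below shows this selection on a dense matrix, assuming a hypothetical per-bin list of chromosome names (``bin_chroms``) and a hypothetical function ``mask_for_mode``; the ``"trans"`` mode string is assumed by analogy with ``gw`` and ``cis``. The masked matrix could then be passed to an ICE routine; how ``hictk`` actually filters interactions, and how it scales weights across chromosomes in cis mode, may differ.

```python
def mask_for_mode(matrix, bin_chroms, mode="gw"):
    """Return a copy of `matrix` keeping only the pixels used by `mode`.

    Hypothetical helper for illustration.
    mode: "gw" (all pixels), "cis" (same chromosome),
          "trans" (different chromosomes).
    """
    if mode not in ("gw", "cis", "trans"):
        raise ValueError(f"unknown mode: {mode!r}")
    n = len(matrix)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same_chrom = bin_chroms[i] == bin_chroms[j]
            keep = (
                mode == "gw"
                or (mode == "cis" and same_chrom)
                or (mode == "trans" and not same_chrom)
            )
            if keep:
                out[i][j] = matrix[i][j]
    return out


# Two bins on chr1 and one bin on chr2.
matrix = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(mask_for_mode(matrix, ["chr1", "chr1", "chr2"], "cis"))
# [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
```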

When enough memory is available, ``hictk`` can be instructed to load all interactions into system memory by passing the ``--in-memory`` flag. This can dramatically speed up matrix balancing at the cost of potentially much higher memory usage (approximately 1 GB of RAM for every 40M interactions).
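
The rule of thumb above corresponds to roughly 25 bytes per interaction (1 GB / 40 million). A back-of-the-envelope estimate, assuming that "interactions" refers to the number of non-zero pixels stored in the matrix and that 1 GB = 10^9 bytes; the function name is hypothetical:

```python
def estimate_in_memory_gb(num_interactions, interactions_per_gb=40_000_000):
    """Approximate RAM (in GB) needed to balance with --in-memory.

    Hypothetical helper based on the rule of thumb of ~1 GB of RAM
    for every 40 million interactions.
    """
    return num_interactions / interactions_per_gb


bytes_per_interaction = 1_000_000_000 / 40_000_000
print(bytes_per_interaction)               # 25.0
print(estimate_in_memory_gb(400_000_000))  # 10.0
```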

Another way to improve performance is to increase the number of threads available for computation using the ``--thread`` option.
Note that when using a large number of threads (e.g. more than 16) without the ``--in-memory`` option, performance is likely to be limited by disk throughput. Thus, users are advised to use a large number of threads only when temporary data (stored under ``/tmp`` by default on most UNIX-like systems) resides on a fast SSD.

When the ``--in-memory`` option is not used, ``hictk`` creates a temporary file under the default temporary folder. This file stores interactions using a layout and compression that are optimized for the access pattern of ``hictk balance``. When balancing large matrices, this file can be quite large (sometimes tens of GB). In such cases, it may be appropriate to change the temporary folder using the ``--tmpdir`` option.

Finally, when balancing .hic files, ``hictk`` depends on `JuicerTools <https://github.com/aidenlab/Juicebox/releases/latest>`_ or `HiCTools <https://github.com/aidenlab/HiCTools/releases/latest>`_ to write balancing weights back to the file. Thus, the path to the JAR file of one of these tools must be provided through the ``--juicer-tools-jar`` option. Use JuicerTools for files in .hic v8 format or older, and HiCTools for .hic v9 files.
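
Choosing the right JAR requires knowing the .hic format version. A minimal sketch that reads it from the file header, assuming the layout described in the .hic format specification (the null-terminated magic string ``HIC`` followed by a little-endian 32-bit version number); the functions ``parse_hic_version``, ``read_hic_version`` and ``pick_tool`` are hypothetical and not part of ``hictk``:

```python
import struct


def parse_hic_version(header: bytes) -> int:
    """Extract the format version from the first 8 bytes of a .hic file."""
    if len(header) < 8 or header[:4] != b"HIC\x00":
        raise ValueError("not a .hic header")
    (version,) = struct.unpack("<i", header[4:8])  # little-endian int32
    return version


def read_hic_version(path: str) -> int:
    with open(path, "rb") as f:
        return parse_hic_version(f.read(8))


def pick_tool(version: int) -> str:
    """Map a .hic version to the tool whose JAR goes to --juicer-tools-jar."""
    if version <= 8:
        return "JuicerTools"
    if version == 9:
        return "HiCTools"
    raise ValueError(f"unsupported .hic version: {version}")


# Example (the file name is a placeholder):
# print(pick_tool(read_hic_version("matrix.hic")))
```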