CalicoST

CalicoST is a probabilistic model that infers allele-specific copy number aberrations and tumor phylogeography from spatially resolved transcriptomics (SRT). CalicoST has the following key features:

Identifies allele-specific integer copy numbers for each transcribed region, revealing events such as copy neutral loss of heterozygosity (CNLOH) and mirrored subclonal CNAs that are invisible to total copy number analysis.
Assigns each spot a clone label indicating whether the spot consists primarily of normal cells or of a cancer clone with an aberrant copy number profile.
Infers a phylogeny relating the identified cancer clones as well as a phylogeography that combines genetic evolution and spatial dissemination of clones.
Handles normal cell admixture in SRT technologies that lack single-cell resolution (e.g. 10x Genomics Visium) to infer more accurate allele-specific copy numbers and cancer clones.
Simultaneously analyzes multiple regional or aligned SRT slices from the same tumor.

System requirements

The package has been tested on the following Linux operating systems: Springdale Open Enterprise 9.2 (Parma) and CentOS Linux 7 (Core).

Installation

First, set up a conda environment from the environment.yml file:

cd CalicoST
conda config --add channels conda-forge
conda config --add channels bioconda
conda env create -f environment.yml --name calicost_env

Next download Eagle2 by

wget https://storage.googleapis.com/broad-alkesgroup-public/Eagle/downloads/Eagle_v2.4.1.tar.gz
tar -xzf Eagle_v2.4.1.tar.gz

Then install Startle by

git clone --recurse-submodules https://github.com/raphael-group/startle.git
cd startle
mkdir build; cd build
cmake -DLIBLEMON_ROOT=<lemon path>\
        -DCPLEX_INC_DIR=<cplex include path>\
        -DCPLEX_LIB_DIR=<cplex lib path>\
        -DCONCERT_INC_DIR=<concert include path>\
        -DCONCERT_LIB_DIR=<concert lib path>\
        ..
make

Finally, install CalicoST using pip by

conda activate calicost_env
pip install -e .

Setting up the conda environments takes around 10 minutes on an HPC head node.

Getting started

CalicoST requires coordinate information for genes and SNPs. The information files for the GRCh38 genome are available from either of the example data tarballs. Specify the information file paths, your input SRT data paths, and running configurations in config.yaml, and then you can run CalicoST by

snakemake --cores <number threads> --configfile config.yaml --snakefile calicost.smk all

Check out our readthedocs for tutorials on the simulated data and prostate cancer data.

Run on simulated example data

Download data

The simulated count matrices are available from examples/CalicoST_example.tar.gz. CalicoST requires a reference SNP panel and phasing panel, which can be downloaded from

SNP panel. You can also choose other SNP panels from cellsnp-lite webpage.
Phasing panel

Run CalicoST

Untar the downloaded example data. Replace the following paths in the example_config.yaml of the downloaded example data with paths on your machine

calicost_dir: the path to CalicoST git-cloned code.
eagledir: the path to the Eagle2 directory.
region_vcf: the path to the downloaded SNP panel.
phasing_panel: the path to the downloaded and unzipped phasing panel.
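Before launching snakemake, it can help to confirm that the four paths above actually exist on your machine. The following is a minimal sketch, not part of CalicoST: the helper name `missing_paths` is hypothetical, and it assumes the config has already been loaded into a plain dictionary (for example with PyYAML's `yaml.safe_load`).

```python
from pathlib import Path

# The four path entries this README asks you to replace in example_config.yaml.
REQUIRED_KEYS = ("calicost_dir", "eagledir", "region_vcf", "phasing_panel")


def missing_paths(config):
    """Return a list of human-readable problems with the required path entries.

    `config` is a dict holding the already-parsed contents of the YAML file.
    This helper is illustrative only and is not part of CalicoST.
    """
    problems = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if value is None:
            problems.append(f"{key}: not set")
        elif not Path(value).exists():
            problems.append(f"{key}: {value} does not exist")
    return problems
```

An empty list means all four entries point at existing files or directories; it does not check that their contents are valid.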

To avoid falling into local maxima of CalicoST's optimization objective, we recommend running CalicoST with multiple random initializations, using a list of random seeds specified by random_state in the example_config.yaml file. The provided file uses five random initializations.

Then run CalicoST by

cd <directory of downloaded example data>
snakemake --cores 5 --configfile example_config.yaml --snakefile <calicost_dir>/calicost.smk all

CalicoST takes about 69 minutes to finish on this example using 5 cores on an HPC.

Understanding the output

The above snakemake run will create a folder calicost in the directory of downloaded example data. Within this folder, each random initialization of CalicoST generates a subdirectory matching calicost/clone*.

CalicoST generates the following key files for each random initialization:

clone_labels.tsv: The inferred clone labels for each spot.
cnv_seglevel.tsv: Allele-specific copy numbers for each clone for each genome segment.
cnv_genelevel.tsv: The projected allele-specific copy numbers from genome segments to the covered genes.
cnv_diploid_seglevel.tsv, cnv_triploid_seglevel.tsv, cnv_tetraploid_seglevel.tsv, cnv_diploid_genelevel.tsv, cnv_triploid_genelevel.tsv, cnv_tetraploid_genelevel.tsv: Allele-specific copy numbers when enforcing a ploidy for each genome segment or each gene.
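To see at a glance which random initializations finished, the output folder can be scanned for the key files listed above. This is a minimal sketch under the assumption that every subdirectory matching `clone*` corresponds to one initialization; the function name `summarize_runs` is hypothetical and not part of CalicoST.

```python
from pathlib import Path

# Key per-initialization output files named in this README.
KEY_FILES = ("clone_labels.tsv", "cnv_seglevel.tsv", "cnv_genelevel.tsv")


def summarize_runs(output_dir):
    """Map each clone* subdirectory name to the list of key files it lacks."""
    missing = {}
    for run_dir in sorted(Path(output_dir).glob("clone*")):
        if run_dir.is_dir():
            missing[run_dir.name] = [
                name for name in KEY_FILES if not (run_dir / name).is_file()
            ]
    return missing
```

Calling `summarize_runs("calicost")` from the example data directory returns one entry per initialization; an empty list for a subdirectory means all three key files are present.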

See the following examples of the key files.

head -10 calicost/clone3_rectangle0_w1.0/clone_labels.tsv
BARCODES        clone_label
spot_0  2
spot_1  2
spot_2  2
spot_3  2
spot_4  2
spot_5  2
spot_6  2
spot_7  2
spot_8  0
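As a quick check on the clone assignment, the number of spots per clone label can be tallied from this file. The sketch below assumes the file is tab-separated with the two header columns shown above; `count_spots_per_clone` is a hypothetical helper, not a CalicoST function.

```python
import csv
from collections import Counter


def count_spots_per_clone(path):
    """Count how many spots carry each clone label in clone_labels.tsv.

    Labels are kept as strings exactly as they appear in the file.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row["clone_label"] for row in reader)
```

For example, `count_spots_per_clone("calicost/clone3_rectangle0_w1.0/clone_labels.tsv")` returns a mapping from each label to its spot count; per the plot description in this README, clone 0 is the normal region by default.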

head -10 calicost/clone3_rectangle0_w1.0/cnv_seglevel.tsv
CHR     START   END     clone0 A        clone0 B        clone1 A        clone1 B        clone2 A        clone2 B
1       1001138 1616548 1       1       1       1       1       1
1       1635227 2384877 1       1       1       1       1       1
1       2391775 6101016 1       1       1       1       1       1
1       6185020 6653223 1       1       1       1       1       1
1       6785454 7780639 1       1       1       1       1       1
1       7784320 8020748 1       1       1       1       1       1
1       8026738 9271273 1       1       1       1       1       1
1       9292894 10375267        1       1       1       1       1       1
1       10398592        11922488        1       1       1       1       1       1
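Most segments in this excerpt are (1, 1) in every clone, so it is often more useful to list only the segments that deviate. The sketch below assumes a tab-separated file whose per-clone columns are named exactly as in the header above (`clone0 A`, `clone0 B`, and so on). The labels relative to a diploid (1, 1) baseline are a simplification chosen for illustration, and both function names are hypothetical.

```python
import csv


def classify(a, b):
    """Label an allele-specific copy number pair relative to a (1, 1) baseline."""
    total = a + b
    if total > 2:
        return "gain"
    if total < 2:
        return "loss"
    # Total copy number is 2: either balanced (1, 1) or one allele lost and
    # the other duplicated, i.e. copy neutral loss of heterozygosity.
    return "neutral" if a == b else "CNLOH"


def aberrant_segments(path):
    """Yield (chrom, start, end, clone, label) for each non-neutral segment."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Clone names are recovered from the "<clone> A" header columns.
        clones = [name[:-2] for name in reader.fieldnames if name.endswith(" A")]
        for row in reader:
            for clone in clones:
                label = classify(int(row[f"{clone} A"]), int(row[f"{clone} B"]))
                if label != "neutral":
                    yield row["CHR"], int(row["START"]), int(row["END"]), clone, label
```

Note that a CNLOH segment such as (2, 0) has the same total copy number as a normal (1, 1) segment, which is why it only shows up when the A and B columns are inspected separately.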

head -10 calicost/clone3_rectangle0_w1.0/cnv_genelevel.tsv
gene    clone0 A        clone0 B        clone1 A        clone1 B        clone2 A        clone2 B
A1BG    1       1       1       1       1       1
A1CF    1       1       1       1       1       1
A2M     1       1       1       1       1       1
A2ML1-AS1       1       1       1       1       1       1
AACS    1       1       1       1       1       1
AADAC   1       1       1       1       1       1
AADACL2-AS1     1       1       1       1       1       1
AAK1    1       1       1       1       1       1
AAMP    1       1       1       1       1       1

CalicoST generates the following plots for visualizing the inferred cancer clones in space and the allele-specific copy number profiles for each random initialization.

plots/clone_spatial.pdf: The spatial distribution of inferred cancer clones and normal regions (grey color, clone 0 by default)
plots/rdr_baf_defaultcolor.pdf: The read depth ratio (RDR) and B allele frequency (BAF) along the genome for each clone. Higher RDR indicates higher total copy numbers, and a deviation-from-0.5 BAF indicates allele imbalance due to allele-specific CNAs.
plots/acn_genome.pdf: The default allele-specific copy numbers along the genome.
plots/acn_genome_diploid.pdf, plots/acn_genome_triploid.pdf, plots/acn_genome_tetraploid.pdf: Allele-specific copy numbers when enforcing a ploidy.
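To build intuition for how RDR and BAF relate to allele-specific copy numbers, the sketch below computes their expected values for a single segment under a deliberately simple mixture model. This is an illustrative assumption, not CalicoST's actual likelihood: normal cells are taken to be (1, 1), RDR is taken relative to a diploid baseline of 2 copies, and the function name and `tumor_fraction` parameter are hypothetical.

```python
def expected_rdr_baf(a, b, tumor_fraction=1.0, baseline_total=2):
    """Expected (RDR, BAF) for a segment with tumor copy numbers (a, b).

    Assumes a spot is a mixture of tumor cells (fraction `tumor_fraction`)
    and normal cells with copy numbers (1, 1). Illustrative only.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must lie in [0, 1]")
    normal_fraction = 1.0 - tumor_fraction
    # Average total and B-allele copies per cell in the mixture.
    total = tumor_fraction * (a + b) + normal_fraction * 2
    b_copies = tumor_fraction * b + normal_fraction * 1
    rdr = total / baseline_total
    # BAF is undefined when no copies of the segment remain.
    baf = b_copies / total if total > 0 else float("nan")
    return rdr, baf
```

Under these assumptions a pure CNLOH segment (2, 0) gives an RDR of 1.0 and a BAF of 0.0, so it is visible only in the BAF track. With `tumor_fraction=0.5` the same segment has a BAF of 0.25, which illustrates how normal cell admixture pulls BAF back toward 0.5.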

The allele-specific copy number plots have the following color legend.

Software dependencies

CalicoST uses the following command-line tools and Python packages for extracting the BAF information:

samtools
cellsnp-lite
Eagle2
pysam
snakemake

CalicoST uses the following packages for the remaining steps to infer allele-specific copy numbers and cancer clones:

numpy
scipy
pandas
scikit-learn
scanpy
anndata
numba
tqdm
statsmodels
networkx
matplotlib
seaborn
snakemake
