Inferring allele-specific copy number aberrations and tumor phylogeography from spatially resolved transcriptomics

ateeq-khaliq/CalicoST

 
 


CalicoST

CalicoST is a probabilistic model that infers allele-specific copy number aberrations and tumor phylogeography from spatially resolved transcriptomics (SRT). CalicoST has the following key features:

  1. Identifies allele-specific integer copy numbers for each transcribed region, revealing events such as copy neutral loss of heterozygosity (CNLOH) and mirrored subclonal CNAs that are invisible to total copy number analysis.
  2. Assigns each spot a clone label indicating whether the spot consists primarily of normal cells or of a cancer clone with an aberrant copy number profile.
  3. Infers a phylogeny relating the identified cancer clones as well as a phylogeography that combines genetic evolution and spatial dissemination of clones.
  4. Handles normal cell admixture in SRT technologies that do not have single-cell resolution (e.g. 10x Genomics Visium) to infer more accurate allele-specific copy numbers and cancer clones.
  5. Simultaneously analyzes multiple regional or aligned SRT slices from the same tumor.

System requirements

The package has been tested on the following Linux operating systems: SpringdaleOpenEnterprise 9.2 (Parma) and CentOS Linux 7 (Core).

Installation

First, set up a conda environment from the environment.yml file:

cd CalicoST
conda config --add channels conda-forge
conda config --add channels bioconda
conda env create -f environment.yml --name calicost_env

Next, download Eagle2 by

wget https://storage.googleapis.com/broad-alkesgroup-public/Eagle/downloads/Eagle_v2.4.1.tar.gz
tar -xzf Eagle_v2.4.1.tar.gz

Then install Startle by

git clone --recurse-submodules https://github.com/raphael-group/startle.git
cd startle
mkdir build; cd build
cmake -DLIBLEMON_ROOT=<lemon path>\
        -DCPLEX_INC_DIR=<cplex include path>\
        -DCPLEX_LIB_DIR=<cplex lib path>\
        -DCONCERT_INC_DIR=<concert include path>\
        -DCONCERT_LIB_DIR=<concert lib path>\
        ..
make

Finally, install CalicoST using pip by

conda activate calicost_env
pip install -e .

Setting up the conda environments takes around 10 minutes on an HPC head node.

Getting started

CalicoST requires the coordinate information of genes and SNPs. The information files for the GRCh38 genome are included in the example data tarball. Specify the information file paths, your input SRT data paths, and the run configuration in config.yaml, and then run CalicoST by

snakemake --cores <number threads> --configfile config.yaml --snakefile calicost.smk all

Check out our readthedocs for tutorials on the simulated data and prostate cancer data.

Run on a simulated example data

Download data

The simulated count matrices are available from examples/CalicoST_example.tar.gz. CalicoST requires a reference SNP panel and phasing panel, which can be downloaded from

Run CalicoST

Untar the downloaded example data. In the example_config.yaml file of the example data, replace the following paths with paths on your machine:

  • calicost_dir: the path to the git-cloned CalicoST code.
  • eagledir: the path to the Eagle2 directory.
  • region_vcf: the path to the downloaded SNP panel.
  • phasing_panel: the path to the downloaded and unzipped phasing panel.
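Before launching a long snakemake run, it can help to check that these four entries point at real locations. The following is a minimal sketch, not part of CalicoST: missing_paths is a hypothetical helper, and it assumes the config has already been loaded into a plain dict (for example with PyYAML's yaml.safe_load) and that the four keys sit at the top level of that dict.

```python
from pathlib import Path

# The four path entries named in the list above.
REQUIRED_PATH_KEYS = ("calicost_dir", "eagledir", "region_vcf", "phasing_panel")


def missing_paths(config):
    """Return {key: reason} for required entries that are unset or do not exist.

    `config` is assumed to be a dict with the keys at its top level;
    this layout is an assumption, not something CalicoST guarantees.
    """
    problems = {}
    for key in REQUIRED_PATH_KEYS:
        value = config.get(key)
        if not value:
            problems[key] = "not set"
        elif not Path(value).expanduser().exists():
            problems[key] = f"path does not exist: {value}"
    return problems
```

An empty return value means all four entries resolve to existing files or directories.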

To avoid getting stuck in a local maximum of CalicoST's optimization objective, we recommend running CalicoST with multiple random initializations, using the list of random seeds specified by random_state in the example_config.yaml file. The provided config uses five random initializations.

Then run CalicoST by

cd <directory of downloaded example data>
snakemake --cores 5 --configfile example_config.yaml --snakefile <calicost_dir>/calicost.smk all

CalicoST takes about 69 minutes to finish on this example using 5 cores on an HPC.

Understanding the output

The above snakemake run creates a folder named calicost in the directory of the downloaded example data. Within this folder, each random initialization of CalicoST generates its own subdirectory matching calicost/clone*.

CalicoST generates the following key files for each random initialization:

  • clone_labels.tsv: The inferred clone labels for each spot.
  • cnv_seglevel.tsv: Allele-specific copy numbers for each clone for each genome segment.
  • cnv_genelevel.tsv: The projected allele-specific copy numbers from genome segments to the covered genes.
  • cnv_diploid_seglevel.tsv, cnv_triploid_seglevel.tsv, cnv_tetraploid_seglevel.tsv, cnv_diploid_genelevel.tsv, cnv_triploid_genelevel.tsv, cnv_tetraploid_genelevel.tsv: Allele-specific copy numbers when enforcing a ploidy for each genome segment or each gene.
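To see which random initializations finished and wrote their outputs, one option is to scan the calicost/clone* subdirectories for the three main files listed above. This is a minimal sketch using only the standard library; list_initializations is a hypothetical helper, and it assumes the key files sit directly inside each clone* subdirectory, as in the example paths in this README.

```python
from pathlib import Path

# Main per-initialization outputs described in the list above.
KEY_FILES = ("clone_labels.tsv", "cnv_seglevel.tsv", "cnv_genelevel.tsv")


def list_initializations(output_dir):
    """Map each clone* subdirectory name to the list of key files it lacks."""
    runs = {}
    for run_dir in sorted(Path(output_dir).glob("clone*")):
        if run_dir.is_dir():
            runs[run_dir.name] = [
                name for name in KEY_FILES if not (run_dir / name).exists()
            ]
    return runs
```

Calling list_initializations("calicost") from the example data directory returns one entry per initialization; an empty list means all three files are present.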

See the following examples of the key files.

head -10 calicost/clone3_rectangle0_w1.0/clone_labels.tsv
BARCODES        clone_label
spot_0  2
spot_1  2
spot_2  2
spot_3  2
spot_4  2
spot_5  2
spot_6  2
spot_7  2
spot_8  0
head -10 calicost/clone3_rectangle0_w1.0/cnv_seglevel.tsv
CHR     START   END     clone0 A        clone0 B        clone1 A        clone1 B        clone2 A        clone2 B
1       1001138 1616548 1       1       1       1       1       1
1       1635227 2384877 1       1       1       1       1       1
1       2391775 6101016 1       1       1       1       1       1
1       6185020 6653223 1       1       1       1       1       1
1       6785454 7780639 1       1       1       1       1       1
1       7784320 8020748 1       1       1       1       1       1
1       8026738 9271273 1       1       1       1       1       1
1       9292894 10375267        1       1       1       1       1       1
1       10398592        11922488        1       1       1       1       1       1
head -10 calicost/clone3_rectangle0_w1.0/cnv_genelevel.tsv
gene    clone0 A        clone0 B        clone1 A        clone1 B        clone2 A        clone2 B
A1BG    1       1       1       1       1       1
A1CF    1       1       1       1       1       1
A2M     1       1       1       1       1       1
A2ML1-AS1       1       1       1       1       1       1
AACS    1       1       1       1       1       1
AADAC   1       1       1       1       1       1
AADACL2-AS1     1       1       1       1       1       1
AAK1    1       1       1       1       1       1
AAMP    1       1       1       1       1       1
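The files shown above can be summarized with a few lines of Python. The sketch below is illustrative only: it assumes the files are tab-delimited with exactly the headers shown (BARCODES, clone_label, CHR, START, END, and "cloneK A" / "cloneK B" columns), and the function names are hypothetical. It counts spots per clone and lists segments whose allele-specific copy numbers differ from the normal (1, 1) state.

```python
import csv
import io
from collections import Counter


def count_spots_per_clone(handle):
    """Count spots per clone label in a clone_labels.tsv-style file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["clone_label"] for row in reader)


def aberrant_segments(handle):
    """Yield (chr, start, end, clone, a, b) for segments where (A, B) != (1, 1)."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Clone names in header order, taken from columns such as "clone0 A".
    clones = [name[:-2] for name in reader.fieldnames if name.endswith(" A")]
    for row in reader:
        for clone in clones:
            a, b = int(row[f"{clone} A"]), int(row[f"{clone} B"])
            if (a, b) != (1, 1):
                yield row["CHR"], int(row["START"]), int(row["END"]), clone, a, b


# In-memory demo; the (2, 0) segment row is made up for illustration.
labels = io.StringIO("BARCODES\tclone_label\nspot_0\t2\nspot_1\t2\nspot_8\t0\n")
print(count_spots_per_clone(labels))  # Counter({'2': 2, '0': 1})

segments = io.StringIO(
    "CHR\tSTART\tEND\tclone0 A\tclone0 B\tclone1 A\tclone1 B\n"
    "1\t1001138\t1616548\t1\t1\t1\t1\n"
    "1\t1635227\t2384877\t1\t1\t2\t0\n"
)
print(list(aberrant_segments(segments)))
```

For real output, pass an opened file instead of the in-memory strings, for example open("calicost/clone3_rectangle0_w1.0/cnv_seglevel.tsv", newline="").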

For each random initialization, CalicoST generates the following plots to visualize the inferred cancer clones in space and their allele-specific copy number profiles:

  • plots/clone_spatial.pdf: The spatial distribution of inferred cancer clones and normal regions (shown in grey; clone 0 by default).
  • plots/rdr_baf_defaultcolor.pdf: The read depth ratio (RDR) and B allele frequency (BAF) along the genome for each clone. A higher RDR indicates a higher total copy number, and a BAF that deviates from 0.5 indicates allelic imbalance due to allele-specific CNAs.
  • plots/acn_genome.pdf: The default allele-specific copy numbers along the genome.
  • plots/acn_genome_diploid.pdf, plots/acn_genome_triploid.pdf, plots/acn_genome_tetraploid.pdf: Allele-specific copy numbers when enforcing a ploidy.
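To build intuition for how allele-specific copy numbers relate to the BAF and total copy number shown in these plots, the sketch below applies the standard two-component mixture formula for a spot containing tumor cells with copy numbers (A, B) and normal cells with (1, 1). This is a simplified illustration, not CalicoST's model: expected_baf_and_total and tumor_fraction are hypothetical names, and the real RDR also depends on normalization that is ignored here.

```python
def expected_baf_and_total(a, b, tumor_fraction=1.0):
    """Expected BAF and mean total copies for a tumor/normal mixture.

    a, b: tumor copy numbers of the A and B alleles.
    tumor_fraction: assumed fraction of tumor cells in the spot (0 to 1).
    """
    b_copies = tumor_fraction * b + (1.0 - tumor_fraction) * 1
    total = tumor_fraction * (a + b) + (1.0 - tumor_fraction) * 2
    if total == 0:
        # Homozygous deletion in a pure tumor spot: BAF is undefined.
        return float("nan"), 0.0
    return b_copies / total, total


# Copy neutral LOH keeps 2 total copies but moves the BAF away from 0.5.
print(expected_baf_and_total(2, 0))                      # (0.0, 2.0)
# With 50% normal admixture the BAF shift is damped.
print(expected_baf_and_total(2, 0, tumor_fraction=0.5))  # (0.25, 2.0)
# Mirrored CNAs (2, 1) and (1, 2) share a total but differ in BAF.
print(expected_baf_and_total(2, 1)[1], expected_baf_and_total(1, 2)[1])  # 3.0 3.0
```

The first and last cases show why CNLOH and mirrored subclonal CNAs cannot be distinguished by total copy number alone, and the second shows why normal cell admixture needs to be accounted for when reading the BAF.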

The allele-specific copy number plots have the following color legend.

Software dependencies

CalicoST uses the following command-line tools and Python packages to extract the BAF information:

  • samtools
  • cellsnp-lite
  • Eagle2
  • pysam
  • snakemake

CalicoST uses the following packages for the remaining steps to infer allele-specific copy numbers and cancer clones:

  • numpy
  • scipy
  • pandas
  • scikit-learn
  • scanpy
  • anndata
  • numba
  • tqdm
  • statsmodels
  • networkx
  • matplotlib
  • seaborn
  • snakemake
