HiScanner (HIgh-resolution Single-Cell Allelic copy Number callER)

HiScanner is a python package for high-resolution single-cell copy number analysis. It supports two modes of operation:

Standard Pipeline (RDR + BAF): Full analysis using both read depth ratios (RDR) and B-allele frequencies (BAF)
RDR-only Pipeline: Simplified analysis using only read depth ratios

We provide a demo dataset and tutorial to help you get started. After installation, see https://github.com/parklab/hiscanner_demo for instructions.

Required External Files:

1) Mappability Track

hg19/GRCh37 (100mer)

wget https://www.math.pku.edu.cn/teachers/xirb/downloads/software/BICseq2/Mappability/hg19CRG.100bp.tar.gz --no-check-certificate
tar -xvzf hg19CRG.100bp.tar.gz

hg38/GRCh38 (150mer)

Download from: https://doi.org/10.6084/m9.figshare.28370357.v1

Other reference genomes: instructions on custom track generation

For other genomes/read length configurations, follow instructions at CGAP Annotations to generate mappability tracks.

IMPORTANT: Guidelines when choosing mappability tracks

Shorter mappability track (e.g., 100bp) with longer reads (e.g., 150bp): Valid but conservative (some uniquely mappable regions may be missed)
Longer mappability track (e.g., 150bp) with shorter reads (e.g., 100bp): Not valid, will cause false positives

2) Reference Genome

We require the reference genome fasta to be split into chromosomes, to allow for parallel processing. You can use the following command to split the reference genome:

samtools faidx /path/to/reference.fasta
mkdir /path/to/reference/split
awk '{print $1}' /path/to/reference.fasta.fai | xargs -I {} samtools faidx /path/to/reference.fasta {} > /path/to/reference/split/{}.fasta

Pipeline Options

HiScanner offers two pipeline options:

1. Standard Pipeline (RDR + BAF)

Uses both read depth ratios and B-allele frequencies for comprehensive CNV analysis.

Steps:

SNP Calling (via SCAN2)
Phasing & BAF Computation
ADO Analysis
Segmentation
CNV Calling

Requirements:

SCAN2 output files
Bulk and single cell BAM files
Reference genome
Mappability tracks

2. RDR-only Pipeline

Uses only read depth ratios for simplified CNV analysis.

Steps:

Segmentation
CNV Calling

Requirements:

Single cell BAM files
Reference genome
Mappability tracks

Running the Pipeline

Option 1: Standard Pipeline (RDR + BAF)

Step 1: SCAN2 (Prerequisites)

SCAN2 needs to be run separately before using HiScanner.

Note: If you only need SCAN2 output for HiScanner (and are not interested in SNV calls), you can save time by running SCAN2 with:
scan2 run --joblimit 5000 --snakemake-args ' --until shapeit/phased_hets.vcf.gz --latency-wait 120'
This will stop at the phasing step, which is sufficient for HiScanner's requirements.

If you have already run SCAN2, ensure you have:

VCF file with raw variants (gatk/hc_raw.mmq60.vcf.gz)
Phased heterozygous variants (shapeit/phased_hets.vcf)
Additionally, we note that the phased genotype field in phased_hets.vcf should be named as phasedgt. This is the expected output from the SCAN2 pipeline that we have tested with. If your VCF file has a different field name, please manually rename it to phasedgt in the VCF.

The expected location is scan2_out/ in your project directory.

Steps 2-5: HiScanner Pipeline

1. Initialize HiScanner project

hiscanner init --output ./my_project
cd my_project

2. Edit config.yaml

Edit with your paths and parameters

3. Prepare metadata file

Must contain the following columns:

bamID    bam    singlecell
bulk1    /path/to/bulk.bam    N
cell1    /path/to/cell1.bam   Y 
cell2    /path/to/cell2.bam   Y

4. Validate configuration

hiscanner validate

5. Run the pipeline

hiscanner run --step snp      # Check SCAN2 results
hiscanner run --step phase    # Process SCAN2 results
hiscanner run --step ado      # ADO analysis to identify optimal bin size
hiscanner run --step normalize # Normalize read depth ratios
hiscanner run --step segment  # Segmentation
hiscanner run --step cnv      # CNV calling

# Or run all steps at once:
hiscanner run --step all

For normalize (the most time-consuming step), we provide an option to run with cluster, e.g.,

hiscanner --config config.yaml run --step normalize --use-cluster

Option 2: RDR-only Pipeline

1. Initialize project

hiscanner init --output ./my_project
cd my_project

2. Edit config.yaml

Set rdr_only: true
Configure paths and parameters

3. Prepare metadata file

Must contain the following columns (bulk samples are not required):

bamID    bam    singlecell
cell1    /path/to/cell1.bam   Y 
cell2    /path/to/cell2.bam   Y

4. Validate configuration

hiscanner validate

5. Run the pipeline

hiscanner run --step normalize # Normalize read depth ratios
hiscanner run --step segment  # Segmentation
hiscanner run --step cnv      # CNV calling

Output Structure

hiscanner_output/
├── phased_hets/       # Processed heterozygous SNPs (Standard pipeline only)
├── ado/               # ADO analysis results (Standard pipeline only)
├── bins/             # Binned read depth
├── segs/             # Segmentation results
└── final_calls/      # Final CNV calls

Cleaning Up

HiScanner creates several temporary directories during analysis. You can clean these up using the clean command:

hiscanner clean

Troubleshooting

Common issues:

Missing SCAN2 results: Ensure scan2_output directory is correctly specified
File permissions: Check access to BAM files and reference data
Memory issues: Adjust batch_size in config.yaml

For more detailed information, check the log files in hiscanner_output/logs/

Support

HiScanner is currently under active development. For support or questions, please open an issue on our GitHub repository.

Citation

If you use HiScanner in your research, please cite:

Zhao, Y., Luquette, L. J., Veit, A. D., Wang, X., Xi, R., Viswanadham, V. V., ... & Park, P. J. (2024). High-resolution detection of copy number alterations in single cells with HiScanner. bioRxiv, 2024-04.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
hiscanner		hiscanner
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HiScanner (HIgh-resolution Single-Cell Allelic copy Number callER)

Table of Contents

Installation

Required External Files:

1) Mappability Track

2) Reference Genome

Pipeline Options

1. Standard Pipeline (RDR + BAF)

2. RDR-only Pipeline

Running the Pipeline

Option 1: Standard Pipeline (RDR + BAF)

Step 1: SCAN2 (Prerequisites)

Steps 2-5: HiScanner Pipeline

Option 2: RDR-only Pipeline

Output Structure

Cleaning Up

Troubleshooting

Support

Citation

About

Releases

Packages

Languages

License

parklab/HiScanner

Folders and files

Latest commit

History

Repository files navigation

HiScanner (HIgh-resolution Single-Cell Allelic copy Number callER)

Table of Contents

Installation

Required External Files:

1) Mappability Track

2) Reference Genome

Pipeline Options

1. Standard Pipeline (RDR + BAF)

2. RDR-only Pipeline

Running the Pipeline

Option 1: Standard Pipeline (RDR + BAF)

Step 1: SCAN2 (Prerequisites)

Steps 2-5: HiScanner Pipeline

Option 2: RDR-only Pipeline

Output Structure

Cleaning Up

Troubleshooting

Support

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages