Prepare working directory

Gian M. Franceschini edited this page Jan 8, 2024 · 7 revisions

Prepare Hi-C fastq files

Put all Hi-C fastq files under the working directory. If the only purpose is to phase SNPs, all Hi-C datasets that share the same haplotypes can be pooled together.

wk_dir/
|-- HiC_data1_1.fastq.gz
|-- HiC_data1_2.fastq.gz
|-- HiC_data2_1.fastq.gz
|-- HiC_data2_2.fastq.gz
|-- HiC_data3_1.fastq.gz
|-- HiC_data3_2.fastq.gz
...
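Before running anything, it can help to confirm that every dataset in the working directory has both mate files. The following is a minimal sketch, not part of HaploC-tools: it assumes the `<name>_1.fastq.gz` / `<name>_2.fastq.gz` naming shown in the listing above, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming convention, taken from the listing above:
# <name>_1.fastq.gz and <name>_2.fastq.gz
MATE_PATTERN = re.compile(r"^(?P<name>.+)_(?P<mate>[12])\.fastq\.gz$")


def group_mates(filenames):
    """Map each dataset name to the set of mate numbers found for it."""
    mates = {}
    for filename in filenames:
        match = MATE_PATTERN.match(filename)
        if match:
            mates.setdefault(match.group("name"), set()).add(match.group("mate"))
    return mates


def incomplete_pairs(filenames):
    """Return the sorted dataset names that lack either the _1 or the _2 file."""
    return sorted(
        name for name, found in group_mates(filenames).items() if found != {"1", "2"}
    )


def check_working_dir(wk_dir):
    """Apply the pairing check to the files directly under wk_dir."""
    names = [p.name for p in Path(wk_dir).iterdir() if p.is_file()]
    return incomplete_pairs(names)
```

Calling `check_working_dir("wk_dir")` returns an empty list when every dataset is complete. Files that do not match the assumed pattern are ignored rather than reported.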

Prepare configuration file

repo_dir=HaploC-tools
maxIS=1,2,5,10,20,50,100
genome_version=hg19
thread4bwa=30
enzyme=MboI
sizeGb=5
x=30

Make sure that the config file ends with an empty line.
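The requirement above can be checked and fixed automatically. This is a minimal sketch that interprets "ends with an empty line" as "the last line is terminated by a newline character", which is an assumption; the function names and the config path are hypothetical.

```python
def with_trailing_newline(text):
    """Return text unchanged if it ends with a newline, else append one."""
    return text if text.endswith("\n") else text + "\n"


def fix_config(path):
    """Rewrite the file at path only if its final newline is missing.

    Returns True when the file was modified, False otherwise.
    """
    with open(path, "r", encoding="utf-8") as handle:
        content = handle.read()
    fixed = with_trailing_newline(content)
    if fixed == content:
        return False
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(fixed)
    return True
```

Running `fix_config` twice on the same file is safe: the second call finds the newline already present and leaves the file untouched.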

Parameters in configuration file:

Name | Description
repo_dir | The full path to the HaploC-tools repository
maxIS | Comma-separated list of maximum insert sizes to be tried by HapCUT2
genome_version | Version of the reference genome. Currently supported: hg19 and mm10. When using mm10, no population phasing is conducted in HapCUT2. Other genomes will be added soon
thread4bwa | Number of threads for Hi-C read alignment using bwa
enzyme | Cutting enzyme used in the Hi-C experiment
sizeGb | Maximum size, in Gb, of the chunks into which the Hi-C fastq files are split (5 in the example above). Each chunk is then processed in parallel in several HaploC steps
x | Target coverage for SNP calling: a subset of the Hi-C reads is used to reach a coverage of x
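A quick sanity check of the configuration before launching a run can catch typos early. The sketch below is not part of HaploC-tools. It parses the `key=value` lines and checks them against the parameters described above. The numeric checks (positive values for `thread4bwa`, `sizeGb` and `x`, integers in `maxIS`) are assumptions inferred from the example values, and the function names are hypothetical.

```python
# Genome versions listed as currently supported in the parameter description.
SUPPORTED_GENOMES = {"hg19", "mm10"}
REQUIRED_KEYS = (
    "repo_dir", "maxIS", "genome_version", "thread4bwa", "enzyme", "sizeGb", "x",
)


def parse_config(text):
    """Parse key=value lines into a dict, skipping blank or malformed lines."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def validate(params):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = [f"missing parameter: {key}" for key in REQUIRED_KEYS if key not in params]
    genome = params.get("genome_version")
    if genome is not None and genome not in SUPPORTED_GENOMES:
        problems.append(f"unsupported genome_version: {genome}")
    # Assumption: maxIS is a comma-separated list of non-negative integers.
    if "maxIS" in params and not all(
        item.strip().isdigit() for item in params["maxIS"].split(",")
    ):
        problems.append("maxIS must be a comma-separated list of integers")
    # Assumption: these three parameters must be positive numbers.
    for key in ("thread4bwa", "sizeGb", "x"):
        if key in params:
            try:
                is_positive = float(params[key]) > 0
            except ValueError:
                is_positive = False
            if not is_positive:
                problems.append(f"{key} must be a positive number")
    return problems
```

For the example configuration shown earlier, `validate(parse_config(text))` returns an empty list. The check does not verify that `repo_dir` exists or that `enzyme` is a recognized enzyme name.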

A demo working directory (containing Hi-C reads of chr14 and chr18 from the WSU cell line) can be downloaded from Zenodo as outlined in the previous section. To enable command-line download, follow this guideline (section V).

Next steps