We introduce RepliFit, a comprehensive toolkit for analysing DNA replication timing, origin firing rates, and genomic stability across cell lines and chromosomal regions. Complementing the work in Berkemeier et al. (2024), it includes functions for loading and processing data across whole-genome regions, telomeres, centromeres, and specific loci of interest. By fitting origin firing rates to replication timing data, the toolkit efficiently predicts and compares experimental and modelled timing profiles. The resulting error distributions between predicted and experimental data help pinpoint regions of interest. With datasets spanning diverse chromosomes and genomic features, this toolkit enables detailed visualisation and analysis of replication dynamics and genomic stability.
Consider a DNA molecule with
While this expression holds true for an infinitely large genome, in practical terms this series can be limited to
Clone the repository to a local directory. To explore the functionality and understand the workflow, we recommend using the provided dataset (`data.zip`), which includes all necessary dependencies. Once downloaded, extract it in the main project directory so that the files sit in a subfolder named `data`. The dataset includes `.bedgraph` files representing timing errors and origin firing rates. These files can be uploaded to the Genome Browser for visualisation and comparison with other genomic data.
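To inspect the `.bedgraph` files locally before uploading them, a minimal reader can be sketched as follows. It assumes the standard four-column bedGraph layout (chromosome, start, end, value). The helper names `read_bedgraph` and `list_bedgraph_files` are hypothetical and are not part of the toolkit.

```python
from pathlib import Path


def read_bedgraph(path):
    """Parse a bedGraph file into a list of (chrom, start, end, value) tuples."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and the optional header lines of the format.
            if not line or line.startswith(("track", "browser", "#")):
                continue
            chrom, start, end, value = line.split()[:4]
            records.append((chrom, int(start), int(end), float(value)))
    return records


def list_bedgraph_files(data_dir="data"):
    """Return all .bedgraph files found below the data folder, sorted by path."""
    return sorted(Path(data_dir).rglob("*.bedgraph"))
```

Calling `list_bedgraph_files()` from the main project directory lists whatever was extracted into `data`, and each path can then be passed to `read_bedgraph`.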
If you want to upload your own data, our tool allows you to process timing data and convert it to origin firing rates using our model. This can be run locally or on an HPC platform. To handle bigWig files, we recommend installing `pybigtools`. All required utilities and dependencies are documented in `utilities.ipynb`. DNA replication simulations are performed using the Beacon Calculus.
As an example, consider importing a bigWig file for HUVEC cells from the ENCODE database (wavelet smooth signal). A local copy is also provided at `data/bigwig_files/HUVEC.bw`. Before fitting the data, it must be pre-processed. This can be done using `model.py` (Data generation), which converts Repli-seq bigWig files into text files at the desired resolution (e.g., 1 kb). The processed files are stored in `data/whole-genome_timing_data`.
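The resampling step described above can be illustrated with a small sketch. This is not the implementation in `model.py`: it assumes the signal has already been read from the bigWig file (for example with `pybigtools`) as half-open `(start, end, value)` intervals in bp on a single chromosome, and it averages them over fixed-width bins. The function name `bin_signal` is hypothetical.

```python
def bin_signal(intervals, region_start, region_end, resolution=1000):
    """Average an interval signal over fixed-width bins.

    intervals: iterable of (start, end, value), half-open, in bp.
    Returns one length-weighted mean per bin, or None for uncovered bins.
    """
    n_bins = -(-(region_end - region_start) // resolution)  # ceiling division
    totals = [0.0] * n_bins
    covered = [0] * n_bins
    for start, end, value in intervals:
        # Clip the interval to the requested region.
        start, end = max(start, region_start), min(end, region_end)
        if start >= end:
            continue
        first = (start - region_start) // resolution
        last = (end - 1 - region_start) // resolution
        for b in range(first, last + 1):
            bin_lo = region_start + b * resolution
            bin_hi = min(bin_lo + resolution, region_end)
            overlap = min(end, bin_hi) - max(start, bin_lo)
            totals[b] += value * overlap
            covered[b] += overlap
    return [t / c if c else None for t, c in zip(totals, covered)]
```

Weighting by overlap length keeps the binned value faithful when source intervals straddle a bin boundary. Bins with no coverage are returned as `None` rather than zero so that missing data is not mistaken for a timing value.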
The script `code_fit.py` contains the main fitting and data generation functions:
- `fitfunction`: Performs model fitting.
- `datagenfs`: Processes the input data for fitting.
You can customise the fitting process by modifying the following parameters:
- `cell_line`: Specify the cell line name.
- `chr_number`: Select the chromosome number.
- `chrpos_min` and `chrpos_max`: Define the fitting region (or enable whole-genome fitting with `all_dataQ`; note that this is computationally intensive).
- Additional options:
  - Fork speed: `fork_speed`
  - Sampling intervals: `resolution` (default: 1000 for 1 kb)
  - Timing data scaling: `scale_factor`
  - Fitting iterations
  - Radius of influence: `int_width` (in bp)
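One way to keep these settings together and catch obvious mistakes before a long fit is a small container class. The sketch below is hypothetical: `FitSettings` does not exist in `code_fit.py`, and the `fork_speed` and `int_width` values in the example are placeholders rather than recommended settings. Only the parameter names and the 1 kb default resolution come from the list above. The number of fitting iterations is left out because its parameter name is not given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitSettings:
    """Hypothetical container for the fitting parameters listed above."""
    cell_line: str
    chr_number: int
    chrpos_min: int
    chrpos_max: int
    fork_speed: float
    scale_factor: float
    int_width: int            # radius of influence, in bp
    resolution: int = 1000    # sampling interval in bp (1 kb default)
    all_dataQ: bool = False   # whole-genome fitting; computationally intensive

    def __post_init__(self):
        # The region bounds only matter when a sub-region is being fitted.
        if not self.all_dataQ and self.chrpos_min >= self.chrpos_max:
            raise ValueError("chrpos_min must be smaller than chrpos_max")
        for name in ("fork_speed", "scale_factor", "int_width", "resolution"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


# Placeholder values for illustration only.
settings = FitSettings(
    cell_line="HUVEC", chr_number=12, chrpos_min=80000, chrpos_max=90000,
    fork_speed=1.4, scale_factor=6, int_width=500_000,
)
```

Making the class frozen prevents settings from being changed halfway through a run, which helps when results are later compared across cell lines.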
To reproduce the plots from the paper, open `plots.ipynb`. This notebook imports functions from `utilities.ipynb`, so ensure that the utilities are executed first and all dependencies are installed. You can run the cells in any order to explore dynamics across different cell lines and genomic regions. To save plots, ensure that the `Figures` folder exists in the main directory.
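If the output folder might be missing, it can be created programmatically before saving. The helper below is a hypothetical convenience, not part of the notebooks. The default folder name `Figures` is an assumption; on case-sensitive file systems, pass whichever spelling the notebook actually writes to.

```python
from pathlib import Path


def figure_path(file_name, folder="Figures", suffix=".pdf"):
    """Create the output folder if needed and return the path for a figure."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / f"{file_name}{suffix}"
```

For example, `figure_path("HUVEC_chr[12]_80000-90000")` creates the folder on first use and returns the path of the PDF inside it.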
Within `plots.ipynb`, one can run the following code:

    # Cell line and genomic region to plot
    cell_line = "HUVEC"
    chr_number = 12
    chrpos_min = 80000
    chrpos_max = 90000
    scale_factor = 6
    file_name = f'{cell_line}_chr[{chr_number}]_{chrpos_min}-{chrpos_max}'
    spec_fileQ = False
    # Set saveQ = True to write the plot to disk
    saveQ = False
    rt_plotf(cell_line, chr_number, chrpos_min, chrpos_max, scale_factor, file_name, spec_fileQ, saveQ)

If `saveQ = True`, the plot is saved in `figures/file_name.pdf`.
This project is openly distributed under the MIT License. This license allows unrestricted use, redistribution, and modification, provided that proper attribution to the original creators is maintained.
For further information, contributions, or queries, please contact:
- Email: fp409@cam.ac.uk
- GitHub: fberkemeier
We welcome issues and discussions via GitHub to improve the model or address potential enhancements.