PCRan

About

PCRan processes raw PCR thermocycler data exported in .xls format, and fits prepared PCR data in .txt format with a straight line (the standard line).

Running

To run PCRan, install the pandas, matplotlib and scipy libraries, set the required parameters in the config.xls file, and run the PCRan.py script. All results are saved to the results folder.

PCRan works in three modes: rampl, lfd and all.

rampl mode analyzes raw amplification data exported from a PCR thermocycler in .xls format. The data is stored in three columns: Well (well ID), Cycle (amplification cycle number) and dRn (fluorescence at that cycle). In this mode PCRan plots the amplification curves, calculates Ct for each well and saves the results.
lfd mode fits a straight line to prepared data given in a .txt file. The data is stored in two columns: the first is x and the second is y. In this mode PCRan plots only the result of the linear fit.
all mode combines rampl and lfd. The program takes raw amplification data (.xls format), analyzes it as in rampl mode and then performs the linear fit as in lfd mode.

Parameters in config.xls:

filename --- the name of the file to use as data
mode --- the operating mode, matching the type of data the file filename contains (rampl, lfd or all)
detection --- method for determining the point at which a signal is detected (threshold or linear)
threshold --- number greater than 0, ignored for linear detection
pmod --- plotting of amplification signals (singleplot or multiplot)
method --- linear fit method (lsq or hi2)
y_name --- the name of the values used as y for linear fit (ct, drfu or another)
x_name --- the name of the values used as x for linear fit
need_log_x --- whether to take the logarithm of the x values (True or False)
need_eff --- whether to calculate the efficiency (True or False)
wells --- well IDs whose ct or drfu is used as y when constructing a straight line
x --- values used as x for linear fit

You can run PCRan with the prepared test data located in static/test_data. Set the parameters in config.xls according to Example 1 or Example 2 and run the program.

Example 1

filename --- static/test_data/ampl.xls
mode --- all
detection --- threshold
threshold --- 5000
pmod --- multiplot
method --- hi2
y_name --- ct
x_name --- conc
need_log_x --- True
need_eff --- True
wells --- A1, B1, C1, A2, B2, C2, A3, B3, C3 (in column)
x --- 10000, 10000, 10000, 100, 100, 100, 1, 1, 1 (in column)

Example 2

filename --- static/test_data/conc-ct.txt
mode --- lfd
detection --- irrelevant
threshold --- irrelevant
pmod --- irrelevant
method --- lsq
y_name --- y
x_name --- x
need_log_x --- True
need_eff --- False
wells --- irrelevant
x --- irrelevant

Implementation

Amplification data approximation

Functions of the sigmoid family are suitable for approximating PCR amplification data. The program fits a hyperbolic tangent to the data with the curve_fit function from the scipy library:

$y=A+B\tanh\left(\cfrac{x-x_0}{\sigma}\right)$
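Such a fit can be sketched as follows. This is a minimal illustration and not PCRan's own code: the function name tanh_model, the synthetic data and the initial guesses p0 are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import curve_fit


def tanh_model(x, a, b, x0, sigma):
    """Sigmoid model y = A + B * tanh((x - x0) / sigma)."""
    return a + b * np.tanh((x - x0) / sigma)


# Synthetic amplification curve (assumed values, for illustration only).
rng = np.random.default_rng(0)
cycles = np.arange(1, 41, dtype=float)
true_params = (5000.0, 5000.0, 22.0, 3.0)
signal = tanh_model(cycles, *true_params) + rng.normal(0.0, 50.0, cycles.size)

# Rough initial guesses taken from the data help the optimizer converge.
p0 = (signal.mean(), (signal.max() - signal.min()) / 2, cycles.mean(), 1.0)
params, covariance = curve_fit(tanh_model, cycles, signal, p0=p0)
a, b, x0, sigma = params

# One-sigma uncertainties of the fitted parameters.
param_errors = np.sqrt(np.diag(covariance))
```

With real data, cycles and signal would come from the Cycle and dRn columns of a single well.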

Determination of the threshold number of cycles Ct

The program determines the threshold cycle Ct in one of two ways: by a threshold line (the cycle at which the fluorescence signal exceeds a given value) or by the linearity point of the amplification curve (the cycle at which the derivative of the fitted function reaches its maximum).
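For the hyperbolic tangent model both definitions have a closed form, which the sketch below uses. The function names and the parameter values are hypothetical; PCRan itself may evaluate the fitted curve numerically instead.

```python
import math


def ct_threshold(a, b, x0, sigma, threshold):
    """Cycle at which A + B*tanh((x - x0)/sigma) crosses the threshold."""
    ratio = (threshold - a) / b
    if not -1.0 < ratio < 1.0:
        raise ValueError("threshold lies outside the range of the fitted curve")
    return x0 + sigma * math.atanh(ratio)


def ct_max_derivative(a, b, x0, sigma):
    """Cycle of steepest growth.

    The derivative (B / sigma) / cosh((x - x0) / sigma)**2 of a growing
    tanh curve (B / sigma > 0) is largest at the midpoint x = x0.
    """
    return x0


# Assumed fit parameters, for illustration only.
fit = dict(a=5000.0, b=5000.0, x0=22.0, sigma=3.0)
ct_thr = ct_threshold(threshold=2000.0, **fit)
ct_lin = ct_max_derivative(**fit)
```

The two values generally differ, which is why the same detection method should be used for all wells that are compared with each other.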

Fitting data by line and calculating efficiency

The program fits data with a straight line using one of two standard methods: least squares and chi-square. The chi-square method requires several replicates (more than one) to calculate the error at each point. Since experiments usually use few replicates (fewer than 10), the error can be estimated as half the range of the replicates:

$\Delta x=\cfrac{x^{+}-x^{-}}{2}$

Where:

x⁺ is the maximum value of the dataset
x⁻ is the minimum value of the dataset
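As a small illustration, the sketch below estimates the error of a set of replicates, assuming the error is taken as half the spread between the largest and the smallest value. The function name and the replicate values are hypothetical.

```python
def half_range_error(values):
    """Estimate the error of a few replicates as (max - min) / 2."""
    if len(values) < 2:
        raise ValueError("at least two replicates are required")
    return (max(values) - min(values)) / 2


# Hypothetical Ct replicates measured at one concentration.
ct_replicates = [21.8, 22.1, 22.4]
ct_mean = sum(ct_replicates) / len(ct_replicates)
ct_error = half_range_error(ct_replicates)
```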

The program also estimates the errors of the coefficients of the straight line and calculates the efficiency.

Theory

PCR amplification

During amplification the PCR product accumulates, which leads to an exponential increase in the fluorescence signal in the early cycles. The reaction then gradually slows down and reaches a plateau as the reagents are depleted. This dependence can be described by a function of the sigmoid family, for example:

$y=A+B\tanh\left(\cfrac{x-x_0}{\sigma}\right)$

Threshold number of cycles

The threshold cycle Ct is the cycle at which the fluorescence signal reaches a given value or growth phase. For the same data, Ct can differ depending on the detection method and its parameters.

Efficiency calculation

The accumulation of the PCR reaction product is described by an exponential law:

$C=C_0(1+E)^n$

Where:

C is the product concentration
C₀ is the initial product concentration
E is the reaction efficiency
n is the number of cycles.

In the ideal case, when the doubling of the reaction product occurs for each cycle, the reaction efficiency is 1 and the dependence takes on a simpler form:

$C=C_{0}2^n$

To estimate the efficiency E, a standard curve is constructed: the dependence of the threshold cycle Ct on the decimal logarithm of the concentration. This dependence is linear, and the slope of the line carries the information about efficiency: it shows how many cycles it takes for the product concentration to change by one order of magnitude (a factor of 10). The accumulation law can therefore be rewritten as:

$10C_0=C_0(1+E)^\alpha$

Where:

α is the absolute value of the slope of the standard line

Solving this equation for E gives the formula for the reaction efficiency. Given the error ∆α with which the slope was determined, the error of the efficiency can be estimated as well:

$E=10^\frac{1}{\alpha}-1$

$\Delta E=\cfrac{\ln{10}\times10^\frac{1}{\alpha}}{\alpha^2}\Delta\alpha$

Where:

E is the efficiency
α is the absolute value of the slope of the straight line
∆α is the error of the slope of the straight line
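The two formulas translate directly into code. The sketch below is illustrative only: the function name and the example slope are assumptions, and the slope is passed with its sign and converted to an absolute value inside the function.

```python
import math


def efficiency(slope, slope_error):
    """Return (E, dE) from the standard-line slope and its error."""
    alpha = abs(slope)
    e = 10 ** (1 / alpha) - 1
    de = math.log(10) * 10 ** (1 / alpha) / alpha**2 * slope_error
    return e, de


# For an ideal reaction (doubling every cycle) |slope| = 1 / log10(2),
# which is about 3.32 cycles per tenfold change in concentration.
e, de = efficiency(slope=-1 / math.log10(2), slope_error=0.05)
```

For this ideal slope the efficiency evaluates to 1, in agreement with the doubling law $C=C_{0}2^n$.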

Calculation of concentrations by Ct

With known coefficients of the standard line, the concentration of a sample can be calculated from its threshold cycle. Note that the same method of determining Ct must be used both when constructing the standard line (estimating the coefficients α and β) and when determining the threshold cycle Ct of the sample.

$C_0=10^\frac{C_t-\beta}{\alpha}$

In this case, the error is calculated according to:

$\Delta C_0=\cfrac{10^\frac{C_t-\beta}{\alpha}\ln{10}}{|\alpha|}\sqrt{\Delta C_t^2+\Delta\beta^2+(\frac{C_t-\beta}{\alpha}\Delta\alpha)^2}$
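A minimal sketch of this calculation is given below. The function name and all numeric values are hypothetical; here α is the signed slope and β the intercept of the standard line Ct = α·log10(C) + β.

```python
import math


def concentration_from_ct(ct, alpha, beta, d_ct=0.0, d_alpha=0.0, d_beta=0.0):
    """Return (C0, dC0) for a sample with threshold cycle ct."""
    exponent = (ct - beta) / alpha
    c0 = 10 ** exponent
    # Combined error of (ct - beta) / alpha, with the factor 1/|alpha| taken out.
    spread = math.sqrt(d_ct**2 + d_beta**2 + (exponent * d_alpha) ** 2)
    d_c0 = c0 * math.log(10) / abs(alpha) * spread
    return c0, d_c0


# Assumed standard line and sample, for illustration only.
c0, d_c0 = concentration_from_ct(
    ct=28.36, alpha=-3.32, beta=35.0, d_ct=0.2, d_alpha=0.05, d_beta=0.3
)
```

Because the concentration depends exponentially on Ct, even small errors in Ct and in the line coefficients give a large relative error in C₀.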

Linear data fit methods

Least squares method

The simplest method for fitting a straight line is the least squares method, which minimizes the sum:

$S(\alpha,\beta)=\sum_{i=1}^{n}(y_i-\alpha x_i-\beta)^2$

Where:

n is the number of experimental points
y_i are the values along the ordinate
x_i are the values along the abscissa
α is the slope parameter of the straight line
β is the intercept of the straight line

The minimum of the sum S(α, β) is reached at:

$\alpha=\cfrac{\langle xy\rangle-\langle x\rangle\langle y\rangle}{\langle x^2\rangle-\langle x\rangle^2}$

$\beta=\langle y\rangle-\alpha\langle x\rangle$
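The closed-form solution can be checked with a few lines of code. This is a sketch under the notation above, where angle brackets denote the mean over all points; the function name and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (alpha, beta) of the least squares line y = alpha * x + beta."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two points and equal-length inputs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_xx = sum(x * x for x in xs) / n
    alpha = (mean_xy - mean_x * mean_y) / (mean_xx - mean_x**2)
    beta = mean_y - alpha * mean_x
    return alpha, beta


# Hypothetical standard line: log10(concentration) against Ct.
log_conc = [0.0, 2.0, 4.0]
ct_values = [35.0, 28.4, 21.7]
alpha, beta = least_squares_line(log_conc, ct_values)
```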

If the measurement error in x is negligible and the errors in y are the same for all experimental points, independent and random, the errors of the parameters are estimated as:

$\Delta\alpha=\sqrt{\cfrac{1}{n-2}(\frac{D_{yy}}{D_{xx}}-\alpha^2)}$