The PCRan program processes raw PCR thermocycler data exported in .xls format, or fits prepared PCR data stored in .txt format with a straight line (the standard line).
To run PCRan, install the `pandas`, `matplotlib` and `scipy` libraries, set the necessary parameters in the `config.xls` file and run the `PCRan.py` script. All results are saved to the `results` folder.
PCRan works in three modes: `rampl`, `lfd` and `all`.

`rampl` analyzes raw amplification data from a PCR thermocycler in .xls format. The data is stored in three columns: `Well` (well ID), `Cycle` (amplification cycle) and `dRn` (fluorescence at the current cycle). In this mode PCRan plots the amplification curves, calculates Ct for each well and saves the results.

`lfd` fits a straight line to prepared data given in a .txt file. The data is stored in two columns: the first column is `x` and the second is `y`. In this mode PCRan plots only the results of the linear fit.

`all` combines `rampl` and `lfd`. In this mode the program takes raw amplification data (.xls format), analyzes it as in `rampl` mode and then performs the linear fit as in `lfd` mode.
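To make the two input layouts concrete, here is a minimal sketch of how rows in the `rampl` layout could be grouped into one curve per well, and how an `lfd` text file could be read. It assumes the .xls rows are already available as `(Well, Cycle, dRn)` tuples (for example from `pandas.DataFrame.itertuples(index=False)`) and that the .txt columns are whitespace-separated; both function names are hypothetical and not part of PCRan.

```python
from collections import defaultdict


def group_rampl_rows(rows):
    """Group (Well, Cycle, dRn) rows into one cycle-sorted curve per well."""
    curves = defaultdict(list)
    for well, cycle, drn in rows:
        curves[well].append((int(cycle), float(drn)))
    return {well: sorted(points) for well, points in curves.items()}


def parse_lfd_text(text):
    """Read two whitespace-separated columns (x, y); lines that are not two numbers are skipped."""
    xs, ys = [], []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank or incomplete line
        try:
            x, y = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # e.g. a header line such as "x y"
        xs.append(x)
        ys.append(y)
    return xs, ys
```

For example, rows `("A1", 2, 150.0), ("A1", 1, 100.0), ("B1", 1, 90.0)` become `{"A1": [(1, 100.0), (2, 150.0)], "B1": [(1, 90.0)]}`.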
Parameters in `config.xls`:

- `filename`: the name of the file to use as input data
- `mode`: the operating mode, which also defines the type of data that `filename` contains (`rampl`, `lfd` or `all`)
- `detection`: the method for determining the point at which a signal is detected (`threshold` or `linear`)
- `threshold`: the threshold value, a number greater than 0; ignored for `linear` detection
- `pmod`: how amplification signals are plotted (`singleplot` or `multiplot`)
- `method`: the linear fit method (`lsq` for least squares or `hi2` for chi-square)
- `y_name`: the name of the values used as `y` for the linear fit (`ct`, `drfu` or another)
- `x_name`: the name of the values used as `x` for the linear fit
- `need_log_x`: whether to take the logarithm of the `x` values
- `need_eff`: whether to calculate the efficiency
- `wells`: the well IDs whose `ct` or `drfu` values are used as `y` when constructing the straight line
- `x`: the values used as `x` for the linear fit
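A hypothetical consistency check of these parameters might look like the sketch below. It takes the parameters as a plain Python dict rather than reading `config.xls`, and it assumes that `detection`, `threshold` and `pmod` only matter when raw amplification data is processed (as the "irrelevant" values in Example 2 suggest) and that `wells` and `x` must have the same number of entries. `validate_config` is an illustrative name, not a PCRan function.

```python
def validate_config(params):
    """Return a list of human-readable problems found in a parameter dict (empty if none)."""
    problems = []
    mode = params.get("mode")
    if mode not in {"rampl", "lfd", "all"}:
        problems.append(f"mode must be rampl, lfd or all, got {mode!r}")
    # Assumption: detection, threshold and pmod are only used for raw amplification data
    if mode in {"rampl", "all"}:
        detection = params.get("detection")
        if detection not in {"threshold", "linear"}:
            problems.append(f"detection must be threshold or linear, got {detection!r}")
        elif detection == "threshold":
            threshold = params.get("threshold")
            if not isinstance(threshold, (int, float)) or threshold <= 0:
                problems.append("threshold must be a number greater than 0")
        if params.get("pmod") not in {"singleplot", "multiplot"}:
            problems.append("pmod must be singleplot or multiplot")
    # The linear fit method matters whenever a straight line is fitted
    if mode in {"lfd", "all"} and params.get("method") not in {"lsq", "hi2"}:
        problems.append("method must be lsq or hi2")
    # Assumption: each well in `wells` is paired with one value in `x`
    if mode == "all" and len(params.get("wells", [])) != len(params.get("x", [])):
        problems.append("wells and x must have the same number of entries")
    return problems
```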
You can run PCRan with the pre-prepared data located in `static/test_data`. Set the parameters in `config.xls` according to Example 1 or Example 2 and run the program.
Example 1:

- `filename`: static/test_data/ampl.xls
- `mode`: all
- `detection`: threshold
- `threshold`: 5000
- `pmod`: multiplot
- `method`: hi2
- `y_name`: ct
- `x_name`: conc
- `need_log_x`: True
- `need_eff`: True
- `wells`: A1, B1, C1, A2, B2, C2, A3, B3, C3 (entered in a column)
- `x`: 10000, 10000, 10000, 100, 100, 100, 1, 1, 1 (entered in a column)
Example 2:

- `filename`: static/test_data/conc-ct.txt
- `mode`: lfd
- `detection`: irrelevant
- `threshold`: irrelevant
- `pmod`: irrelevant
- `method`: lsq
- `y_name`: y
- `x_name`: x
- `need_log_x`: True
- `need_eff`: False
- `wells`: irrelevant
- `x`: irrelevant
Functions of the sigmoid family are suitable for approximating PCR amplification data. PCRan fits the data with a hyperbolic tangent using the `curve_fit` function from the `scipy` library.
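A minimal sketch of such a fit is shown below. The exact tanh parametrization PCRan uses is not given in this text, so the four-parameter form a + b·tanh((n − c)/d) (level a, half-amplitude b, midpoint cycle c, width d) and the starting values are assumptions; only `scipy.optimize.curve_fit` itself is the function named above.

```python
import numpy as np
from scipy.optimize import curve_fit


def tanh_model(n, a, b, c, d):
    """Assumed sigmoid: level a, half-amplitude b, midpoint cycle c, width d."""
    return a + b * np.tanh((n - c) / d)


def fit_amplification(cycles, drn):
    """Fit tanh_model to one well's curve; return the parameters and their standard errors."""
    cycles = np.asarray(cycles, dtype=float)
    drn = np.asarray(drn, dtype=float)
    # Rough starting values from the data: middle level, half-range, cycle nearest the middle
    a0 = (drn.max() + drn.min()) / 2
    b0 = (drn.max() - drn.min()) / 2
    c0 = cycles[np.argmin(np.abs(drn - a0))]
    d0 = 3.0  # assumed typical width in cycles
    popt, pcov = curve_fit(tanh_model, cycles, drn, p0=[a0, b0, c0, d0])
    return popt, np.sqrt(np.diag(pcov))
```

Reasonable starting values matter here: `curve_fit` uses a local optimizer, so a poor `p0` can converge to a wrong sigmoid or fail.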
The program determines the threshold cycle in two different ways: by the threshold line (the cycle at which the fluorescence signal exceeds a given value) or by the linearity point of the amplification curve (the maximum of the derivative of the function fitted to the amplification data).
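Both rules can be sketched as follows. Assuming the fitted curve has the form a + b·tanh((n − c)/d) (an assumption; the parametrization is not stated here) and that the threshold is applied to the fitted curve rather than to raw points, the threshold rule has a closed-form solution. The linearity point is found numerically for any fitted function by scanning for the largest derivative. Both function names are illustrative, not PCRan's code.

```python
import math


def ct_threshold(a, b, c, d, threshold):
    """Cycle where a + b*tanh((n - c)/d) equals the threshold (assumed tanh form)."""
    ratio = (threshold - a) / b
    if not -1.0 < ratio < 1.0:
        raise ValueError("the threshold is never reached by the fitted curve")
    return c + d * math.atanh(ratio)


def ct_max_derivative(model, first_cycle, last_cycle, step=0.01):
    """Cycle where the numerical derivative of a fitted model is largest (linearity point)."""
    steps = int((last_cycle - first_cycle) / step)
    best_n, best_slope = first_cycle, -math.inf
    for i in range(1, steps):
        n = first_cycle + i * step
        # Central difference approximates the derivative at n
        slope = (model(n + step) - model(n - step)) / (2 * step)
        if slope > best_slope:
            best_n, best_slope = n, slope
    return best_n
```

For a curve with a = b = 5000, c = 22 and d = 4, a threshold of 5000 gives Ct = 22. This coincides with the linearity point only because that threshold sits exactly at the curve's midpoint; any other threshold gives a different Ct for the same data.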
PCRan fits a straight line to the data using one of two well-known methods: the least squares method and the chi-square (χ²) method. The χ² method requires several replicates (more than one) at each point, which are needed to calculate the error at that point. Since experiments usually use few replicates (fewer than 10), a range-based formula can be used to estimate the errors:
Where:
- x+ is the maximum value of the dataset
- x- is the minimum value of the dataset
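The exact range-based formula is not reproduced in this text. The sketch below assumes the simplest such estimate, half the spread of the replicates, (x+ − x−)/2, together with their mean; treat the choice of estimator as an assumption, not as PCRan's formula.

```python
def replicate_mean_and_error(values):
    """Mean of the replicates and a half-range error estimate (x+ - x-) / 2 (assumed formula)."""
    if len(values) < 2:
        raise ValueError("the chi-square method needs more than one replicate")
    x_max, x_min = max(values), min(values)
    return sum(values) / len(values), (x_max - x_min) / 2
```

For three Ct replicates 24.0, 25.0 and 26.0 this gives a mean of 25.0 and an error of 1.0.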
The program also estimates the errors of the straight-line coefficients and calculates the efficiency.
During amplification the PCR product accumulates, which at the initial stages leads to an exponential increase in the fluorescence signal. The reaction then gradually slows down and reaches a "plateau" as the reaction components are depleted. This dependence can be described by a function of the sigmoid family, for example, a hyperbolic tangent.
The threshold cycle (Ct) indicates the cycle at which the fluorescence signal reaches a certain value or growth phase. The Ct value for the same data can differ depending on the detection method and its parameters.
The accumulation of the PCR reaction product is described by an exponential law:

$$C = C_0 (1 + E)^n$$

Where:
- $C$ is the product concentration
- $C_0$ is the initial product concentration
- $E$ is the reaction efficiency
- $n$ is the number of cycles
In the ideal case, when the amount of product doubles in every cycle, the reaction efficiency is 1 and the dependence takes a simpler form:

$$C = C_0 \cdot 2^n$$
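The exponential law is easy to check numerically. The sketch below (function names are hypothetical) also computes how many cycles a tenfold increase takes, obtained by solving 10 = (1 + E)^k for k, which is the quantity the standard curve measures.

```python
import math


def product_concentration(c0, efficiency, cycles):
    """Exponential law C = C0 * (1 + E) ** n."""
    return c0 * (1 + efficiency) ** cycles


def cycles_per_decade(efficiency):
    """Cycles needed for a tenfold increase: solve 10 = (1 + E) ** k for k."""
    return math.log(10) / math.log(1 + efficiency)
```

With E = 1, ten cycles multiply the product 1024 times and a tenfold increase takes about 3.32 cycles; with E = 0.9 the same ten cycles give only about 613-fold amplification and a tenfold increase takes about 3.59 cycles.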
To estimate the efficiency E, a standard curve is constructed: the dependence of the threshold cycle Ct (the number of cycles until a positive reaction response) on the decimal logarithm of the concentration. This dependence is linear (a straight line), and its slope carries information about the efficiency: it shows over how many cycles the product concentration changes by one order of magnitude (that is, 10 times). Thus, the exponential law can be rewritten as:

$$10 = (1 + E)^{\alpha}$$

Where:
- α is the modulus of the slope of the standard straight line
Solving this equation for E gives the reaction efficiency:

$$E = 10^{1/\alpha} - 1$$

Knowing the error Δα with which the slope α was determined, the efficiency error can be estimated as:

$$\Delta E = \frac{(1 + E)\ln 10}{\alpha^2}\,\Delta\alpha$$

Where:
- E is the efficiency
- α is the slope coefficient of the straight line
- Δα is the error in estimating the slope coefficient of the straight line
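A minimal sketch of the efficiency formula and its error propagation follows; the function name is hypothetical. It takes the modulus of the fitted slope, since a standard curve of Ct against lg(concentration) has a negative slope.

```python
import math


def efficiency_from_slope(slope, slope_error):
    """Reaction efficiency E = 10**(1/alpha) - 1 and its error from the standard-curve slope."""
    alpha = abs(slope)  # the formula uses the modulus of the slope
    efficiency = 10 ** (1 / alpha) - 1
    # |dE/d(alpha)| = (1 + E) * ln(10) / alpha**2, multiplied by the slope error
    efficiency_error = (1 + efficiency) * math.log(10) / alpha ** 2 * slope_error
    return efficiency, efficiency_error
```

A slope of about −3.32 (more precisely −1/lg 2) corresponds to E ≈ 1, i.e. perfect doubling; shallower slopes mean efficiencies above 1 and steeper slopes below 1.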
The known coefficients of the standard straight line make it possible to calculate the concentration of a sample from its measured threshold cycle. Note that the same Ct determination method must be used both when constructing the standard straight line (estimating the coefficients α and β) and when determining the threshold cycle Ct of the sample.
In this case, the error is calculated according to:
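The concentration calculation itself (not its error) can be sketched by inverting the standard line y = α·x + β for a measured Ct. The sketch assumes that with `need_log_x` the line is built against the decimal logarithm of the concentration, so the result is converted back with 10^x; `slope` here is the signed fitted slope, and the function name is hypothetical.

```python
def concentration_from_ct(ct, slope, intercept, log_x=True):
    """Invert the standard line y = slope * x + intercept for a measured Ct."""
    x = (ct - intercept) / slope
    # Assumption: with log_x the line was built against lg(concentration), so undo the log
    return 10 ** x if log_x else x
```

For a standard line with slope −3.3 and intercept 35, a sample with Ct = 25.1 corresponds to a concentration of about 1000.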
The simplest method for fitting a straight line is the least squares method, which minimizes the sum:

$$S(\alpha, \beta) = \sum_{i=1}^{n} \left(y_i - \alpha x_i - \beta\right)^2$$

Where:
- $n$ is the number of experimental points
- $y_i$ are the values along the ordinate
- $x_i$ are the values along the abscissa
- α is the slope of the straight line
- β is the intercept of the straight line
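The minimum of S(α, β) is reached at:

$$\alpha = \frac{\langle xy\rangle - \langle x\rangle\langle y\rangle}{\langle x^2\rangle - \langle x\rangle^2}, \qquad \beta = \langle y\rangle - \alpha\langle x\rangle$$

where angle brackets denote the arithmetic mean over the n points, e.g. $\langle x\rangle = \frac{1}{n}\sum_{i=1}^{n} x_i$.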
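If the measurement error in x is negligible and the errors in y are the same for all experimental points, independent and random, the parameter errors can be estimated as:

$$\Delta\alpha = \sqrt{\frac{1}{n - 2}\left(\frac{D_{yy}}{D_{xx}} - \alpha^2\right)}, \qquad \Delta\beta = \Delta\alpha\sqrt{\langle x^2\rangle}$$

Where:
- α is the slope coefficient of the straight line
- $n$ is the number of experimental points
- $D_{yy} = \langle y^2\rangle - \langle y\rangle^2$ is the variance of y
- $D_{xx} = \langle x^2\rangle - \langle x\rangle^2$ is the variance of x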
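The least squares formulas translate directly into code. This is a sketch under the assumptions stated above (negligible x errors, equal independent y errors), using the n − 2 form of the error estimate; the function name is hypothetical and this is not PCRan's implementation.

```python
import math


def least_squares_line(xs, ys):
    """Fit y = alpha * x + beta by least squares; return the coefficients and their errors."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need equal-length x and y with at least 3 points")

    def mean(values):
        return sum(values) / n

    mx, my = mean(xs), mean(ys)
    mxx = mean([x * x for x in xs])
    myy = mean([y * y for y in ys])
    mxy = mean([x * y for x, y in zip(xs, ys)])
    d_xx = mxx - mx ** 2  # variance of x
    d_yy = myy - my ** 2  # variance of y
    alpha = (mxy - mx * my) / d_xx
    beta = my - alpha * mx
    # max(..., 0.0) guards against a tiny negative value caused by rounding on perfectly linear data
    d_alpha = math.sqrt(max(d_yy / d_xx - alpha ** 2, 0.0) / (n - 2))
    d_beta = d_alpha * math.sqrt(mxx)
    return alpha, beta, d_alpha, d_beta
```

For the points (0, 1), (1, 3), (2, 5), (3, 8) this gives α = 2.3, β = 0.8 and Δα = √0.03 ≈ 0.173.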
Another popular method is the chi-square (χ²) method, which also takes into account the uncertainty of each measurement. It minimizes the sum:

$$\chi^2(\alpha, \beta) = \sum_{i=1}^{n} \frac{\left(y_i - \alpha x_i - \beta\right)^2}{\sigma_i^2}$$

Where:
- $n$ is the number of experimental points
- $y_i$ are the values along the ordinate
- $x_i$ are the values along the abscissa
- $\sigma_i$ is the error of $y_i$
- α is the slope of the straight line
- β is the intercept of the straight line
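The minimum of χ² is reached at:

$$\alpha = \frac{\langle xy\rangle_w - \langle x\rangle_w\langle y\rangle_w}{\langle x^2\rangle_w - \langle x\rangle_w^2}, \qquad \beta = \langle y\rangle_w - \alpha\langle x\rangle_w$$

where the weighted average is defined as:

$$\langle z\rangle_w = \frac{\sum_{i=1}^{n} w_i z_i}{\sum_{i=1}^{n} w_i}, \qquad w_i = \frac{1}{\sigma_i^2}$$

The parameter errors are estimated as:

$$\Delta\alpha = \frac{1}{\sqrt{\left(\sum_{i=1}^{n} w_i\right)\left(\langle x^2\rangle_w - \langle x\rangle_w^2\right)}}, \qquad \Delta\beta = \Delta\alpha\sqrt{\langle x^2\rangle_w}$$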
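A minimal sketch of the weighted (χ²) fit follows, with weights w_i = 1/σ_i². The per-point errors `sigmas` are taken as given; for example, they could come from a replicate-based estimate. Zero errors are rejected because they would make the weights infinite, which can happen when all replicates at a point are identical. The function name is hypothetical.

```python
import math


def chi2_line(xs, ys, sigmas):
    """Weighted (chi-square) fit of y = alpha * x + beta with per-point errors sigmas."""
    if not (len(xs) == len(ys) == len(sigmas)) or len(xs) < 2:
        raise ValueError("need equal-length xs, ys and sigmas with at least 2 points")
    if any(s <= 0 for s in sigmas):
        raise ValueError("every sigma must be positive")
    weights = [1.0 / s ** 2 for s in sigmas]
    w_sum = sum(weights)

    def wmean(values):
        # Weighted average <z>_w = sum(w_i * z_i) / sum(w_i)
        return sum(w * v for w, v in zip(weights, values)) / w_sum

    mx, my = wmean(xs), wmean(ys)
    mxx = wmean([x * x for x in xs])
    mxy = wmean([x * y for x, y in zip(xs, ys)])
    d_xx = mxx - mx ** 2
    alpha = (mxy - mx * my) / d_xx
    beta = my - alpha * mx
    d_alpha = 1.0 / math.sqrt(w_sum * d_xx)
    d_beta = d_alpha * math.sqrt(mxx)
    chi2 = sum(w * (y - alpha * x - beta) ** 2 for w, x, y in zip(weights, xs, ys))
    return alpha, beta, d_alpha, d_beta, chi2
```

With equal errors the coefficients coincide with the least squares ones; unlike least squares, the coefficient errors here come from the stated σ_i rather than from the scatter of the points, so doubling every σ_i doubles Δα and Δβ.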