4. Outputs
Q-PHAST calculates different fitness and susceptibility measurements. First, our pipeline uses the time-vs-growth curve to infer fitness for each spot in each drug concentration, using the QFA package. Note that for one strain we may have multiple spots (technical replicates). There are two types of fitness estimates:
- Model-based fitness estimates: estimated by fitting a generalised logistic model to the time-vs-growth curve. The model parameters give us different fitness estimates. These estimates can be useful if we have some spots that did not reach stationary phase (to predict maximum growth, for example) or we have mixed samples with different growth times. These don't work well if we have slow-growing spots or non-logistic curves (which may happen because there is cell death after reaching stationary phase). `K`, `r`, `g`, `v`, `MDR`, `MDP`, `DT`, `AUC`, `MDRMDP`, `rsquare` (see below) are related to such model fitting.
- Non-parametric (or numeric) fitness estimates: these are calculated directly from the data, without assuming any underlying growth model. We generally use these (`nAUC` and `DT_h`) if we have experiments with the same growth times. `nAUC`, `nr`, `nr_t`, `maxslp`, `maxslp_t`, `DT_h`, `nSTP` and `DT_h_goodR2` (see below) are non-parametric measurements.
These are the relevant fitness estimates (check the qfa manual for more information):
- `K`, `r`, `g` and `v` are the parameters of a generalised logistic model that is fit to the data. `K` (maximum predicted growth) and `r` (predicted growth rate) are fitness estimates that may be used.
- `MDR` (Maximum Doubling Rate), `MDP` (Maximum Doubling Potential), `DT` (Doubling Time estimated from the model fit at t=0), `AUC` (Area Under the time-vs-growth fit Curve) and `MDRMDP` (Addinall et al. style fitness) are several fitness estimates calculated from the model fit.
- `rsquare` is the coefficient of determination between the model fit and the data. You can use it to determine which curves have a good model fit (e.g. rsquare > 0.95).
- `nr` is a numerical estimate of the intrinsic growth rate. It is estimated by fitting a smoothing function to the log of the data, calculating a numerical slope estimate across the range of the data and selecting the maximum estimate (which should occur during exponential phase). `nr_t` is the time at which `nr` occurs, so it is an estimator of the lag phase time.
- `maxslp` is a numerical estimate of the maximum slope of the growth curve, and `maxslp_t` is the time at which this maximum slope of observations occurs. `maxslp_t` is another way to estimate the lag phase.
- `nAUC` is the numerical Area Under Curve. This is a model-free fitness estimate, directly calculated from the data, measuring the AUC of the time-vs-growth curve between t=0 and t=hours_experiment (24h by default). This is our preferred fitness estimate.
- `nSTP` is the numerical Single-Timepoint growth estimate, calculated at t=hours_experiment (24h by default).
- `DT_h` is a numerical estimate for the maximum doubling time, in hours. `DT_h_goodR2` is the same value but only for those spots with a good model fit (rsquare > 0.95). For poorly fit curves, `DT_h_goodR2` is set to 25.0 (very high), so `DT_h_goodR2` can be used to treat spots with irregular curves as non-growing.
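To make the model-free estimates more concrete, here is a minimal sketch of how an `nAUC`-like and an `nSTP`-like value could be computed from one spot's time series. It is not the qfa implementation: the function names, the use of hours as the time unit, the trapezoidal rule and the linear interpolation at `t_end` are all assumptions made for illustration.

```python
from bisect import bisect_left


def interpolate(times, growth, t):
    """Linearly interpolate growth at time t (times must be sorted ascending)."""
    if t <= times[0]:
        return growth[0]
    if t >= times[-1]:
        return growth[-1]
    i = bisect_left(times, t)
    t0, t1 = times[i - 1], times[i]
    g0, g1 = growth[i - 1], growth[i]
    return g0 + (g1 - g0) * (t - t0) / (t1 - t0)


def numeric_auc(times, growth, t_end):
    """Trapezoidal area under the time-vs-growth curve from times[0] to t_end."""
    # Keep the observations before t_end and close the curve exactly at t_end.
    pts = [(t, g) for t, g in zip(times, growth) if t < t_end]
    pts.append((t_end, interpolate(times, growth, t_end)))
    return sum((t1 - t0) * (g0 + g1) / 2 for (t0, g0), (t1, g1) in zip(pts, pts[1:]))


# Hypothetical spot: time in hours, growth as inferred cell density.
times_h = [0, 6, 12, 18, 24]
growth = [0.0, 0.1, 0.4, 0.7, 0.8]

nauc_like = numeric_auc(times_h, growth, t_end=24)  # area between t=0 and t=24h
nstp_like = interpolate(times_h, growth, 24)        # single-timepoint growth at t=24h
```

Because the area is bounded by `t_end`, two experiments are only comparable by this measure if they use the same `hours_experiment`, which is why the numeric estimates are recommended for experiments with the same growth times.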
Once all fitness estimates are measured, Q-PHAST calculates the relative fitness (e.g. `nAUC_rel`, `r_rel` or `nSTP_rel`) for each spot by dividing the raw fitness by the fitness at concentration==0. This is only performed if a plate with concentration==0 is provided. These relative fitness measurements are essential to perform the susceptibility analysis.
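As an illustration of this normalisation, the sketch below divides each spot's raw fitness by the value of the same spot at concentration 0. The row layout (dicts with `drug`, `sampleID`, `concentration` and a fitness column) and the choice of `(drug, sampleID)` as the matching key are assumptions for this example, not a description of Q-PHAST's internals.

```python
def relative_fitness(rows, fitness="nAUC"):
    """Return copies of rows with an added '<fitness>_rel' column."""
    # Reference value per spot: the raw fitness measured at concentration 0.
    conc0 = {
        (r["drug"], r["sampleID"]): r[fitness]
        for r in rows
        if r["concentration"] == 0
    }
    out = []
    for r in rows:
        ref = conc0.get((r["drug"], r["sampleID"]))
        # Without a usable concentration-0 reference the ratio is undefined.
        rel = r[fitness] / ref if ref else float("nan")
        out.append({**r, f"{fitness}_rel": rel})
    return out


# Hypothetical spot measured at three concentrations of one drug.
rows = [
    {"drug": "drugA", "sampleID": "WT_r1c1", "concentration": 0, "nAUC": 8.0},
    {"drug": "drugA", "sampleID": "WT_r1c1", "concentration": 1, "nAUC": 4.0},
    {"drug": "drugA", "sampleID": "WT_r1c1", "concentration": 2, "nAUC": 2.0},
]
rel_rows = relative_fitness(rows)
```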
For drugs with at least two non-0 concentrations, Q-PHAST calculates susceptibility for each spot in each drug. Note that there are some filters applied to ensure high-quality susceptibility measurements. We only consider spots 'valid for relative fitness and susceptibility calculations' (in specific concentrations) if they i) were not 'bad spots' (defined as explained here), ii) had a maximum of one non-0 concentration where the spot was flagged as a 'bad spot' and iii) had a concentration==0 that was growing (according to the nAUC threshold explained here) and was not flagged as a 'bad spot'. Our pipeline uses these 'valid spots' to infer the following susceptibility estimates (considering either `K_rel`, `r_rel`, `nr_rel`, `maxslp_rel`, `MDP_rel`, `MDR_rel`, `MDRMDP_rel`, `AUC_rel`, `nAUC_rel` or `nSTP_rel` as relative fitness estimates):
- `MIC`: Minimum Inhibitory Concentration, a typical estimator of drug susceptibility. It is the minimum concentration in which relative fitness is below 0.25 (`MIC_25`), 0.5 (`MIC_50`), 0.75 (`MIC_75`) or 0.9 (`MIC_90`). 0.25, 0.5, 0.75 and 0.9 are hereafter referred to as `<mic fraction>`. In cases where all concentrations have a relative fitness above the `<mic fraction>`, `MIC` is set to twice the maximum assayed concentration. In addition, because `MIC` cannot be accurately measured for some spots, it is set to `NaN` if i) all 'valid spots' have relative fitness above the `<mic fraction>` but the maximum concentration is a 'bad spot', ii) `MIC` is apparently the second assayed concentration, but the first concentration is a 'bad spot' or iii) the concentration before the `MIC` is a 'bad spot' and there is a large distance between `MIC` and the previous 'valid spot' concentration (>= 0.001).
- `SMG`: Supra-MIC Growth, an estimator of drug tolerance (see Berman et al. 2020). It is the average (raw) fitness for concentrations above the MIC, normalized by the fitness at concentration==0. There is one `SMG` estimate for each `<mic fraction>`. Note that `SMG` is only calculated for spots in which i) `MIC` is not `NaN` and ii) there are at least two concentrations above the MIC.
- `rAUC`: Resistance AUC, an estimator of drug susceptibility proposed in Ksiezopolska, Schikora-Tamarit et al. 2021. This is the Area Under the concentration-vs-relative fitness Curve, normalized by a 'maximum AUC' where relative fitness is 1.0 across all assayed concentrations. Higher rAUCs indicate lower drug susceptibility. To take into account that concentrations are often set in logarithmic ranges (e.g. 0, 0.1, 0.2, 0.4, 0.8 ...), `rAUC` is calculated using either log2-transformed concentrations (`rAUC_log2_concentration`) or real concentrations (`rAUC_concentration`). Because `rAUC` cannot be accurately measured for some spots, it is set to `NaN` if i) there are <3 concentrations (including 0) or ii) the maximum concentration is a 'bad spot' and the highest concentration with a 'valid spot' is growing (according to the nAUC threshold explained here).
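The three definitions above can be sketched as follows for a single spot. This is a simplified illustration, not Q-PHAST's code: it ignores the 'bad spot' rules that set values to `NaN`, uses the trapezoidal rule for the area, and only covers real (not log2-transformed) concentrations, since the handling of concentration 0 under a log2 transform is not described here. All function names are hypothetical.

```python
import math


def mic(concs, rel_fitness, fraction=0.5):
    """Lowest concentration with relative fitness below `fraction`.

    If no concentration is below the fraction, return twice the maximum
    assayed concentration, as described above.
    """
    for c, f in sorted(zip(concs, rel_fitness)):
        if f < fraction:
            return c
    return 2 * max(concs)


def smg(concs, rel_fitness, mic_value):
    """Mean relative fitness over concentrations above the MIC."""
    above = [f for c, f in zip(concs, rel_fitness) if c > mic_value]
    # Undefined when the MIC is NaN or fewer than two concentrations exceed it.
    if math.isnan(mic_value) or len(above) < 2:
        return float("nan")
    return sum(above) / len(above)


def rauc(concs, rel_fitness):
    """Area under concentration-vs-relative-fitness, over the area at fitness 1.0."""
    pts = sorted(zip(concs, rel_fitness))
    if len(pts) < 3:  # fewer than 3 concentrations (including 0)
        return float("nan")
    area = sum((c1 - c0) * (f0 + f1) / 2 for (c0, f0), (c1, f1) in zip(pts, pts[1:]))
    max_area = pts[-1][0] - pts[0][0]  # a flat curve at relative fitness 1.0
    return area / max_area


# Hypothetical spot: assayed concentrations and nAUC_rel values.
concs = [0, 1, 2, 4, 8]
rel = [1.0, 0.9, 0.4, 0.2, 0.1]

mic_50 = mic(concs, rel, fraction=0.5)
smg_mic_50 = smg(concs, rel, mic_50)
rauc_concentration = rauc(concs, rel)
```

In this toy example the spot drops below half of its drug-free fitness at concentration 2, and the residual growth at 4 and 8 is what `SMG` averages.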
This pipeline generates the following files and folders under the output directory. Those related to relative fitness calculations are only generated if there is a plate with concentration==0. In addition, outputs related to susceptibility measurements are only generated for drugs with at least two non-0 concentrations. Below is the description of the files generated.
The files `fitness_measurements_simple.xlsx` and `relative_fitness_measurements_simple.xlsx` include the averaged per-strain fitness and relative fitness measurements, respectively. These files use `nAUC` as the fitness estimate. These are the columns:
- `drug`, `concentration`, `strain` and `experiment_name` are the sample identifiers as specified in the input plate layout.
- `# replicates` indicates the number of technical replicate spots used to do the averaged fitness calculations. For raw fitness, our pipeline considers all spots that are not 'bad spots'. For relative fitness, Q-PHAST only considers spots that are 'valid for relative fitness and susceptibility calculations' (defined above).
- `median_nAUC`, `mode_nAUC` indicate the median and mode `nAUC` across technical replicates. In the relative fitness table there are the equivalent `median_nAUC_rel` and `mode_nAUC_rel`.
- `mad_nAUC` and `range_nAUC` show the dispersion across replicates. `mad_nAUC` shows the median absolute deviation, and `range_nAUC` indicates the minimum and maximum values. In the relative fitness table there are the equivalent `mad_nAUC_rel` and `range_nAUC_rel`.
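For readers who want to recompute these per-strain summaries from per-spot values, a minimal sketch follows. The input layout (a list of dicts with `drug`, `concentration`, `strain`, `bad_spot` and `nAUC`) is an assumption, the MAD is taken as the plain, unscaled median absolute deviation, and the mode is left out because how it is defined for continuous values is not stated here.

```python
from collections import defaultdict
from statistics import median


def summarise_replicates(rows, value="nAUC"):
    """Median, MAD and range of `value` across technical replicates."""
    groups = defaultdict(list)
    for r in rows:
        if r.get("bad_spot"):
            continue  # raw fitness summaries skip 'bad spots'
        groups[(r["drug"], r["concentration"], r["strain"])].append(r[value])

    summary = {}
    for key, vals in groups.items():
        med = median(vals)
        summary[key] = {
            "# replicates": len(vals),
            f"median_{value}": med,
            f"mad_{value}": median([abs(v - med) for v in vals]),
            f"range_{value}": (min(vals), max(vals)),
        }
    return summary


# Hypothetical replicates of one strain at one concentration.
spots = [
    {"drug": "drugA", "concentration": 1, "strain": "WT", "bad_spot": False, "nAUC": 0.8},
    {"drug": "drugA", "concentration": 1, "strain": "WT", "bad_spot": False, "nAUC": 1.0},
    {"drug": "drugA", "concentration": 1, "strain": "WT", "bad_spot": False, "nAUC": 0.9},
    {"drug": "drugA", "concentration": 1, "strain": "WT", "bad_spot": True, "nAUC": 5.0},
]
per_strain = summarise_replicates(spots)
```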
When no concentration==0 is provided, Q-PHAST generates a file called `raw_nAUC_across_drugs_heatmap.pdf`. This is a heatmap showing, for each strain in each drug, the median and MAD (median absolute deviation) `nAUC` across replicates. This plot provides an overview of the experiment. Note that there is also a plot with a `.no_clustering.` tag, which has the strains sorted alphabetically.
The file `susceptibility_measurements_simple.xlsx` includes the averaged per-strain susceptibility measurements, when considering `nAUC` as a fitness estimate. To keep it simple, this table only includes information about `MIC_50`, `SMG_MIC_50`, `rAUC_concentration` (based on real, not log2-transformed, concentrations) and `rAUC_log2_concentration` (based on log2-transformed concentrations). These are the columns:
- `drug`, `concentration`, `strain` and `experiment_name` are the sample identifiers as specified in the input plate layout.
- `max_concentration` indicates the maximum assayed concentration. This is relevant for comparisons with other datasets.
- `replicates_MIC50`, `replicates_SMG-MIC50`, `replicates_rAUC` and `replicates_rAUC_log2` indicate the number of technical replicate spots used to calculate each susceptibility measurement in each strain. Note that all these values might be `NaN` for certain spots (explained above), and that only spots considered 'valid for relative fitness and susceptibility calculations' were used (see above).
- `median_MIC50`, `mode_MIC50`, `median_SMG-MIC50`, `mode_SMG-MIC50`, `median_rAUC`, `mode_rAUC`, `median_rAUC_log2` and `mode_rAUC_log2` indicate the median and mode across technical replicates.
- `mad_MIC50`, `range_MIC50`, `mad_SMG-MIC50`, `range_SMG-MIC50`, `mad_rAUC`, `range_rAUC`, `mad_rAUC_log2` and `range_rAUC_log2` indicate the dispersion across technical replicates. `mad` stands for median absolute deviation.
The folder `summary_plots` contains several plots for each drug which provide an overview of the experiment. These are the files:
- `[<drug>]_vs_nAUC_lines_all.pdf` shows for each spot the concentration vs fitness (`nAUC`) curve. Spots that are not 'valid for relative fitness and susceptibility calculations' (defined above) are outlined with squares. This plot is very useful to see the consistency of the measurements across replicates.
- `[<drug>]_vs_nAUC_lines_only_correct.pdf` is equivalent to `[<drug>]_vs_nAUC_lines_all.pdf`, but only showing spots that are 'valid for relative fitness and susceptibility calculations'.
- `[<drug>]_vs_nAUC_rel_lines_only_correct.pdf` is equivalent to `[<drug>]_vs_nAUC_lines_only_correct.pdf`, but showing `nAUC_rel` values (relative fitness).
- `[<drug>]_vs_nAUC_heatmap.pdf` and `[<drug>]_vs_nAUC_rel_heatmap.pdf` are heatmaps showing, for each strain in each concentration, the median and MAD (median absolute deviation) `nAUC` and `nAUC_rel` across replicates. Only spots that are 'valid for relative fitness and susceptibility calculations' were used. Note that there are also plots with a `.no_clustering.` tag, which have the strains sorted alphabetically.
- `<drug>_susceptibility_heatmap_by_nAUC.pdf` is a heatmap showing, for each strain, the median and MAD `MIC_50`, `SMG_MIC_50` and `rAUC_concentration` across replicates. Only spots that are 'valid for relative fitness and susceptibility calculations' were used. Note that there are also plots with a `.no_clustering.` tag, which have the strains sorted alphabetically.
Beyond these outputs, Q-PHAST generates many other files that may be useful for some users, under the directory `extended_outputs`:
The files `fitness_measurements_simple.csv`, `relative_fitness_measurements_simple.csv` and `susceptibility_measurements_simple.csv` are the .csv versions (tab-separated) of the files `fitness_measurements_simple.xlsx`, `relative_fitness_measurements_simple.xlsx` and `susceptibility_measurements_simple.xlsx` described above. You should use these .csv files for any further calculations, as parsing Excel files can be error-prone.
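Since these .csv files are tab-separated rather than comma-separated, the delimiter has to be set explicitly when reading them. The sketch below uses Python's standard `csv` module on an in-memory stand-in for `fitness_measurements_simple.csv`; the drug name, strain name and numbers are invented for the example, and only a subset of the documented columns is shown.

```python
import csv
import io

# Illustrative stand-in for extended_outputs/fitness_measurements_simple.csv.
# The values are made up; the column names follow the description above.
example = (
    "drug\tconcentration\tstrain\t# replicates\tmedian_nAUC\n"
    "drugA\t0\tWT\t4\t9.6\n"
    "drugA\t8\tWT\t4\t2.1\n"
)


def read_table(handle):
    """Read a tab-separated table into a list of dicts, parsing numeric columns."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["concentration"] = float(row["concentration"])
        row["median_nAUC"] = float(row["median_nAUC"])
        rows.append(row)
    return rows


table = read_table(io.StringIO(example))
```

With a real output file, the same function can be called on `open(path, newline="")` instead of the `io.StringIO` object.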
The file `growth_measurements_all_timepoints.csv` is a table with all growth measurements in all spots and timepoints. This is the raw data used for fitness measurements. These are the columns:
- `plate_batch`, `plate`, `row`, `column`, `strain`, `drug`, `concentration` are the spot information provided in the plate layout.
- `Growth` has the inferred cell density. It is calculated as `Trimmed / (Tile.Dimensions.X * Tile.Dimensions.Y * 255)`, as suggested in the qfa manual.
- `Inoc.Time` is the inoculation time in YYYY-MM-DD_HH-MM-SS.
- `Date.Time` is the timepoint in YYYY-MM-DD_HH-MM-SS, and `Expt.Time` is the timepoint in days.
- `Timeseries.order` is the categorical timepoint.
- `X.Offset` and `Y.Offset` are the coordinates of the spot within the plate.
- The remaining columns (e.g. `Area` or `redMean`) are related to the growth inference. Check the qfa manual for more information.
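The `Growth` formula and the timestamp format above can be reproduced with a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes that `Expt.Time` is the elapsed time between `Inoc.Time` and `Date.Time` expressed in days.

```python
from datetime import datetime

# Timestamp layout used by Inoc.Time and Date.Time (YYYY-MM-DD_HH-MM-SS).
TIME_FORMAT = "%Y-%m-%d_%H-%M-%S"


def growth(trimmed, tile_x, tile_y):
    """Cell density as Trimmed / (Tile.Dimensions.X * Tile.Dimensions.Y * 255)."""
    return trimmed / (tile_x * tile_y * 255)


def elapsed_days(inoc_time, date_time):
    """Days elapsed between inoculation and a timepoint (assumed Expt.Time)."""
    delta = datetime.strptime(date_time, TIME_FORMAT) - datetime.strptime(inoc_time, TIME_FORMAT)
    return delta.total_seconds() / 86400


# Hypothetical 10x10 pixel tile whose summed intensity is half of the maximum.
density = growth(trimmed=12750, tile_x=10, tile_y=10)
days = elapsed_days("2023-01-01_12-00-00", "2023-01-02_00-00-00")
```

Dividing by 255 times the tile area scales the summed 8-bit pixel intensities so that the result lies between 0 and 1.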
The file `fitness_measurements.csv` is a table with all the raw and relative fitness estimates per spot and drug concentration. These are the columns:
- `plate_batch`, `plate`, `row`, `column`, `spotID`, `strain`, `drug`, `concentration` and `experiment_name` are the spot information provided in the plate layout.
- `replicateID` indicates the spot as `r<row>c<column>` (i.e. A1). Derived from this there is the `sampleID` field, which includes `<strain>_<replicateID>`.
- Many columns are the fitness estimates (e.g. `nAUC`, `K` ...) or relative fitness estimates (e.g. `nAUC_rel`, `K_rel` ...) (defined above).
- `Inoc.Time`, `XOffset`, `YOffset` are equivalent to those in `growth_measurements_all_timepoints.csv`.
- `bad_spot` indicates whether the spot is flagged as a 'bad spot', either from the input plate layout or by the automatic definition of bad spots.
- `is_growing` indicates whether the spot is growing (it has an `nAUC` above the set 'min nAUC growing', see this).
- `conc0_is_growing` and `conc0_is_bad_spot` indicate whether the corresponding concentration==0 for a given spot is growing or is a bad spot (as defined above).
- `idx_correct_rel_estimates` is a True/False boolean indicating whether a given spot is 'valid for relative fitness and susceptibility calculations' (defined above).
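To show how the flag columns relate to the validity rule described earlier in this section, the sketch below derives a validity flag from `bad_spot`, `conc0_is_growing` and `conc0_is_bad_spot`. It is a reading of the three stated conditions, not Q-PHAST's actual code; in particular, grouping a spot across concentrations by `(drug, sampleID)` is an assumption.

```python
from collections import Counter


def flag_valid_spots(rows):
    """Return one boolean per row: valid for relative fitness and susceptibility."""
    # Condition ii) needs, per spot, the number of non-0 concentrations flagged bad.
    n_bad = Counter(
        (r["drug"], r["sampleID"])
        for r in rows
        if r["bad_spot"] and r["concentration"] != 0
    )
    return [
        (not r["bad_spot"])                           # i) not a bad spot itself
        and n_bad[(r["drug"], r["sampleID"])] <= 1    # ii) at most one bad non-0 concentration
        and r["conc0_is_growing"]                     # iii) concentration 0 is growing...
        and not r["conc0_is_bad_spot"]                # ...and is not a bad spot
        for r in rows
    ]


def make_row(sample, conc, bad, conc0_growing):
    """Build a hypothetical fitness_measurements.csv row with only the needed fields."""
    return {
        "drug": "drugA", "sampleID": sample, "concentration": conc,
        "bad_spot": bad, "conc0_is_growing": conc0_growing, "conc0_is_bad_spot": False,
    }


rows = [
    make_row("WT_r1c1", 0, False, True),
    make_row("WT_r1c1", 1, True, True),    # bad spot at one non-0 concentration
    make_row("WT_r1c1", 2, False, True),
    make_row("WT_r1c2", 1, False, False),  # its concentration 0 is not growing
]
valid = flag_valid_spots(rows)
```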
The file `susceptibility_measurements.csv` is a table with all susceptibility estimates (by different fitness estimates). The columns are:
- `strain`, `row`, `column`, `replicateID`, `drug` and `experiment_name` are the spot identifiers.
- `MIC_25`, `MIC_50`, `MIC_75` and `MIC_90` are the `MIC` values at different values for `<mic fraction>`. As mentioned above, these can sometimes be `NaN` because of our quality control filtering.
- `SMG_MIC_25`, `SMG_MIC_50`, `SMG_MIC_75` and `SMG_MIC_90` are the `SMG` values at different values for `<mic fraction>`. As mentioned above, these can sometimes be `NaN` because of our quality control filtering.
- `rAUC_concentration` and `rAUC_log2_concentration` are the different `rAUC` values.
- `fitness_estimate` indicates the fitness estimate used to calculate the `rAUC`, `MIC` and `SMG` values. These are always relative fitness estimates (e.g. `nAUC_rel`, `K_rel`).
- `raw_fitness_conc0` is the raw fitness at concentration==0.
- `max_concentration` is the maximum assayed concentration, relevant because it can affect susceptibility measurements.
In the folder `summary_plots` within the main output directory there are many plots that provide an overview of the experiment (described above), based solely on `nAUC` as the fitness estimate. Under `extended_outputs` there are several equivalent plots for all the other fitness estimates:
- `drug_vs_fitness_lines_all_spots` contains figures equivalent to `summary_plots/<drug>/[<drug>]_vs_nAUC_lines_all.pdf`, but for all fitness estimates.
- `drug_vs_fitness_lines` contains figures equivalent to `summary_plots/<drug>/[<drug>]_vs_nAUC_lines_only_correct.pdf` and `summary_plots/<drug>/[<drug>]_vs_nAUC_rel_lines_only_correct.pdf`, but for all fitness estimates.
- `drug_vs_fitness_heatmaps` contains figures equivalent to `summary_plots/<drug>/[<drug>]_vs_nAUC_heatmap.pdf` and `summary_plots/<drug>/[<drug>]_vs_nAUC_rel_heatmap.pdf`, but for all fitness estimates.
- `susceptibility_heatmaps` contains figures equivalent to `summary_plots/<drug>/<drug>_susceptibility_heatmap_by_nAUC.pdf`, but for all fitness estimates.
- `susceptibility_heatmaps_log_scale` contains figures similar to `susceptibility_heatmaps`, but with log10(MIC_50) and rAUC_log2 values.
When no concentration==0 is provided, Q-PHAST generates various plots under the folder `drug_vs_raw_fitness_heatmaps`. These show, for each strain in each drug, the median and MAD (median absolute deviation) fitness (for different fitness estimates) across replicates. These plots provide an overview of the experiment.
Q-PHAST generates various plots that are useful to assess the quality of the analysis. These are the folders under `extended_outputs` that contain such plots:
- `growth_curves` contains the time-vs-cell density curves for all spots in the experiment. These are generated by QFA.
- `growth_curves_and_images` includes one plot for each strain and drug, useful for quality control. Each plot is a grid where the columns correspond to different concentrations. The first row shows the time-vs-cell density curves for all replicates of the strain, with the linestyle indicating the spot type ('used' or 'discarded'). If concentration==0 was provided, 'discarded' means that the spot was not 'valid for relative fitness and susceptibility calculations' (defined above). If not, 'discarded' means that it is a 'bad spot'. The legend of these plots also shows the `nAUC` of each spot. Rows 2-4 show the images of the plates throughout the experiment. This representation is useful to understand why certain spots have a given growth curve.
Beyond the plots, there are several files that are useful for quality control of the analysis, as well as for reproducing the run:
- `plate_layout.xlsx` is a copy of the provided plate layout.
- `bad_spots.xlsx` includes the spots defined as bad spots (either because you i) set them as such in the input plate layout or ii) accepted them as bad spots during the manual curation of flagged outliers). This file has the column `bad_spot_reason`, which indicates the type of bad spot. In the case of automatically-inferred bad spots, `bad_spot_reason` shows the `nAUC` of the spot and the (Q1 - 2.5·IQR, Q3 + 2.5·IQR) range, which defined it as a possible outlier. To recall how bad spots are inferred see this page.
- `reduced_input_dir.zip` includes the input plate layout, the run command and a subset of the input images. This file is useful for debugging. For instance, if you run into errors you can send it to us so that we can reproduce and fix them.
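As a reminder of what the (Q1 - 2.5·IQR, Q3 + 2.5·IQR) range in `bad_spot_reason` means, the sketch below computes such a range for a set of replicate `nAUC` values and lists the values outside it. Which spots are pooled to compute the quartiles, and the exact quartile method, are not specified here; this example assumes technical replicates of one sample and linear-interpolation quartiles (the `inclusive` method of `statistics.quantiles`).

```python
from statistics import quantiles


def outlier_range(values, k=2.5):
    """Return the (Q1 - k*IQR, Q3 + k*IQR) range for a list of values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return (q1 - k * iqr, q3 + k * iqr)


def possible_outliers(values, k=2.5):
    """Values falling outside the outlier range, i.e. candidate 'bad spots'."""
    low, high = outlier_range(values, k)
    return [v for v in values if v < low or v > high]


# Hypothetical nAUC values of replicate spots; the last one looks suspicious.
replicate_nauc = [1.0, 2.0, 3.0, 4.0, 5.0, 40.0]
flagged = possible_outliers(replicate_nauc)
```

A multiplier of 2.5 is more permissive than the common 1.5·IQR rule, so only values far from the bulk of the replicates fall outside the range.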