-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathintroduction.Rmd
39 lines (33 loc) · 5.4 KB
/
introduction.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
# Introduction {#introduction}
<!-- First paragraph: DA analysis -->
One of the most basic questions we can ask about microbial communities is: How do different microbial taxa vary in abundance—across space, time, and host or environmental conditions?
Marker-gene and shotgun metagenomic sequencing (jointly, MGS) can be used to measure the abundances of thousands of species simultaneously, making it possible to ask this question on a community-wide scale.
In these _differential-abundance (DA) analyses_, the change in abundance of a microbial taxon across samples or conditions is used to infer ecological dynamics or find microbes that are associated with specific host diseases or environmental conditions.
Standard MGS measurements lose information about total microbial density and so are typically used to analyze the abundances of taxa relative to each other (_relative abundances_).
But new methods are increasingly used to provide absolute information, making it possible to analyze changes in absolute cell density, biomass, or genome copy number (_absolute abundances_).
In its various forms, DA is among the most common analyses applied to MGS data.
Unfortunately, these DA analyses are built on a fundamentally flawed foundation.
<!-- P: Taxonomic bias -->
MGS measurements are _taxonomically biased_: Microbial species vary dramatically in how efficiently they are measured—that is, converted from cells into taxonomically classified sequencing reads—by a given MGS protocol (@mclaren2019cons).
This bias arises from variation in how species respond to each step in an MGS protocol, from sample collection to bioinformatic classification.
Although often associated with features specific to marker-gene sequencing—the variation among species in marker copy numbers and in primer-binding and amplification efficiencies—the existence of large variation in DNA extraction efficiencies and in the ability to correctly classify reads make taxonomic bias a universal feature of both marker-gene and shotgun measurements.
As a result, MGS measurements provide inaccurate representations of actual community composition and tend to differ systematically across protocols, studies, and even experimental batches within a study (@yeh2018taxo, @mclaren2019cons).
These errors can supersede sizable biological differences (e.g. @lozupone2013meta) and may have contributed to failed replications of prominent DA results such as the associations of Bacteroides and Firmicutes in stool with obesity (@finucane2014atax) and the associations of species in the vaginas of pregnant women with preterm birth (@callahan2017repl).
<!-- P: Taxonomic bias and DA -->
The standard approach to countering taxonomic bias is to standardize the measurement protocol used within a given study.
Statistical analyses are then conducted with the (often tacit) assumption that all samples will be affected by bias in the same way and so the differences between samples will be unaffected.
This argument is at least intuitively plausible for DA analyses based on multiplicative or fold differences (FDs) in a taxon's abundance.
If bias caused a species' abundance to be consistently measured as 10-fold greater than its actual value, then we would still recover the correct FDs among samples.
<!-- (@kevorkian2018esti, @lloyd2020evid). -->
However, @mclaren2019cons showed mathematically and with MGS measurements of artificially constructed ('mock') communities that consistent taxonomic bias can create fold errors (FEs) that vary across samples and, as a result, majorly distort cross-sample comparisons.
In particular, they showed that the FE in a species' proportion---the most common measure of relative abundance---varies among samples, distorting the observed FDs between samples.
In some cases, bias can even lead to incorrect inferences about the direction of change (for example, by causing a taxon that decreased to appear to increase).
Yet @mclaren2019cons also found that other abundance measures---those based on the ratios among species---have constant FEs and may lead to more robust DA analyses.
The implications of these findings for DA analysis of absolute abundances and for the joint analysis of variation of many species across many samples, as is typical in microbiome association testing, have yet to be investigated.
<!-- P: Objective and summary of the present article -->
Here we use a combination of theoretical analysis, simulation, and re-analysis of published experiments to consider when and why taxonomic bias in MGS measurements leads to spurious results in DA analysis of relative and absolute abundances.
We show that, in contrast to received wisdom, taxonomic bias can affect the results of DA methods that are based on species proportions, even if bias is the same across samples.
We describe the theoretical conditions when this effect is negligible and when it may cause serious scientific errors, and explore this effect in case studies based on real microbiome experiments.
We further demonstrate that another set of DA methods, based on the ratios among species, are robust to consistent bias.
Finally, we present several methods for quantifying, correcting, or otherwise accounting for taxonomic bias in DA analyses which, in many cases, can be deployed with only modest changes to existing experimental and analytical workflows.
These insights and methods will aid microbiome data analysts in turning the findings of microbiome studies into readily-translatable scientific knowledge.