add appendix document as requested by f1000

CambridgeCentreForProteomics · Aug 2, 2023 · 45ce161 · 45ce161
1 parent 3aeafa6
commit 45ce161
Show file tree

Hide file tree

Showing 2 changed files with 95 additions and 0 deletions.
diff --git a/appendix.Rmd b/appendix.Rmd
@@ -0,0 +1,95 @@
+---
+title: 'Appendix for "A Bioconductor workflow for processing, evaluating, and interpreting
+  expression proteomics data"'
+subtitle: Charlotte Hutchings, Charlotte S. Dawson, Thomas Krueger, Kathryn S. Lilley, Lisa M. Breckels
+author: Cambridge Centre for Proteomics, Department of Biochemistry, University
+  of Cambridge, UK
+output: pdf_document
+bibliography: refs.bib
+---
+
+# Appendix
+
+This Appendix accompanies the paper ["A Bioconductor workflow for processing,
+evaluating, and interpreting expression proteomics
+data"](https://github.com/CambridgeCentreForProteomics/f1000_expression_proteomics/blob/main/workflow_expressions.pdf)
+by Hutchings et al, submitted to F1000Research in August 2023. Associated data can be found 
+on Zenodo at [http://doi.org/10.5281/zenodo.7837375](http://doi.org/10.5281/zenodo.7837375)
+and also in the Github repository https://github.com/CambridgeCentreForProteomics/f1000_expression_proteomics/.
+
+# Identification search with Proteome Discoverer
+
+The use-case data analyzed in this workflow was initially processed using
+[Proteome Discoverer](https://www.thermofisher.com/uk/en/home/industrial/mass-spectrometry/liquid-chromatography-mass-spectrometry-lc-ms/lc-ms-software/multi-omics-data-analysis/proteome-discoverer-software.html)
+version 2.5. Whilst much of the identification and quantification takes place
+out of sight of the user, Proteome Discoverer incorporates several user-defined
+search parameters which must be specified according to the sample preparation
+methods and MS instrumentation used. There is also the option to apply both
+basic and advanced data filtering parameters during the search. Users must be
+aware of these parameters as they will directly influence the data output and
+downstream processing.
+
+Whilst an in-depth discussion of identification searches is outside of the scope
+of this workflow, a few key parameters are discussed to put the data into
+context. During sample preparation, TMT-labelled cell pellets were combined and
+separated into 8 fractions using a Pierce High pH Reversed-Phase Peptide
+Fractionation Kit (Thermo Fisher Scientific). After being analyzed by MS, the 8
+resulting raw files were uploaded to Proteome Discoverer 2.5 and processed using
+a single processing and consensus workflow. LFQ supernatant fractions were each
+analyzed on a separate mass spectrometry run resulting in 6 raw files. These
+files were imported into Proteome Discoverer with each sample having its own
+independent processing step followed by a single multi-consensus step. All
+processing and consensus workflow templates are provided in the supplementary
+materials.
+
+For both TMT and LFQ workflows, SequestHT was selected as the search engine and
+trypsin specified as the enzyme used for proteolytic digestion. Since the
+digestion was carried out overnight with a 1:20 w/w ratio of trypsin:protein,
+digestion was expected to be complete and a low threshold of 2 missed cleavages
+was allowed. For MS analysis, a Fourier Transform orbitrap with a resolving
+power of 120,000 m/z was used as the mass analyzer for precursor ion mass, and a
+linear ion trap was used to measure fragment ion mass. This information
+determined the thresholds for precursor and fragment mass tolerances, two key
+parameters for the identification search. The precursor mass tolerance
+determines which mass range of peptide sequences are considered for each
+observed spectrum, whilst the fragment mass tolerance specifies how similar the
+observed and theoretical peptide fragment spectra should be for a match. If
+these tolerances are too narrow then the correct peptide sequence may be omitted
+and true positives are lost. However, if thresholds are set too wide then
+incorrect peptide sequences are considered and false positives arise. Based on
+the instrumentation used in this experiment, standard mass tolerances of 10 ppm
+and 0.5 Da were allowed for precursors and fragments, respectively. Given the
+intrinsic variability of LFQ between MS runs, RT alignment was used for the
+label-free samples with a 10-minute retention time window.
+
+In addition to the parameters based on the experimental protocol, we also
+applied some basic non-specific filtering. We only retained high confidence PSMs
+from the identification search. Such filtering is necessary because only a
+fraction of the PSMs outputted by any given search engine will be genuine
+matches, or true discoveries, whilst the remainder are incorrect false
+discoveries. To deal with this problem, PSM confidence level (high, medium or
+low) is determined via the Proteome Discoverer Percolator node [@Kll2007] which
+estimates each PSM's false discovery rate (FDR). The raw spectra are searched
+against the database of interest as well as a decoy database containing
+randomised peptide sequences, often generated by shuffling or reversing the
+original peptide sequences. False discovery rate is then defined as the
+proportion of total PSMs that are matched to the decoy database, and, therefore,
+are known false discoveries. This is done for all spectra and we considered a
+PSM to be of 'high confidence' if it had a false discovery rate <1 %, 'medium
+confidence' if <5 %, and 'low confidence' if the false discovery rate exceeded
+5 %. Only PSMs annotated as high confidence were kept.
+
+Whilst the basic filtering steps completed during this identification search
+could just have easily been carried out in R using the `SummarizedExperiment`
+and `QFeatures` infrastructure, applying them here saves time later on and
+reduces the burden of storing large data files. These steps are also relatively
+standard and non-specific so we do not need to assess the data prior to their
+implementation. However, Proteome Discoverer also provides the option to carry
+out more in-depth filtering through the use of parameters such as the SPS Mass
+Match %, co-isolation interference % and signal-to-noise thresholds. We advise
+against implementing such filtering at this stage since decisions regarding
+thresholds will likely be influenced by the quality of data output, as
+demonstrated later in this workflow. Instead, thresholds for the three
+aforementioned parameters were set to 0 during the identification search.
+
+## References
diff --git a/appendix.pdf b/appendix.pdf