add appendix document as requested by f1000
Showing 2 changed files with 95 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
--- | ||
title: 'Appendix for "A Bioconductor workflow for processing, evaluating, and interpreting | ||
expression proteomics data"' | ||
subtitle: Charlotte Hutchings, Charlotte S. Dawson, Thomas Krueger, Kathryn S. Lilley, Lisa M. Breckels | ||
author: Cambridge Centre for Proteomics, Department of Biochemistry, University | ||
of Cambridge, UK | ||
output: pdf_document | ||
bibliography: refs.bib | ||
--- | ||
|
||
# Appendix | ||
|
||
This Appendix accompanies the paper ["A Bioconductor workflow for processing, | ||
evaluating, and interpreting expression proteomics | ||
data"](https://github.com/CambridgeCentreForProteomics/f1000_expression_proteomics/blob/main/workflow_expressions.pdf) | ||
by Hutchings et al., submitted to F1000Research in August 2023. Associated data can be found | ||
on Zenodo at [http://doi.org/10.5281/zenodo.7837375](http://doi.org/10.5281/zenodo.7837375) | ||
and also in the GitHub repository https://github.com/CambridgeCentreForProteomics/f1000_expression_proteomics/. | ||
|
||
# Identification search with Proteome Discoverer | ||
|
||
The use-case data analyzed in this workflow was initially processed using | ||
[Proteome Discoverer](https://www.thermofisher.com/uk/en/home/industrial/mass-spectrometry/liquid-chromatography-mass-spectrometry-lc-ms/lc-ms-software/multi-omics-data-analysis/proteome-discoverer-software.html) | ||
version 2.5. Whilst much of the identification and quantification takes place | ||
out of sight of the user, Proteome Discoverer incorporates several user-defined | ||
search parameters which must be specified according to the sample preparation | ||
methods and MS instrumentation used. There is also the option to apply both | ||
basic and advanced data filtering parameters during the search. Users must be | ||
aware of these parameters as they will directly influence the data output and | ||
downstream processing. | ||
|
||
Whilst an in-depth discussion of identification searches is outside the scope | ||
of this workflow, a few key parameters are discussed to put the data into | ||
context. During sample preparation, tandem mass tag (TMT)-labelled cell pellets were combined and | ||
separated into 8 fractions using a Pierce High pH Reversed-Phase Peptide | ||
Fractionation Kit (Thermo Fisher Scientific). After being analyzed by MS, the 8 | ||
resulting raw files were uploaded to Proteome Discoverer 2.5 and processed using | ||
a single processing and consensus workflow. The label-free quantification (LFQ) supernatant fractions were each | ||
analyzed in a separate mass spectrometry run, resulting in 6 raw files. These | ||
files were imported into Proteome Discoverer with each sample having its own | ||
independent processing step followed by a single multi-consensus step. All | ||
processing and consensus workflow templates are provided in the supplementary | ||
materials. | ||
|
||
For both TMT and LFQ workflows, SequestHT was selected as the search engine and | ||
trypsin specified as the enzyme used for proteolytic digestion. Since the | ||
digestion was carried out overnight with a 1:20 w/w ratio of trypsin:protein, | ||
digestion was expected to be complete and a maximum of 2 missed cleavages | ||
was allowed. For MS analysis, a Fourier transform Orbitrap with a resolving | ||
power of 120,000 was used as the mass analyzer for precursor ion mass, and a | ||
linear ion trap was used to measure fragment ion mass. This information | ||
determined the precursor and fragment mass tolerances, two key | ||
parameters for the identification search. The precursor mass tolerance | ||
determines the mass range within which candidate peptide sequences are considered for each | ||
observed spectrum, whilst the fragment mass tolerance specifies how closely the | ||
observed and theoretical fragment ion masses must agree to count as a match. If | ||
these tolerances are too narrow then the correct peptide sequence may be excluded | ||
and true positives are lost. However, if the tolerances are too wide then | ||
incorrect peptide sequences are considered and false positives arise. Based on | ||
the instrumentation used in this experiment, standard mass tolerances of 10 ppm | ||
and 0.5 Da were allowed for precursors and fragments, respectively. Given the | ||
intrinsic variability of LFQ between MS runs, retention time (RT) alignment was used for the | ||
label-free samples with a 10-minute retention time window. | ||
|
||
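To make the two tolerance units concrete, the following is a minimal sketch of how a relative (ppm) tolerance and an absolute (Da) tolerance translate into acceptance windows. It is plain arithmetic rather than anything taken from Proteome Discoverer; the function names and the example m/z values are hypothetical and chosen only for illustration.

```python
def ppm_window(mz: float, tol_ppm: float) -> tuple[float, float]:
    """Return the (lower, upper) m/z bounds for a tolerance given in ppm."""
    delta = mz * tol_ppm / 1e6  # a ppm tolerance scales with the m/z value
    return mz - delta, mz + delta


def within_ppm(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True if the relative mass error is within the ppm tolerance."""
    error_ppm = abs(observed - theoretical) / theoretical * 1e6
    return error_ppm <= tol_ppm


def within_da(observed: float, theoretical: float, tol_da: float) -> bool:
    """True if the absolute mass error is within the Da tolerance."""
    return abs(observed - theoretical) <= tol_da


if __name__ == "__main__":
    # Hypothetical precursor at m/z 800 with the 10 ppm tolerance from the text.
    low, high = ppm_window(800.0, 10.0)
    print(f"10 ppm window at m/z 800: {low:.4f} to {high:.4f}")

    # Hypothetical fragment checked against the 0.5 Da tolerance from the text.
    print(within_da(500.3, 500.0, 0.5))
```

Because a ppm tolerance is relative, the window widens with increasing m/z, whereas the Da tolerance applied to the lower-resolution ion trap fragments is a fixed width.
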
In addition to the parameters based on the experimental protocol, we also | ||
applied some basic non-specific filtering: we only retained high confidence peptide-spectrum matches (PSMs) | ||
from the identification search. Such filtering is necessary because only a | ||
fraction of the PSMs output by any given search engine will be genuine | ||
matches, or true discoveries, whilst the remainder are incorrect matches, or false | ||
discoveries. To deal with this problem, PSM confidence level (high, medium or | ||
low) is determined via the Proteome Discoverer Percolator node [@Kll2007] which | ||
estimates the false discovery rate (FDR) associated with each PSM. The raw spectra are searched | ||
against the database of interest as well as a decoy database containing | ||
randomised peptide sequences, often generated by shuffling or reversing the | ||
original peptide sequences. The FDR is then estimated from the | ||
proportion of PSMs that match the decoy database and are, therefore, | ||
known false discoveries. This is done for all spectra and we considered a | ||
PSM to be of 'high confidence' if it had a false discovery rate <1 %, 'medium | ||
confidence' if between 1 % and 5 %, and 'low confidence' if the false discovery rate exceeded | ||
5 %. Only PSMs annotated as high confidence were kept. | ||
|
||
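The target-decoy idea described above can be sketched in a few lines. This is a deliberately simplified illustration and not Percolator's algorithm, which uses semi-supervised machine learning to rescore PSMs. Here the FDR at each score threshold is assumed to be the ratio of decoy to target matches at or above that score. The `PSM` class, the scores, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    score: float    # search engine score, higher is better (assumption)
    is_decoy: bool  # True if the match came from the decoy database


def q_values(psms: list[PSM]) -> list[tuple[PSM, float]]:
    """Pair each PSM with a simple target-decoy q-value estimate."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)

    # FDR estimate at each rank: decoys / targets among PSMs scoring this well.
    fdrs = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at which this PSM would still be accepted.
    qvals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()
    return list(zip(ranked, qvals))


def confidence(q: float) -> str:
    """Map a q-value to the confidence labels used in the text."""
    if q < 0.01:
        return "high"
    if q < 0.05:
        return "medium"
    return "low"


if __name__ == "__main__":
    # Hypothetical toy data: four target PSMs and one decoy PSM.
    toy = [PSM(10, False), PSM(9, False), PSM(8, False),
           PSM(7, True), PSM(6, False)]
    for psm, q in q_values(toy):
        if not psm.is_decoy:
            print(psm.score, round(q, 3), confidence(q))
```

Retaining only the PSMs labelled `high` corresponds to the 1 % threshold applied in the identification search.
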
Whilst the basic filtering steps completed during this identification search | ||
could just as easily have been carried out in R using the `SummarizedExperiment` | ||
and `QFeatures` infrastructure, applying them here saves time later on and | ||
reduces the burden of storing large data files. These steps are also relatively | ||
standard and non-specific so we do not need to assess the data prior to their | ||
implementation. However, Proteome Discoverer also provides the option to carry | ||
out more in-depth filtering through the use of parameters such as the SPS Mass | ||
Match %, co-isolation interference % and signal-to-noise thresholds. We advise | ||
against implementing such filtering at this stage since decisions regarding | ||
thresholds will likely be influenced by the quality of the data output, as | ||
demonstrated later in this workflow. Instead, thresholds for the three | ||
aforementioned parameters were set to 0 during the identification search. | ||
|
||
## References |