In-silico peptide digestion by trypsin
In-silico tryptic digestion can be done in one line of R using functions from the stringr package:
str_subset (as.character(str_split_fixed(peptide,'(K|R)(?!P)',Inf)),'[:alpha:]')
Here, peptide is the input amino acid sequence. The rule is to cut wherever arginine (R) or lysine (K) occurs, except when it is followed by proline (P). Because the split consumes the matched residue, the K or R itself does not appear in the resulting fragments.
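As a rough illustration of the rule, here is a minimal base-R sketch. It uses strsplit and gsub instead of stringr so that it runs without extra packages. The name digest_sketch and its keep_site argument are hypothetical and not part of the original code. With keep_site = TRUE the K or R stays at the C-terminal end of each fragment, which is where trypsin actually leaves it and may matter if fragment masses are computed later.

```r
# Hypothetical helper, not the author's code: base-R sketch of the cut rule.
digest_sketch <- function(peptide, keep_site = FALSE) {
  if (keep_site) {
    # Mark each cut site with "|" after the K/R, then split on the marker,
    # so the K or R stays at the end of its fragment.
    marked <- gsub("([KR])(?!P)", "\\1|", peptide, perl = TRUE)
    frags <- strsplit(marked, "|", fixed = TRUE)[[1]]
  } else {
    # Split on the K/R itself, which drops the residue from the output.
    frags <- strsplit(peptide, "[KR](?!P)", perl = TRUE)[[1]]
  }
  frags[nzchar(frags)]  # discard empty strings, e.g. from a leading K/R
}

protein <- "ALLAMKYTNQANVGAGRHGLYKPEQLQAIREFN"
digest_sketch(protein)
# "ALLAM" "YTNQANVGAG" "HGLYKPEQLQAI" "EFN"
digest_sketch(protein, keep_site = TRUE)
# "ALLAMK" "YTNQANVGAGR" "HGLYKPEQLQAIR" "EFN"
```

Note that the K in HGLYKPEQLQAI is not a cut site because it is followed by P.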
If we want to do mass spec targeting specific peptides, we need to include the flanking sequence in both directions, up to the first arginine or lysine. For example,
- If we are interested in finding this peptide ANVGAGRHGLYKPE,
- and the peptide is part of a bigger protein ALLAMKYTNQANVGAGRHGLYKPEQLQAIREFN
- Trypsin will digest it into 4 smaller peptides: ALLAM - YTNQANVGAG - HGLYKPEQLQAI - EFN
- The peptide of interest now spans the two middle fragments, so we keep these and discard the others.
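The steps in this example can be sketched in base R as follows. This is not the author's trypsin_flank: flank_sketch is a hypothetical name, and the sketch assumes that the peptide occurs once in the longer sequence and that the bounding K or R is excluded from the result, which matches the four fragments listed above.

```r
# Hypothetical helper: digest longer_seq, then keep only the fragments
# that overlap the peptide of interest.
flank_sketch <- function(aa, longer_seq) {
  # Locate the peptide inside the longer sequence (first occurrence only).
  pep_start <- regexpr(aa, longer_seq, fixed = TRUE)[1]
  if (pep_start == -1) stop("aa not found in longer_seq")
  pep_end <- pep_start + nchar(aa) - 1

  # Positions of K/R residues that are not followed by P (the cut sites).
  cuts <- gregexpr("[KR](?!P)", longer_seq, perl = TRUE)[[1]]
  cuts <- cuts[cuts > 0]  # gregexpr returns -1 when there is no match

  # Each fragment lies between two cut residues (or a sequence end).
  frag_start <- c(1, cuts + 1)
  frag_end   <- c(cuts - 1, nchar(longer_seq))

  # Keep non-empty fragments that overlap the peptide's span.
  keep <- frag_start <= frag_end &
    frag_start <= pep_end & frag_end >= pep_start

  list(
    r2r       = substr(longer_seq, min(frag_start[keep]), max(frag_end[keep])),
    r2r_split = substring(longer_seq, frag_start[keep], frag_end[keep])
  )
}

res <- flank_sketch("ANVGAGRHGLYKPE", "ALLAMKYTNQANVGAGRHGLYKPEQLQAIREFN")
res$r2r        # "YTNQANVGAGRHGLYKPEQLQAI"
res$r2r_split  # "YTNQANVGAG" "HGLYKPEQLQAI"
```

Working from cut positions rather than from split strings keeps track of where each fragment sits in the longer sequence, which is what makes the overlap test possible.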
I wrote a function, trypsin_flank, for a project involving parallel reaction monitoring (PRM), a type of mass spectrometry that targets peptides of specific masses.
source ([raw url trypsin.R])
trypsin_flank (aa_df)
- aa: amino acid sequence of the peptide of interest
- longer_seq: longer protein sequence containing the peptide of interest
- r2r: aa plus its left and right flanking sequences, extended until R, K, or the end of longer_seq
- r2r_split: r2r as digested by trypsin