Need for a file with MOD mappings #257
Comments
There are three parts to how this works.

The first is a mod.txt file that maps mod ids to a modification code (p for phosphorylation, ac for acetylation, etc.).

The second is the mapping from these codes to CSS styles (colors). This currently supports modTypeList = ['p', 'ac', 'g', 'm', 'ub'] as well as sequence variants ('v') and unmodified ('un'). Everything else shows as Other ('o'). Note that there is a bug in the logic for how unmodified is applied, which should be fixed: unmodified seems to show as Other even though the style for un is defined as white with a grey border.

The third is the legend, which uses the following list (and the corresponding style to show each color block): var msaModTypeDict = {"mod-p":"Phosphorylation","mod-ac":"Acetylation","mod-g":"Glycosylation","mod-m":"Methylation","mod-ub":"Ubiquitination","mod-v":"Sequence Variant","mod-o":"Other"};

(@nataled) can review mod.txt for completeness, and we can consider adding more modification types as different colors (to the code, style sheet, and legend). |
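As a rough illustration of the three parts described above, here is a minimal sketch of the lookup chain from MOD id to prefix to CSS class to legend label. `modTypeList` and `msaModTypeDict` are copied from the comment; the `modIdToPrefix` excerpt and the function names `cssClassForMod` and `legendLabelForMod` are hypothetical, since the actual mod.txt contents and viewer code are not shown in this thread. Sequence variants ('v') and unmodified ('un') are presumably handled on a separate path and are left out here.

```javascript
// Prefixes that currently have their own color (from the comment above).
const modTypeList = ['p', 'ac', 'g', 'm', 'ub'];

// Legend labels keyed by CSS class (from the comment above).
const msaModTypeDict = {
  "mod-p": "Phosphorylation",
  "mod-ac": "Acetylation",
  "mod-g": "Glycosylation",
  "mod-m": "Methylation",
  "mod-ub": "Ubiquitination",
  "mod-v": "Sequence Variant",
  "mod-o": "Other"
};

// ASSUMPTION: a parsed excerpt of mod.txt (MOD id -> prefix).
// The real file format and entries may differ.
const modIdToPrefix = new Map([
  ['MOD:00696', 'p'] // phosphorylated residue
]);

// Parts 1 and 2: resolve a MOD id to the CSS class used for coloring.
// Unlisted ids, or prefixes without a color, fall back to Other.
function cssClassForMod(modId) {
  const prefix = modIdToPrefix.get(modId);
  return modTypeList.includes(prefix) ? `mod-${prefix}` : 'mod-o';
}

// Part 3: the legend label that goes with that class.
function legendLabelForMod(modId) {
  return msaModTypeDict[cssClassForMod(modId)];
}
```

Under these assumptions, any id missing from mod.txt silently becomes Other, which is why completeness of the mapping file matters.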
Here is my progress so far. First, the slims I've selected:

MOD:90001 = lipoacylated
MOD:90002 = peptide chain
MOD:00033 = cross-linked
MOD:00427 = methylated
MOD:00649 = acylated
MOD:00674 = amidated
MOD:00675 = oxidized
MOD:00677 = hydroxylated
MOD:00696 = phosphorylated
MOD:00701 = nucleic
MOD:00703 = isoprenylated
MOD:00764 = glycoconjugated
MOD:01152 = carboxylated
MOD:02078 = acetylated

Note the made-up slims (MOD:9000x). MOD:90001 (lipoacylated) was made up to handle the large number of cases where the modification is considered both lipoconjugated and acylated, while MOD:90002 (peptide chain) combines things like ubiquitinated, sumoylated, etc.

Next, I checked all the MOD terms mentioned in either PRO or Reactome (since PRO will eventually contain all of Reactome) to see the frequency of mention. Only the lines marked 'SLIM' are mapped to a slim; the others would appear on the alignment as 'other', at least if we adopt the slims above:

13104 SLIM MOD:00764 glycoconjugated residue
7867 SLIM MOD:00696 phosphorylated residue
6690 SLIM MOD:00677 hydroxylated residue
1368 SLIM MOD:90002 peptide-linked
981 SLIM MOD:02078 acetylated residue
449 SLIM MOD:00427 methylated residue
448 SLIM MOD:01152 carboxylated residue
312 SLIM MOD:90001 lipoacylated
207 SLIM MOD:00033 crosslinked residues
196 SLIM MOD:00703 isoprenylated residue
136 SLIM MOD:00674 amidated residue
89 SLIM MOD:00649 acylated residue
88 SLIM MOD:00701 nucleotide or nucleic acid modified residue
85 MOD:00128 N6-pyridoxal phosphate-L-lysine
72 MOD:00130 L-allysine
45 SLIM MOD:00675 oxidized residue
32 MOD:00219 L-citrulline
20 MOD:00207 L-isoglutamyl-polyglutamic acid
19 MOD:00206 L-isoglutamyl-polyglycine
14 MOD:00314 glycine cholesterol ester
9 MOD:00159 O-phosphopantetheine-L-serine
8 MOD:01116 S-farnesyl-L-cysteine methyl ester
5 MOD:01119 S-geranylgeranyl-L-cysteine methyl ester
4 MOD:00317 N6-3,4-didehydroretinylidene-L-lysine
4 MOD:00685 deamidated L-glutamine
4 MOD:00912 modified L-lysine residue
4 MOD:01999 N6-(11-cis)-retinylidene-L-lysine
3 MOD:00031 L-selenocysteine residue
3 MOD:00274 L-cysteine persulfide
3 MOD:00909 modified L-histidine residue
2 MOD:00049 2'-[3-carboxamido-3-(trimethylammonio)propyl]-L-histidine
2 MOD:00125 hypusine
2 MOD:00181 O4'-sulfo-L-tyrosine
2 MOD:00237 L-beta-methylthioaspartic acid
2 MOD:01048 2-pyrrolidone-5-carboxylic acid
2 MOD:01699 protonated residue <-----------------------------
2 MOD:01777 S-(glycyl)-L-cysteine (Gly)
2 MOD:01786 3'-nitro-L-tyrosine
2 MOD:01880 L-deoxyhypusine
1 MOD:00129 N6-retinylidene-L-lysine
1 MOD:00908 modified glycine residue
1 MOD:00913 modified L-methionine residue
1 MOD:01623 1-thioglycine (C-terminal)
1 MOD:01625 1-thioglycine
1 MOD:01684 palmitoylated-L-cysteine

The above numbers indicate to me that the selected slims are well-suited to the need, since all the non-slims (with the exception of 'protonated residue'; see arrow) are for specific amino acids and not of the general "something-ated" residue type.

@karenross @Julie-Cowart do you agree with the selections? No need to worry about colors for now, beyond noting that we'll need 19 of them (the above 14 plus other, sequence variant, conserved site, conserved substitution, and one that's used to highlight the mouse-over'd position). |
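To put a number on how well the slims cover the mentions, one could tally how many would fall through to 'other'. The sketch below uses only an excerpt of the frequency list above (the three most frequent slim rows and the two most frequent non-slim rows); the `rows` structure and the `summarize` helper are hypothetical names chosen for illustration, and the full list would need to be loaded to get the real share.

```javascript
// Excerpt of the frequency list above; `slim` is true for rows marked SLIM.
const rows = [
  { count: 13104, id: 'MOD:00764', slim: true },  // glycoconjugated residue
  { count: 7867,  id: 'MOD:00696', slim: true },  // phosphorylated residue
  { count: 6690,  id: 'MOD:00677', slim: true },  // hydroxylated residue
  { count: 85,    id: 'MOD:00128', slim: false }, // N6-pyridoxal phosphate-L-lysine
  { count: 72,    id: 'MOD:00130', slim: false }  // L-allysine
];

// Sum all mentions and the mentions that would be colored as 'other'.
function summarize(rows) {
  let total = 0;
  let other = 0;
  for (const row of rows) {
    total += row.count;
    if (!row.slim) other += row.count;
  }
  return { total, other, otherShare: other / total };
}

const summary = summarize(rows);
console.log(`${summary.other} of ${summary.total} mentions would show as 'other'`);
```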
I'm not sure what you mean by your selections. Are you proposing that the 15 MOD ids you listed as slims each have a separate color? I think that is too many. Past 8 or so you run out of easy-to-distinguish colors. Based on counts it looks like we should add hydroxylated. Possibly carboxylated and lipoacylated too. Not sure about peptide-linked. Is that instead of ubiquitinated? Is that a more useful categorization to users? |
Potentially, yes: if we adopt all the indicated selections, they would need separate colors. The three you suggested are already on the list. I struggled with the peptide-linked one because it would indeed replace ubiquitinated, but I could not justify giving ubiquitinated its own color and not sumoylated, neddylated, and similar. Remember, these are just for the color scheme. We'd have documentation indicating what is included in each color for those merged cases. |
Again, we need not worry about colors for now. There are ways of dealing with the need for too many colors. For example, selecting colors on the fly based on actual need for the entry (I'm sure we won't have any entries that require all colors simultaneously), or using a resource that lists optimal colors (for example https://sashamaps.net/docs/resources/20-colors/). |
I wouldn't do that. The colors should be consistent across views, e.g. phosphorylation is always pink. But yes, we can define more categories now and then decide later to combine some under the same color, or leave them as other.

So I'm starting from how it is implemented now and trying to figure out how to get it to do what you are suggesting. For most of those categories it's just a matter of making sure mod.txt is more complete, with all the mod ids for all the leaves under those terms (for example, we missed some of the phosphorylation ids that are children of MOD:00696). But as I documented above, the mod id is first converted to a prefix and then the prefix is used for the colors. So I'm pretty sure ubiquitination should stay ub, but if we did the same for all the other peptide chain versions and just happened to give them the same color in the style sheet, then that would work. Then we change the label in the legend to say 'Peptide Linked' while the mouseover still says ub for ubiquitinated.

I'd need to think more about lipoacylated. I think multiple modifications on the same site just get treated as other at the moment, but if we want this to be a special case it might be possible. It will definitely need special handling in code because it's coloring based on multiple modifications, not just one like all the rest.

By the way, can you find an example of ubiquitinated (or other peptide-linked, for that matter) where the site is specified? I find many ubiquitinated forms, but they don't have a specified site so they don't show in the MSA. Same for finding a lipoacylated example. |
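Making mod.txt complete for a category amounts to collecting every term beneath the category's root (such as MOD:00696) and giving each the same prefix. Here is a minimal sketch of that traversal. The `children` map is a made-up stand-in for the PSI-MOD hierarchy, and the `EXAMPLE:` ids are placeholders, not real MOD terms; `descendantsOf` and `prefixesFor` are hypothetical helper names.

```javascript
// ASSUMPTION: parent -> direct children, standing in for the real ontology.
// The EXAMPLE: ids are placeholders, not actual MOD identifiers.
const children = new Map([
  ['MOD:00696', ['EXAMPLE:1', 'EXAMPLE:2']],
  ['EXAMPLE:1', ['EXAMPLE:3']]
]);

// Collect every term below `root`, at any depth, with an iterative walk.
function descendantsOf(root, children) {
  const seen = new Set();
  const stack = [root];
  while (stack.length > 0) {
    const term = stack.pop();
    for (const child of children.get(term) || []) {
      if (!seen.has(child)) {
        seen.add(child);
        stack.push(child);
      }
    }
  }
  return seen;
}

// Assign one prefix to the root and to everything underneath it.
function prefixesFor(root, prefix, children) {
  const result = new Map([[root, prefix]]);
  for (const term of descendantsOf(root, children)) {
    result.set(term, prefix);
  }
  return result;
}

const phosphoPrefixes = prefixesFor('MOD:00696', 'p', children);
```

The `seen` set keeps the walk from visiting a term twice when it has more than one parent, which can happen in an ontology.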
I intend to create a file that looks like mod.txt once we have the slims and prefixes decided. The mapping information (from MOD to MODslim) is already done. Making mod.txt complete is why I'm doing the slims. It will list all the MOD identifiers (or the numeric part, as is now done) that map to a slim (and therefore a color 'code'), though I'll leave out those that map to 'other' (I presume that what happens currently is that any unlisted identifier automatically becomes 'other'). Assigning the ub code to all the peptide/protein-linked cases is what I intended. The lipoacylated cases would be treated as a single modification because I already did the combining behind the scenes (in my case I created fake MOD IDs, but that's just as easily converted to a color code identifier). As far as finding the cases you mentioned, bear in mind that we might not have any in PRO yet, but they will come. That's why I included Reactome as a source when making the counts; the cases are there and will be imported into PRO in the next year or so. |
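The file generation described above could look roughly like the following sketch. It assumes a tab-separated layout of numeric id and color code, which is a guess because the actual mod.txt format is not shown in this thread. The `termToSlim` entries with ids 99991 to 99993 are placeholders rather than real MOD terms, and `buildModTxt` is a hypothetical helper name.

```javascript
// Slim -> color code, following the codes mentioned in this thread.
const slimToCode = new Map([
  ['MOD:00696', 'p'],  // phosphorylated
  ['MOD:02078', 'ac'], // acetylated
  ['MOD:00427', 'm'],  // methylated
  ['MOD:90002', 'ub']  // made-up peptide chain slim, assigned the ub code
]);

// ASSUMPTION: MOD term -> slim. The 9999x ids are placeholders only.
const termToSlim = new Map([
  ['MOD:00696', 'MOD:00696'],
  ['MOD:99991', 'MOD:00696'],
  ['MOD:99992', 'MOD:90002'],
  ['MOD:99993', null] // no slim: would render as 'other', so it is omitted
]);

// Emit one "numericId<TAB>code" line per term that maps to a coded slim.
function buildModTxt(termToSlim, slimToCode) {
  const lines = [];
  for (const [term, slim] of termToSlim) {
    const code = slimToCode.get(slim);
    if (!code) continue; // unlisted terms default to 'other' in the viewer
    lines.push(`${term.replace(/^MOD:/, '')}\t${code}`);
  }
  return lines.sort().join('\n');
}

const modTxt = buildModTxt(termToSlim, slimToCode);
```

Leaving out the 'other' terms keeps the file short, but it relies on the presumption stated above that unlisted identifiers fall back to 'other'.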
I missed the beginning of this discussion because I wasn't at last week's
meeting, but what are we planning to do about the "not" cases (e.g.,
"P12345 not phosphorylated on S100")? Are those going to get a color in the
alignment?
|
That's covered here: #256. In short, there's a bug that's preventing these from rendering as they should. |
Here is a new ranked list that accounts only for those in Reactome plus the non-Reactome subset of PRO that is also position-specific (estimated as those that are organism-modification with PRO-proteoform-std). I removed the non-slim hits that are not generalized.

6232 SLIM MOD:00677 hydroxylated residue
4533 SLIM MOD:00696 phosphorylated residue
3956 SLIM MOD:00764 glycoconjugated residue
1109 SLIM MOD:90002 peptide-linked
437 SLIM MOD:01152 carboxylated residue
270 SLIM MOD:02078 acetylated residue
268 SLIM MOD:90001 lipoacylated
226 SLIM MOD:00427 methylated residue
205 SLIM MOD:00033 crosslinked residues
189 SLIM MOD:00703 isoprenylated residue
135 SLIM MOD:00674 amidated residue
48 SLIM MOD:00701 nucleotide or nucleic acid modified residue
43 SLIM MOD:00675 oxidized residue
20 SLIM MOD:00649 acylated residue
2 MOD:01699 protonated residue |
From PIR-PRO discussion: It is possible that the mapping between types of modifications and the color used to display them is incomplete. @Julie-Cowart will find the source of the mapping in the code. @nataled will attempt to create a mapping file, akin to how GO terms are mapped to slims.