There are 16186 cases where a term has the proteoform notation on both the definition and the PRO-proteoform-std lines.
There are 944 non-obsolete cases where a term has the proteoform notation on only the definition line. Of these, 917 are at the organism-neutral level; these are expected to lack the PRO-proteoform-std line. The others are organism-seqgroup (27 cases). All the organism-seqgroup cases are somewhat odd because the "Example: UniProtKB..." is not intended to serve the same purpose as PRO-proteoform-std. Rather, it just indicates one example of what would be a child term from a more-specific taxon (similar to the cases we have for S. pombe vs S. pombe 972h-).
There are 89 cases where a term has the proteoform notation on only the PRO-proteoform-std line. Of these, 78 are organism-seqgroup terms for some HLA subtype; these all have very complicated proteoform notations. The other 11 are cases where there are two EXACT synonyms for PRO-proteoform-std, each using a different UniProtKB identifier, so it isn't clear which to use. I could possibly figure out a way to choose between them that isn't completely arbitrary.
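For reference, the three-way breakdown above could be reproduced with something like the following. This is only a rough sketch, not the script actually used for the counts. It assumes an OBO-style file with blank-line-separated `[Term]` stanzas, that "has the proteoform notation" can be approximated by the presence of a `UniProtKB:` accession on the line, and that PRO-proteoform-std appears as a synonym type on `synonym:` lines. The function names and category labels are made up for illustration.

```python
import re
from collections import Counter

# Assumption: a UniProtKB accession on a line is a good-enough proxy for
# "this line carries the proteoform notation".
ACCESSION = re.compile(r"UniProtKB:[A-Z0-9]+")


def classify_stanza(stanza: str) -> str:
    """Return where the proteoform notation appears in one [Term] stanza."""
    def_has_notation = False
    std_synonyms = 0
    for line in stanza.splitlines():
        if line.startswith("def:") and ACCESSION.search(line):
            def_has_notation = True
        elif (line.startswith("synonym:")
              and "PRO-proteoform-std" in line
              and ACCESSION.search(line)):
            std_synonyms += 1
    if def_has_notation and std_synonyms:
        return "both"
    if def_has_notation:
        return "def_only"
    if std_synonyms > 1:
        # Two or more PRO-proteoform-std synonyms: unclear which to use.
        return "std_only_ambiguous"
    if std_synonyms == 1:
        return "std_only"
    return "neither"


def tally(obo_text: str) -> Counter:
    """Count non-obsolete [Term] stanzas by category."""
    counts = Counter()
    for stanza in obo_text.strip().split("\n\n"):
        stanza = stanza.strip()
        if not stanza.startswith("[Term]"):
            continue
        if "is_obsolete: true" in stanza:
            continue  # the counts above are for non-obsolete terms
        counts[classify_stanza(stanza)] += 1
    return counts
```

Splitting `std_only` from `std_only_ambiguous` makes it easy to pull out just the terms with two competing PRO-proteoform-std synonyms for manual review.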
While the number of PRO-proteoform-std only lines is relatively small, all the terms like that are quite important for immunology or disease (HLA and SARS-CoV-2).
We should review the rules currently in use for obtaining the sequence information displayed in the alignment.
@Julie-Cowart here are the 11 cases. I think all are for SARS-CoV-2 proteins. The virus has two genomic polyproteins, each of which is processed to yield the indicated proteins. The two polyproteins have separate accessions in UniProtKB, so basically each of the cases listed can come from two accessions. Despite this, as mentioned, the sequences are identical, so both are given. After the list is a sample stanza. Interestingly, there is no PRO-proteoform-std-like sentence in the def line (probably because it was unclear how to do it).