gorule-0000027 must check that DBs are in the db-xref file and use the 'name' field #2210

Closed
8 of 12 tasks
pgaudet opened this issue Dec 15, 2023 · 12 comments

@pgaudet
Contributor

pgaudet commented Dec 15, 2023

gorule-0000027 states that all identifiers must be valid, but this is vague.

We will first check

Column 1 & 2: entity

  • DB (GAF column 1) value should be in the "database" field of db-xref = GORULE:0000027 - TEST 1
  • database names or synonyms should be REPAIRED to the value in the "database" field, and an error reported = GORULE:0000027 - TEST 6 ==>> see question below
  • id_syntax (GAF column 2) should be checked, if defined in db-xref = GORULE:0000027 - TEST 7 REPORTED AS A WARNING, should be an ERROR, see comment
  • Column 1 and column 2 cardinality = 1; pipes are not allowed = GORULE:0000027 - TEST 8 REPORTED AS AN ERROR
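
A minimal sketch of the column 1 and 2 checks listed above, assuming the db-xref file has been loaded into a list of dicts. The `DB_XREFS` sample, its `id_syntax` patterns, and the `check_entity` helper are hypothetical illustrations, not the ontobio implementation.

```python
import re

# Hypothetical, trimmed-down stand-in for db-xref entries; the real file has
# more fields, and these id_syntax patterns are illustrative only.
DB_XREFS = [
    {"database": "UniProtKB", "id_syntax": r"[A-Z0-9]{6,10}"},
    {"database": "WB", "id_syntax": r"WBGene[0-9]{8}"},
]


def check_entity(db, local_id, xrefs=DB_XREFS):
    """Return a list of (level, message) problems for GAF columns 1 and 2."""
    # Cardinality = 1: a pipe means more than one value was supplied.
    if "|" in db or "|" in local_id:
        return [("ERROR", "columns 1 and 2 must each hold exactly one value")]

    # Column 1 must equal the "database" field of some db-xref entry.
    entry = next((x for x in xrefs if x["database"] == db), None)
    if entry is None:
        return [("ERROR", f"{db} not found in db-xref 'database' field")]

    # Column 2 is checked only when the entry defines an id_syntax.
    pattern = entry.get("id_syntax")
    if pattern and not re.fullmatch(pattern, local_id):
        return [("ERROR", f"{local_id} does not match id_syntax for {db}")]
    return []
```

With this sample data, `check_entity("UniProtKB", "Q9HC96")` returns an empty list, while an unknown prefix such as `SGDDB` or a piped value yields a single ERROR entry.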

Column 8 "with" and Column 16 (extensions)

  • cardinality 0-n; values are pipe-separated = GORULE_TEST:0000027-9 (PASS), GORULE_TEST:0000027-10 (PASS)
  • values that contain DB:ID: same checks as for columns 1 & 2:
  • DB (as for GAF column 1) value should be in the "database" field of db-xref = GORULE:0000027 - TEST 12, GORULE:0000027 - TEST 13
  • id_syntax (as for GAF column 2) should be checked, if defined in db-xref = GORULE:0000027 - TEST 14
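
A rough sketch of pulling DB:ID pairs out of a "with" or extension value so that each pair can go through the same checks as columns 1 and 2. The `iter_curies` helper is hypothetical; it assumes values are separated by pipes or commas and that extensions take the form `relation(DB:ID)`.

```python
import re


def iter_curies(column_value):
    """Yield (db, local_id) pairs from a 'with' (col 8) or extension (col 16) value."""
    if not column_value:
        # Cardinality 0-n: an empty column is allowed.
        return
    for item in re.split(r"[|,]", column_value):
        # Extensions look like relation(DB:ID); unwrap the parenthesised part.
        match = re.fullmatch(r"[^()]+\((.+)\)", item)
        curie = match.group(1) if match else item
        # Split on the first colon only, so IDs that contain colons survive.
        db, sep, local_id = curie.partition(":")
        if not sep or not db or not local_id:
            raise ValueError(f"not a DB:ID value: {item!r}")
        yield db, local_id
```

Each yielded pair can then be validated against db-xref in the same way as the entity columns.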

References GAF - column 6, GPAD column 5

  • cardinality 1-n; values are pipe-separated = GORULE:0000027 - TEST 15 (PASS), GORULE:0000027 - TEST 16
  • These have to be in db-xref, and their syntax has to be checked against id_syntax if possible = GORULE:0000027 - TEST 17

Assigned_by field (GAF column 15; GPAD column 10)

  • checked against groups.yaml = GORULE:0000027 - TEST 18 (PASS), GORULE:0000027 - TEST 19
    Not failing, see comment
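
A small sketch of the assigned_by check, assuming the group names from groups.yaml and the database names from db-xref have already been read into two sets. The `check_assigned_by` helper and the sample sets are hypothetical.

```python
def check_assigned_by(assigned_by, group_names, dbxref_names):
    """Return None if assigned_by is a known group, else an error message."""
    if assigned_by in group_names:
        return None
    # Being listed in db-xref is not enough: the value must be a group.
    if assigned_by in dbxref_names:
        return f"{assigned_by} is in db-xref but is not present in groups reference"
    return f"{assigned_by} is not present in groups reference"


# Illustrative sample data only, not the contents of the real files.
groups = {"GO_Central", "UniProt", "SGD"}
dbxrefs = {"UniProtKB", "SGD", "PO"}
```

Under these assumptions, `PO` (in db-xref only) and `SGDDB` (in neither) both produce an error, and `GO_Central` passes.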

LATER

  • errors should be FILTERED (to do later, once we are sure this is working correctly)
@kltm
Member

kltm commented Dec 15, 2023

@pgaudet Looking at this and #1873 (comment), I think that we're pushing the bounds of what I would consider "low-hanging fruit"; there are some things here that require further discussion on the software side. I think we should roll these into a new project that is digging a little deeper in.

@pgaudet
Contributor Author

pgaudet commented Jan 11, 2024

For id_syntax, @kltm asks that @mugitty run a speed test, to make sure that this check is not too computationally intensive.
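
One possible shape for such a speed test, as a sketch only: time repeated id_syntax matches over a list of identifiers with a pattern compiled once. The `time_id_syntax_checks` helper and the pattern shown are hypothetical and are not taken from ontobio or db-xref.

```python
import re
import time


def time_id_syntax_checks(ids, pattern, repeats=100):
    """Return (elapsed_seconds, match_count) for repeated id_syntax checks."""
    compiled = re.compile(pattern)  # compile once, outside the timed loop
    matched = 0
    start = time.perf_counter()
    for _ in range(repeats):
        for local_id in ids:
            if compiled.fullmatch(local_id):
                matched += 1
    return time.perf_counter() - start, matched


if __name__ == "__main__":
    sample_ids = ["Q9HC96", "Q6UW15", "not an id"]
    elapsed, matched = time_id_syntax_checks(sample_ids, r"[A-Z0-9]{6,10}", repeats=10_000)
    print(f"{matched} matches in {elapsed:.3f}s")
```

Running the same loop over identifiers taken from a real GAF would give a more representative number than this toy sample.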

@pgaudet
Contributor Author

pgaudet commented Jul 25, 2024

Correctly failing test

GORULE:0000027: SGDDB is not present in groups reference--`UniProtKB Q9HC96 CAPN10 involved_in GO:0006921 PMID:23072806 IDA P GORULE_TEST:0000027-2 Calpain-10 CAPN10,KIAA1845 protein taxon:9606 20140213 SGDDB

@pgaudet pgaudet closed this as completed Jul 25, 2024
@pgaudet pgaudet reopened this Jul 25, 2024
@pgaudet
Contributor Author

pgaudet commented Jul 25, 2024

Need to add more tests to check all points mentioned in initial comment

@pgaudet
Contributor Author

pgaudet commented Jul 25, 2024

Add test

  • Assigned_by field (GAF column 15; GPAD column 10) should be checked against groups.yaml >> GO-OWL is not allowed, for example (currently we have 42 annotations in AmiGO)
  • Added more tests for all cases in the initial comment
  • Noting that FB:FBrf0193169 is incorrectly picked up as an error by gorule-0000027 (FB refs were recently added to the db-xref file; this should be fixed once the change is incorporated in @mugitty's local version for testing)

@mugitty
Contributor

mugitty commented Aug 29, 2024

@pgaudet, it is working on http://skyhook.berkeleybop.org/go-site-2210-gorule-0000027-id-syntax/reports/assigned-by-gorule-report.html

Update has to be tested on a build that is based on ontobio version 2.9.7 or greater. However, we have reverted to build 2.9.2 on Aug 28, 2024.

@kltm
Member

kltm commented Aug 29, 2024

@mugitty You can continue with the newer (latest) ontobio. The temporary reversion was to see if we could get a snapshot through to support ontology and GAF (annotation) updates while we work on the ontobio fixes for release.

@mugitty
Contributor

mugitty commented Aug 30, 2024

Thanks @kltm! The updates for this ticket are already merged into the latest ontobio release.

@pgaudet
Contributor Author

pgaudet commented Jan 24, 2025

@mugitty

For GORULE-0000027, I have a question about the behavior of the code when encountering this case: "Database names or synonyms should be REPAIRED to the value in the "database" field, and an error reported" (this is tested by GORULE_TEST:0000027-6)

GORULE:0000027: WormBase not found in list of database names in dbxrefs--`WormBase WBGene00000002 aat-1 part_of GO:1990184 WB_REF:WBPaper00006408|PMID:14668347 IPI WB:WBGene00000225 C GORULE_TEST:0000027-6 F27C8.1 gene taxon:6239 20131022 GO_Central

i.e., I expect 'WormBase' in Col 1 to be repaired to WB. Usually the report prints the message 'but was repaired'; here it doesn't. What is actually done: is it indeed repaired?
There are 2 possible cases: either the label used is a synonym in the db-xref file, in which case it can be replaced and should give a WARNING, or it's not present in the file, and should be an ERROR.
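
The two cases could be sketched like this, assuming each db-xref entry carries a "database" name and an optional list of synonyms. The `resolve_db` helper, the sample entries, and the message wording are hypothetical and only illustrate the expected behaviour, not what ontobio currently does.

```python
def resolve_db(db, xrefs):
    """Return (repaired_db, report); report is None or a (level, message) pair."""
    # Exact match on the "database" field: nothing to repair or report.
    for entry in xrefs:
        if db == entry["database"]:
            return db, None
    # Case 1: the label is a known synonym, so repair it and warn.
    for entry in xrefs:
        if db in entry.get("synonyms", []):
            canonical = entry["database"]
            return canonical, ("WARNING", f"{db} is a synonym of {canonical} and was repaired")
    # Case 2: the label is not in the file at all, so report an error.
    return None, ("ERROR", f"{db} not found in list of database names in dbxrefs")


# Illustrative sample entries only.
xrefs = [
    {"database": "WB", "synonyms": ["WormBase"]},
    {"database": "UniProtKB", "synonyms": []},
]
```

Here `resolve_db("WormBase", xrefs)` returns `WB` with a WARNING, and an unknown label such as `SGDDB` returns `None` with an ERROR.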

@pgaudet
Contributor Author

pgaudet commented Jan 24, 2025

For GORULE-0000027: id_syntax (GAF column 2) should be checked, if defined in db-xref (this is tested by GORULE_TEST:0000027-7). It is currently reported as a WARNING; it should be an ERROR.

@pgaudet
Contributor Author

pgaudet commented Jan 24, 2025

For GORULE-0000027: this test is not failing:
! FAILS GORULE:0000027 - TEST 19 - 'Assigned by' is in dbxref.yaml but not in groups.yaml (PO)
UniProtKB Q6UW15 REG3G involved_in GO:0006401 PMID:16931762 IDA P GORULE_TEST:0000027-19 Regenerating islet-derived protein 3-gamma REG3G|PAP1B|UNQ429/PRO162 protein taxon:9606 20220606 PO

@pgaudet
Contributor Author

pgaudet commented Feb 26, 2025

That last test is now working. Yay!

@pgaudet pgaudet closed this as completed Feb 26, 2025
@pgaudet pgaudet moved this from Clearing - needs testing to Done in GORULES (low-hanging fruit) Feb 26, 2025