Some confusion about the URL to be used for NCBIGene in the prefix map #1431
Comments
Ubergraph just has what comes from the source ontologies, so I think the thing to do here is try to convince PCL that they should be using
We previously used `Text.obo_to_curie()` to translate the URLs in the information content values we got back from UberGraph into CURIEs. This doesn't work for complex CURIEs such as NCBIGene:4522, which is `https://identifiers.org/ncbigene/4522` (see TranslatorSRI/NodeNormalization#182). This PR provides a warning and an immediate solution, but #226 covers a more long-term tracking option.

This PR uses the [curies](https://github.com/cthoyt/curies) package to translate URLs into CURIEs for us to use. There will still be issues (where Biolink Model and Ubergraph disagree on HTTP vs HTTPS URLs, see biolink/biolink-model#1431), but this should help improve our information content coverage. Since this code needs `get_config()`, which set up a circular dependency, I also moved `get_config()` into node.py.

Some of these identifiers can't be mapped to CURIEs using the Biolink model -- in most cases this is because UberGraph covers concepts that we don't have in Babel (such as the Hymenoptera Anatomy Ontology), while in others it's because of slight differences in URLs, such as http-vs-https (biolink/biolink-model#1431). Tracking and fixing this in the long term is covered by #226.
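To make the URL-to-CURIE step concrete, here is a minimal standard-library sketch of prefix-based contraction that ignores the http/https difference. It is not the `curies` package API and not the code from the PR; `PREFIX_MAP`, `strip_scheme()` and `url_to_curie()` are hypothetical names, and the two-entry prefix map is a hand-written stand-in for the real Biolink Model prefix map.

```python
from typing import Optional

# Hypothetical, hand-written prefix map for illustration only;
# the real Biolink Model prefix map is far larger.
PREFIX_MAP = {
    "NCBIGene": "http://identifiers.org/ncbigene/",
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_",
}


def strip_scheme(url: str) -> str:
    """Drop a leading http:// or https:// so both schemes compare equal."""
    for scheme in ("https://", "http://"):
        if url.startswith(scheme):
            return url[len(scheme):]
    return url


def url_to_curie(url: str, prefix_map: dict = PREFIX_MAP) -> Optional[str]:
    """Return a CURIE for url, or None if no URI prefix matches."""
    bare_url = strip_scheme(url)
    best = None  # (CURIE prefix, scheme-less URI prefix) of the longest match
    for curie_prefix, uri_prefix in prefix_map.items():
        bare_prefix = strip_scheme(uri_prefix)
        if bare_url.startswith(bare_prefix):
            if best is None or len(bare_prefix) > len(best[1]):
                best = (curie_prefix, bare_prefix)
    if best is None:
        return None
    return f"{best[0]}:{bare_url[len(best[1]):]}"


# Both schemes contract to the same CURIE.
print(url_to_curie("https://identifiers.org/ncbigene/4522"))
print(url_to_curie("http://identifiers.org/ncbigene/4522"))
# An IRI outside the prefix map cannot be contracted.
print(url_to_curie("http://purl.obolibrary.org/obo/HAO_0000001"))
```

Ignoring the scheme is a workaround on the consumer side; it hides the disagreement between prefix maps without resolving it.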
Yes, in fact identifiers.org has over the years given 2x2 options (http vs https, slash vs hash). From a web browser POV it doesn't matter, they all resolve. But for semantic URIs this indecision destroys interoperability between triplestores. As http-with-slash was the first, that is what many groups adopted, so we should stick to that. If we do make a change, it should be something with absolute, guaranteed, cast-iron permanence.
I followed up with the CL folks, and they had just had a new release with this change merged in. I am going to call this Biolink issue closed as I imagine @balhoff's ubergraph will be updated automatically, and that will trigger a fix for you, @gaurav -- please of course reopen if I missed a component.
The NCBIGene concept IRI prefix in the Biolink Model prefix map is `http://identifiers.org/ncbigene/` (biolink-model/prefix-map/biolink-model-prefix-map.json, line 121 in ce4f709).

However, Ubergraph thinks the concept IRI prefix should actually be `https://identifiers.org/ncbigene/` (i.e. `https` instead of `http`), which comes from the Provisional Cell Ontology (PCL), while identifiers.org thinks it should be `https://identifiers.org/ncbigene:` or `https://www.ncbi.nlm.nih.gov/gene/` (https://registry.identifiers.org/registry/ncbigene).

I would propose that Biolink Model goes with `https://identifiers.org/ncbigene/` to keep us in sync with Ubergraph, but I don't know if these identifiers.org concept IRIs need to be rethought at some point.

@balhoff Any thoughts on this?
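The mismatch can be illustrated with a short sketch that rewrites the prefix spellings mentioned in this issue to a single preferred one. `PREFERRED`, `SYNONYMS` and `canonicalize()` are hypothetical names, and treating `https://identifiers.org/ncbigene/` as the preferred form is only the proposal made here, not a settled decision.

```python
# Preferred prefix: the one proposed in this issue (an assumption, not a decision).
PREFERRED = "https://identifiers.org/ncbigene/"

# Other spellings of the same namespace that are mentioned in this issue.
SYNONYMS = (
    "http://identifiers.org/ncbigene/",
    "https://identifiers.org/ncbigene:",
    "https://www.ncbi.nlm.nih.gov/gene/",
)


def canonicalize(iri: str) -> str:
    """Rewrite a known synonym prefix to PREFERRED; leave other IRIs untouched."""
    for synonym in SYNONYMS:
        if iri.startswith(synonym):
            return PREFERRED + iri[len(synonym):]
    return iri


biolink_iri = "http://identifiers.org/ncbigene/4522"
ubergraph_iri = "https://identifiers.org/ncbigene/4522"

# Compared as plain strings, the two IRIs differ, so a lookup keyed on one misses the other.
print(biolink_iri == ubergraph_iri)  # False
# After canonicalization they agree.
print(canonicalize(biolink_iri) == canonicalize(ubergraph_iri))  # True
```

A synonym list like this only works while every consumer maintains the same list, which is why agreeing on one prefix in the prefix map is the more durable fix.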