Some confusion about the URL to be used for NCBIGene in the prefix map #1431

Closed
gaurav opened this issue Dec 8, 2023 · 3 comments
Labels
identifiers (Used to group tickets around prefix management and identifier mappings), next release candidate

Comments

@gaurav
Contributor

gaurav commented Dec 8, 2023

The NCBIGene concept IRI prefix in the Biolink Model prefix map is http://identifiers.org/ncbigene/:

"NCBIGene": "http://identifiers.org/ncbigene/",

However, Ubergraph thinks the concept IRI prefix should actually be https://identifiers.org/ncbigene/ (i.e. https instead of http), which comes from the Provisional Cell Ontology (PCL), while identifiers.org thinks it should be https://identifiers.org/ncbigene: or https://www.ncbi.nlm.nih.gov/gene/ (https://registry.identifiers.org/registry/ncbigene).
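To illustrate why the scheme matters here, below is a minimal sketch of prefix-based IRI contraction. It is not the Biolink Model or Ubergraph implementation: `contract()` and `BIOLINK_PREFIX_MAP` are hypothetical names, and the map holds only the single entry quoted above.

```python
# Hypothetical one-entry prefix map, copied from the Biolink Model line quoted above.
BIOLINK_PREFIX_MAP = {"NCBIGene": "http://identifiers.org/ncbigene/"}


def contract(iri, prefix_map):
    """Return a CURIE for `iri`, or None if no URI prefix in the map matches."""
    for prefix, uri_prefix in prefix_map.items():
        # Matching is plain string comparison, so the scheme must agree exactly.
        if iri.startswith(uri_prefix):
            return f"{prefix}:{iri[len(uri_prefix):]}"
    return None


http_iri = "http://identifiers.org/ncbigene/4522"
https_iri = "https://identifiers.org/ncbigene/4522"  # the form Ubergraph returns

print(contract(http_iri, BIOLINK_PREFIX_MAP))   # NCBIGene:4522
print(contract(https_iri, BIOLINK_PREFIX_MAP))  # None
```

Both IRIs resolve in a browser, but only the first one contracts to a CURIE with this map.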

I would propose that Biolink Model go with https://identifiers.org/ncbigene/ to keep us in sync with Ubergraph, but I don't know whether these identifiers.org concept IRIs need to be rethought at some point.

@balhoff Any thoughts on this?

@balhoff
Contributor

balhoff commented Dec 8, 2023

Ubergraph just has what comes from the source ontologies, so I think the thing to do here is try to convince PCL that they should be using http instead of https for identifiers. We (@sierra-moxon really) made some changes like this to align things in GO-CAM recently.

nlharris added the identifiers label Dec 8, 2023
gaurav added a commit to TranslatorSRI/Babel that referenced this issue Jan 23, 2024
We previously used `Text.obo_to_curie()` to translate the URLs in the information content values we got back from UberGraph into CURIEs. This doesn't work for complex CURIEs such as NCBIGene:4522, which is `https://identifiers.org/ncbigene/4522` (see TranslatorSRI/NodeNormalization#182). This PR provides a warning and an immediate solution, but #226 covers a more long-term tracking option.

This PR uses the [curies](https://github.com/cthoyt/curies) package to translate URLs into CURIEs for us to use. There will still be issues (where Biolink Model and Ubergraph disagree on HTTP vs HTTPS URLs, see biolink/biolink-model#1431), but this should help improve our information content coverage. Since this code needs `get_config()`, which set up a circular dependency, I also moved `get_config()` into node.py.

Some of these identifiers can't be mapped to CURIEs using the Biolink model -- in most cases this is because UberGraph covers concepts that we don't have in Babel (such as the Hymenoptera Anatomy Ontology), while in others it's because of slight differences in URLs, such as http-vs-https (biolink/biolink-model#1431). Tracking and fixing this in the long term is covered by #226.
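A rough sketch of the behaviour described in this commit message: convert the URLs that can be converted and warn about the rest. This is neither Babel's code nor the `curies` package API. `to_curies()` and `PREFIX_MAP` are hypothetical, and the UBERON entry is an assumed example that does not come from this thread.

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative prefix map. The NCBIGene entry is the one quoted in this issue;
# the UBERON entry is an assumption added for the example.
PREFIX_MAP = {
    "NCBIGene": "http://identifiers.org/ncbigene/",
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_",
}


def to_curies(iris, prefix_map):
    """Split `iris` into a {iri: curie} dict and a list of unmapped IRIs."""
    # Try longer URI prefixes first so that the most specific prefix wins.
    ordered = sorted(prefix_map.items(), key=lambda item: len(item[1]), reverse=True)
    mapped, unmapped = {}, []
    for iri in iris:
        for prefix, uri_prefix in ordered:
            if iri.startswith(uri_prefix):
                mapped[iri] = f"{prefix}:{iri[len(uri_prefix):]}"
                break
        else:
            # No prefix matched: record the IRI and warn instead of failing.
            logger.warning("Could not convert %s to a CURIE", iri)
            unmapped.append(iri)
    return mapped, unmapped


mapped, unmapped = to_curies(
    [
        "http://purl.obolibrary.org/obo/UBERON_0000955",
        "https://identifiers.org/ncbigene/4522",  # https, so it stays unmapped
    ],
    PREFIX_MAP,
)
print(mapped)
print(unmapped)
```

Collecting the unmapped IRIs rather than raising keeps the run going and leaves a list that can be tracked over time.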
@cmungall
Collaborator

Yes, in fact identifiers.org has over the years given 2x2 options (http vs https, slash vs hash). From a web browser POV it doesn't matter, they all resolve. But for semantic URIs this indecision destroys interoperability between triplestores.

Since http-with-slash came first, that is what many groups adopted, so we should stick to that. If we do make a change, it should be to something with absolutely guaranteed, cast-iron permanence.
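One way a consumer could cope with the variants in the meantime is to rewrite them to the http-with-slash form before loading or comparing. The sketch below is only an illustration: `canonicalize()` is a hypothetical helper, the regular expression is an assumption about what the variants look like (http or https, with a slash, hash, or colon before the local identifier), and it has only been checked against the ncbigene examples from this thread.

```python
import re

# Target form: http scheme with a slash before the local identifier.
CANONICAL = "http://identifiers.org/{namespace}/{local_id}"

# Assumed shape of the variants: scheme, namespace, one separator, local identifier.
_VARIANT = re.compile(
    r"^https?://identifiers\.org/(?P<namespace>[A-Za-z0-9._-]+)[/#:](?P<local_id>[^/#:\s]+)$"
)


def canonicalize(iri):
    """Rewrite a recognised identifiers.org IRI to http-with-slash; pass others through."""
    match = _VARIANT.match(iri)
    if match is None:
        return iri  # not an identifiers.org IRI of the assumed shape
    return CANONICAL.format(
        namespace=match["namespace"], local_id=match["local_id"]
    )


variants = [
    "http://identifiers.org/ncbigene/4522",
    "https://identifiers.org/ncbigene/4522",
    "https://identifiers.org/ncbigene:4522",
    "http://identifiers.org/ncbigene#4522",
]
for iri in variants:
    print(canonicalize(iri))  # each prints http://identifiers.org/ncbigene/4522
```

This only papers over the disagreement on the consumer side; the source ontologies still need to agree on one form for triplestores to interoperate.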

@sierra-moxon
Member

I followed up with the CL folks, and they had just had a new release with this change merged in. I am going to call this Biolink issue closed as I imagine @balhoff's ubergraph will be updated automatically, and that will trigger a fix for you, @gaurav -- please of course reopen if I missed a component.


5 participants