Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small fixes to the new bridge pipeline #3122

Merged
merged 2 commits into from
Nov 20, 2023
Merged

Conversation

gouttegd
Copy link
Collaborator

This PR amends the recently introduced SSSOM-based bridge pipeline to use the latest version (0.6.1) of the SSSOM plugin for ROBOT. This allows to:

  • make sure the mapping sets extracted from CL and ZFA have an ID and a license tag, as required by the SSSOM specification (the license is the same as the one used by the ontology those sets are derived from);
  • (much more importantly) make sure the the CL-to-ZFA bridge is complete.

Update to the latest version of the SSSOM plugin. This fixes a bug with
the mapping cardinality filter that caused a large part of the CL-ZFA
mappings to be ignored.

We also use the new features of that version to ensure that the
generated mapping sets have a license and an ID.
@gouttegd gouttegd self-assigned this Nov 18, 2023
@gouttegd
Copy link
Collaborator Author

gouttegd commented Nov 18, 2023

For the technically inclined, this (collapsed) comment explains why the CL-to-ZFA bridge is currently not complete (TL;DR: because of a bug in the SSSOM plugin, fixed in 0.6.1).

The SSSOM/T ruleset that generates the bridging axioms contains, at the beginning, two generic rules:

!(object==UBERON:* || object==CL:*) -> stop();
!cardinality==*:1 -> stop();

The first rule takes any mapping whose object is not a UBERON or CL term and excludes it from any further processing (in effect, it’s as if the mapping was simply discarded). This is necessary because we collect mappings from different sources and a particular mapping set may contain more than just mappings to UBERON or CL. For example, the ZFA mapping set contains mappings between ZFA and CL (that we want) but also mappings between ZFA and TAO (that we don’t care about). This rule eliminates the mappings with anything else than UBERON or CL, so that we don’t have to worry about their presence in the rest of the process.

The second rule excludes any mapping that does not have a cardinality of 1:1 (one foreign subject mapped to one UBERON/CL object) or n:1 (several foreign subjects mapped to one UBERON/CL object). That is, it eliminates 1:n mappings (one foreign subject mapped to several UBERON/CL objects) and n:n mappings. This is necessary because such mappings (which have been found in the wild, see #3056) are almost certainly bogus: it’s perfectly normal for a UBERON term to be mapped with more than one foreign term (in fact that’s expected: a UBERON term will be mapped to one term from FBbt, to one term from ZFA, to one term from WBbt, etc.), but it’s not expected that a single foreign term (say, a FBbt term) should be mapped to more than one UBERON term. If that happens, it’s almost certainly a mapping error, and the rule above takes care of eliminating that case.

So far so good, but the problem is that, up to version 0.6.0 (included), the SSSOM plugin for ROBOT was calculating the cardinality of the mappings only once, just after collating all the input mapping sets into a single set and before applying any rule. To understand how it is a problem, consider the following mappings:

ZFA:0009005  semapv:crossSpeciesExactMatch  CL:0000017
ZFA:0009005  semapv:crossSpeciesExactMatch  TAO:0009005

When the cardinality is calculated, these mappings are identified as 1:n mappings: one subject (ZFA:0009005) mapped to many objects (CL:0000017, TAO:009005). But they are not bogus mappings: it’s perfectly normal, in a mapping set coming from ZFA, to have ZFA terms mapped to more than one object (again, one for each foreign ontology ZFA is mapping with). What is important is that the ZFA term is only mapped with one CL term.

When the first rule above is applied, the mapping with TAO is excluded, since it is a mapping with an object that is neither a UBERON term nor a CL term. But because cardinality is only calculated once, the remaining mapping with CL:0000017 is still considered to be a 1:n mapping, even though there is now no other mapping with the same subject! So when we reach the second rule, that mapping gets also excluded, since it does not (appear to) have the cardinality we expect.

And that is how many of the ZFA-CL mappings were ignored in the first version of the new pipeline! Sorry about that. (Though it was still better than the old pipeline, which ignored all the ZFA-CL mappings.)

The fix is simply to ensure that, when we need to filter on cardinality (when a SSSOM/T rule has a cardinality==... filter), the cardinality of each mapping is re-calculated if any mapping has been excluded by a previous rule. That’s what version 0.6.1 of the SSSOM plugin does.

Copy link
Contributor

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, can't see anything suspicious!

@gouttegd
Copy link
Collaborator Author

Of course, all the suspicious stuff is hidden away in the code of the SSSOM plugin! 😈

@gouttegd gouttegd merged commit d1abffb into master Nov 20, 2023
1 check passed
@gouttegd gouttegd deleted the small-fixes-to-bridge-pipeline branch November 20, 2023 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants