Optimize constants, for example bind ("en" as ?lang) combined with "filter (lang(?label) = ?lang)" #1673

Open
TomT0m opened this issue Dec 11, 2024 · 2 comments

TomT0m commented Dec 11, 2024

A common pattern is to use a bind to simulate a constant; see for example https://en.wikibooks.org/wiki/SPARQL/Templates, which describes a Wikidata feature (even if the Wikidata feature is a bit out of scope here, as only items can be bound).

It can also be used to make it easy to change the language we want the labels for in a query:

select ?item ?itemLabel ?type ?typeLabel {
   bind ("en" as ?lang)
   ?item rdfs:label ?itemLabel ;
         wdt:P31 ?type .
   ?type rdfs:label ?typeLabel .
   filter (lang(?itemLabel) = ?lang)
   filter (lang(?typeLabel) = ?lang)
}

You can change the language of the labels by just changing the bound value. The issue here is that QLever does not propagate the constant and does not optimize the language filters at all, which makes the pattern unusable in some cases.

Could some sort of constant propagation be implemented in the query analysis to support this?

A follow-up question will be posted as the first comment, because it is related but a separate discussion.
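
For illustration, this is roughly the query that constant propagation would have to produce from the example above: each filter compares against the literal directly instead of going through ?lang. This is a hand-written sketch of the intended rewrite, not output from QLever, and the prefix declarations are the usual Wikidata ones, added here only to make the query self-contained.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?item ?itemLabel ?type ?typeLabel {
  # The bind is kept so that ?lang stays available to the rest of the query.
  BIND ("en" AS ?lang)
  ?item rdfs:label ?itemLabel ;
        wdt:P31 ?type .
  ?type rdfs:label ?typeLabel .
  # The constant is substituted into both language filters.
  FILTER (lang(?itemLabel) = "en")
  FILTER (lang(?typeLabel) = "en")
}
```

Written this way, the language filters contain only constants, which is the form an engine can presumably optimize.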

TomT0m commented Dec 11, 2024

A follow-up idea along the same lines: for a "values" clause with a few languages and the same logic, rewriting a query with

…
values ?lang { "de" "en" }
 ?item rdfs:label ?itemLabel filter (lang(?itemLabel) = ?lang)
…

as something like

…
{
   bind("de" as ?lang)
    ?item rdfs:label ?itemLabel filter (lang(?itemLabel) = "de")
} union {
   bind("en" as ?lang)
    ?item rdfs:label ?itemLabel filter (lang(?itemLabel) = "en")
}
…

could be a step toward implementing what the Wikibase query service does to retrieve labels in several languages efficiently. It does not solve the problem of choosing which label to prefer, however, if we want only one language.

I don't think there is currently an easy way to do this last part in pure SPARQL in general. A "coalesce" could do it if we name the variables differently

bind (coalesce(?itemLabelDe, ?itemLabelEn) as ?itemLabel)

but my proposal does not rewrite the query to create new variables.

An alternative could be using an aggregate function, like a custom "max" or "min" function for which the order can be selected
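
To make the "coalesce" variant concrete, here is a minimal sketch in standard SPARQL 1.1 with one optional block per language. The variable names ?itemLabelDe and ?itemLabelEn come from the snippet above; the wdt:P31 triple is only an assumed placeholder to bind ?item to something.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?item ?itemLabel {
  # Placeholder pattern that selects the items of interest.
  ?item wdt:P31 ?type .
  # One optional block per language, each with its own variable.
  OPTIONAL { ?item rdfs:label ?itemLabelDe FILTER (lang(?itemLabelDe) = "de") }
  OPTIONAL { ?item rdfs:label ?itemLabelEn FILTER (lang(?itemLabelEn) = "en") }
  # COALESCE returns the first argument that is bound, so "de" wins over "en".
  BIND (COALESCE(?itemLabelDe, ?itemLabelEn) AS ?itemLabel)
}
```

The order of the arguments to COALESCE sets the language priority, but every additional language needs its own variable and optional block.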

select ?item (min(?itemLabel)) {
 …
} group by ?itemLabel  ?lang order by asc/desc rank(?lang, "en", "de")

where "rank" is a function that returns the position of its first argument among the remaining ones (1 for rank("en", "en", "de"), 2 for rank("de", "en", "de"), …)

Still quite cumbersome, however, as it needs a group by. Just thoughts.
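
For comparison, something close to this can already be written in standard SPARQL 1.1 by attaching a priority to each language in a "values" clause and keeping the label with the smallest priority. This is only a sketch of one possible workaround, not the "rank" function described above; the ?prio and ?bestPrio variables and the wdt:P31 placeholder pattern are assumptions made for the example.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?item ?itemLabel {
  {
    # For each item, find the best (lowest) priority among the available languages.
    SELECT ?item (MIN(?prio) AS ?bestPrio) {
      VALUES (?lang ?prio) { ("en" 1) ("de" 2) }
      ?item wdt:P31 ?type ;
            rdfs:label ?label .
      FILTER (lang(?label) = ?lang)
    }
    GROUP BY ?item
  }
  # Fetch the label whose language has exactly that priority.
  VALUES (?lang ?prio) { ("en" 1) ("de" 2) }
  ?item rdfs:label ?itemLabel .
  FILTER (lang(?itemLabel) = ?lang && ?prio = ?bestPrio)
}
```

Note that this relies on exactly the filter (lang(...) = ?lang) pattern with a "values" variable that this issue asks to optimize, so it is unlikely to be fast without that optimization.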

aindlq commented Dec 13, 2024

+1 for this, we heavily use this approach in https://github.com/researchspace/researchspace.

We have many queries that are parameterized through a VALUES clause. Even though a query can be quite complex if it is open-ended, the VALUES clause significantly reduces the search space. I guess people coming from Blazegraph got used to this approach, because Blazegraph always pushed down VALUES and more or less evaluated them first.
