Slow performance for queries with FROM that include GRAPH clause. #1753

Open
aindlq opened this issue Feb 4, 2025 · 1 comment

aindlq commented Feb 4, 2025

We are using the RDF4J Java API to interact with a QLever SPARQL endpoint. RDF4J has an API to retrieve all statements from one or more named graphs. At the moment it generates a query that is valid SPARQL but, in my opinion, slightly meaningless. See eclipse-rdf4j/rdf4j#5245

So for the Java call con.getStatements(null, null, null, ctx), where ctx is http://www.researchspace.org/resource/vocab/status, it generates:

SELECT * WHERE {  ?s ?p ?o . OPTIONAL { GRAPH ?ctx { ?s ?p ?o } } }

and then sets a default-graph-uri query parameter, which is equivalent to:

SELECT *
FROM <http://www.researchspace.org/resource/vocab/status>
WHERE {  ?s ?p ?o . OPTIONAL { GRAPH ?ctx { ?s ?p ?o } } }
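
To make the request shape concrete, here is a minimal sketch of how such a SPARQL protocol request could be assembled. It uses only the Python standard library; the endpoint URL and the helper names (build_request_url, decode_params) are hypothetical and are not taken from RDF4J or QLever. The parameter names query, default-graph-uri and named-graph-uri are the ones defined by the SPARQL 1.1 Protocol.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "http://localhost:7001/sparql"

QUERY = "SELECT * WHERE {  ?s ?p ?o . OPTIONAL { GRAPH ?ctx { ?s ?p ?o } } }"
CTX = "http://www.researchspace.org/resource/vocab/status"


def build_request_url(endpoint, query, default_graphs=(), named_graphs=()):
    """Build a SPARQL protocol GET URL (hypothetical helper).

    Each default-graph-uri plays the role of a FROM clause and each
    named-graph-uri the role of a FROM NAMED clause.
    """
    params = [("query", query)]
    params += [("default-graph-uri", g) for g in default_graphs]
    params += [("named-graph-uri", g) for g in named_graphs]
    return endpoint + "?" + urlencode(params)


def decode_params(url):
    """Decode the query string again, roughly as the server would see it."""
    return parse_qs(urlsplit(url).query)


url = build_request_url(ENDPOINT, QUERY, default_graphs=[CTX])
print(decode_params(url))
```

The decoded parameters contain one default-graph-uri entry and no named-graph-uri entry, which corresponds to a query with a single FROM clause and no FROM NAMED clause.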

For such a query, QLever does a full index scan:

Image

My understanding is that when FROM is specified in a query but FROM NAMED is not, the GRAPH clause is essentially a no-op, because the dataset then has no named graphs for it to match.

https://www.w3.org/TR/sparql11-query/#unnamedGraph

Each FROM clause contains an IRI that indicates a graph to be used to form the default graph. This does not put the graph in as a named graph.

aindlq marked this as a duplicate of #1754 on Feb 4, 2025

aindlq commented Feb 4, 2025

It can be reproduced with any data. The query plan I attached is for a dataset that doesn't contain the graph specified in the FROM clause. When I try to execute the same query on your Wikidata endpoint (https://qlever.cs.uni-freiburg.de/wikidata/h6dQ4E), I get:

Tried to allocate 162.2 GB, but only 20 GB were available. Clear the cache or allow more memory for QLever during startup

Actually, the same issue occurs when only FROM NAMED is specified, like:

SELECT *
FROM NAMED <http://www.researchspace.org/resource/vocab/status>
WHERE {  ?s ?p ?o . OPTIONAL { GRAPH ?ctx { ?s ?p ?o } } }

According to the standard:

If there is no FROM clause, but there is one or more FROM NAMED clauses, then the dataset includes an empty graph for the default graph.

But QLever scans the whole index:

Image

which results in:

Tried to allocate 162.2 GB, but only 20 GB were available. Clear the cache or allow more memory for QLever during startup
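
The dataset rules quoted in this issue can be sketched with a toy in-memory model. This is only an illustration of the SPARQL 1.1 semantics as I read them, not how QLever evaluates queries. The store contents, the second graph IRI and the function names (build_dataset, evaluate) are made up, and the sketch covers only queries that have at least one FROM or FROM NAMED clause.

```python
# Toy quad store: graph IRI -> set of (s, p, o) triples. All data is made up.
STATUS = "http://www.researchspace.org/resource/vocab/status"
OTHER = "http://example.org/other"

STORE = {
    STATUS: {("ex:a", "ex:p", "ex:b")},
    OTHER: {("ex:a", "ex:p", "ex:b"), ("ex:c", "ex:p", "ex:d")},
}


def build_dataset(store, from_graphs=(), from_named=()):
    """Return (default_graph, named_graphs) for the given dataset clauses.

    FROM graphs are merged into the default graph and are NOT added as
    named graphs. With only FROM NAMED, the default graph stays empty.
    """
    default_graph = set()
    for iri in from_graphs:
        default_graph |= store.get(iri, set())
    named_graphs = {iri: store.get(iri, set()) for iri in from_named}
    return default_graph, named_graphs


def evaluate(default_graph, named_graphs):
    """Evaluate { ?s ?p ?o . OPTIONAL { GRAPH ?ctx { ?s ?p ?o } } }.

    Returns rows (s, p, o, ctx); ctx is None when the OPTIONAL part
    finds no named graph containing the triple.
    """
    rows = []
    for triple in sorted(default_graph):
        contexts = sorted(
            iri for iri, graph in named_graphs.items() if triple in graph
        )
        if contexts:
            rows += [triple + (ctx,) for ctx in contexts]
        else:
            rows.append(triple + (None,))
    return rows


# FROM only: ?ctx is never bound, so the OPTIONAL GRAPH part adds nothing.
from_only = evaluate(*build_dataset(STORE, from_graphs=[STATUS]))

# FROM NAMED only: the default graph is empty, so there are no rows at all.
named_only = evaluate(*build_dataset(STORE, from_named=[STATUS]))

print(from_only)
print(named_only)
```

In both cases the evaluation only needs the graphs named in the dataset clauses: with FROM only, the answer is determined by the one FROM graph, and with FROM NAMED only, it is empty without reading any triples.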
