BIND clause slows down some queries #3021

mpagni12 · 2025-02-19T15:44:13Z

Version

5.3.0

Question

Dear Fuseki community,

I have observed several times, with different queries, that the presence of a simple BIND clause can drastically slow down execution.

For example, the following query

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX reconx: <https://reconx.vital-it.ch/kg/>
PREFIX mnx: <https://rdf.metanetx.org/schema/>

SELECT 
    ?mnet
    ?chem_1
    ?chem_1_label
    ?chem_2
    ?chem_2_label
WHERE{
    {
        SELECT DISTINCT ?mnet ?mnxm ?chem_1 ?chem_2
        WHERE{
            {
                SELECT DISTINCT ?mnet ?mnxm ?chem_1 ?chem_2
                WHERE{
                    BIND( reconx:vmh_Recon AS ?mnet ) # fix a model here to focus the results 
                    ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
                    ?mnxm mnx:chemXref ?chem_1, ?chem_2
                    FILTER( STR( ?chem_1 ) < STR( ?chem_2 ))
                }  
            }
            ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_2
        }
    }
    ?chem_1 reconx:label ?chem_1_label .
    ?chem_2 reconx:label ?chem_2_label .
}

takes at least half an hour to execute on my local Fuseki instance.

But if I remove the BIND clause from the innermost WHERE clause by inlining the subject:

...
WHERE{
       # BIND( reconx:vmh_Recon AS ?mnet ) # fix a model here to focus the results
       reconx:vmh_Recon reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
       ?mnxm mnx:chemXref ?chem_1, ?chem_2
       FILTER( STR( ?chem_1 ) < STR( ?chem_2 ))
}
...

the query now executes in a couple of seconds!

Replacing the BIND clause with a VALUES clause also executes very slowly.
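
The VALUES variant is not reproduced in this post. As a minimal sketch, assuming the VALUES clause simply takes the place of the BIND in the innermost WHERE clause and everything else is unchanged, it would presumably look like this:

```sparql
PREFIX reconx: <https://reconx.vital-it.ch/kg/>
PREFIX mnx: <https://rdf.metanetx.org/schema/>

SELECT
    ?mnet
    ?chem_1
    ?chem_1_label
    ?chem_2
    ?chem_2_label
WHERE{
    {
        SELECT DISTINCT ?mnet ?mnxm ?chem_1 ?chem_2
        WHERE{
            {
                SELECT DISTINCT ?mnet ?mnxm ?chem_1 ?chem_2
                WHERE{
                    # Assumed form: a one-row VALUES block instead of BIND
                    VALUES ?mnet { reconx:vmh_Recon }
                    ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
                    ?mnxm mnx:chemXref ?chem_1, ?chem_2
                    FILTER( STR( ?chem_1 ) < STR( ?chem_2 ))
                }
            }
            ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_2
        }
    }
    ?chem_1 reconx:label ?chem_1_label .
    ?chem_2 reconx:label ?chem_2_label .
}
```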

The same query executed on GraphDB, populated with the same dataset, takes a couple of seconds, with no significant differences between the three variants (inline, BIND, VALUES).

I tend to prefer the syntax with an explicit BIND or VALUES clause, because in a complex query it lets me syntactically highlight the "input parameter". But currently, the price is too high. I wonder whether it has to do with the query optimiser.

This being reported, thanks a lot for maintaining Fuseki, which is a great open-source tool.

Marco

afs · 2025-02-19T20:54:09Z

Hi @mpagni12

Looks like the optimizer is missing the best filter placement.

Could you try the following 2 patterns which add { } to hint to the optimizer:

                . . .
                WHERE{
                  {
                    BIND( reconx:vmh_Recon AS ?mnet ) # fix a model here to focus the results 
                    ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
                  }
                    ?mnxm mnx:chemXref ?chem_1, ?chem_2
                    FILTER( STR( ?chem_1 ) < STR( ?chem_2 ))
                }  
                . . .

and also

                . . .
                WHERE{
                    BIND( reconx:vmh_Recon AS ?mnet ) # fix a model here to focus the results 
                    ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
                  {
                    ?mnxm mnx:chemXref ?chem_1, ?chem_2
                    FILTER( STR( ?chem_1 ) < STR( ?chem_2 ))
                  }
                }  
                . . .

Andy

mpagni12 · 2025-02-20T08:55:54Z

None of these help :-(

I have also tried, without success:

                 . . .
                WHERE{
                    {
                        SELECT ?mnet WHERE{ BIND( reconx:vmh_Recon AS ?mnet ) } # fix a model here to focus the results 
                    }
                    ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
                    ?mnxm mnx:chemXref ?chem_1, ?chem_2
                    FILTER( STR( ?chem_1 ) < STR( ?chem_2 ))
                }  
                . . .

I wonder whether the property path reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem is executed in the wrong direction unless ?mnet is explicitly specified.

For information, in my dataset there are currently three instances for ?mnet and about 30'000 for ?chem_1

afs · 2025-02-20T09:38:15Z

You can unbundle the path (although the query execution does that anyway)

    ##?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
    ?mnet reconx:reac ?V1 .
    ?V1 reconx:equaSource ?V2 .
    ?V2 reconx:part ?V3 .
    ?V3  reconx:spec ?V4 .
    ?V4 reconx:chem  ?chem_1 .

Is the data publicly available?

Does the inner part:

                SELECT DISTINCT ?mnet ?mnxm ?chem_1 ?chem_2
                WHERE{
                    BIND( reconx:vmh_Recon AS ?mnet ) # fix a model here to focus the results 
                    ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
                    ?mnxm mnx:chemXref ?chem_1, ?chem_2
                    FILTER( STR( ?chem_1 ) < STR( ?chem_2 ))
                }

behave the same way?

> For information, in my dataset there are currently three instances for ?mnet and about 30'000 for ?chem_1

Is the data stored in TDB2?

mpagni12 · 2025-02-20T10:14:16Z

I have already attempted to unbundle the path, with no improvement in execution time. I can mention that the data structure behind the long property path is a DAG, not a tree.

IMPORTANT: The isolated inner part, executed as a stand-alone query, is fast! Hence, the problem I report seems to be linked to the graph pattern being executed in a sub-query. This greatly clarifies the problem, I think.

The data are not yet officially released, but I can supply them to you by private email. However, I guess that the problem should be easy to reproduce by introducing BIND clauses in the innermost graph pattern of nested sub-queries.

The data are stored in TDB2.

afs · 2025-02-20T10:31:47Z

> However I guess that the problem should be easy to reproduce by introducing BIND clause in the innermost graph pattern of nested sub-queries.

It's proving to be quite difficult to set up a simulation and be confident it illustrates the issue at your end. The optimizer plan doesn't look bad but there is a level below that which is more data shape sensitive. So the shape of the data appears to be a factor.

mpagni12 · 2025-02-20T11:05:39Z

Send me an email at Marco.Pagni@sib.swiss

afs · 2025-02-20T14:30:59Z

> The data are not yet officially released, but I can supply them to you by private email.

Sorry, I can't work with private data. I try to treat all bug reports the same. If I did this for one user, it would suggest it could be done for other users.

mpagni12 · 2025-02-21T07:56:43Z

I understand, it makes sense. I tend to be very cautious by default, as I am often working with sensitive or unpublished data from other researchers. But it is not the case here.

Please find the dump of the dataset I have used in my above testing.

afs · 2025-02-21T09:44:02Z

> Please find the dump of the dataset I have used in my above testing.

Got it! 66,860,461 triples.

afs · 2025-02-22T14:24:13Z

And there are 288 results?

mpagni12 · 2025-02-22T16:09:36Z

yes

afs · 2025-02-22T17:10:32Z

As a temporary workaround:

Removing ?mnet from the inner-most SELECT DISTINCT is faster. The set of unique ?mnxm ?chem_1 ?chem_2 is smaller. The query then finds ?mnet again.

The middle, next level out, SELECT DISTINCT gives the overall SELECT DISTINCT ?mnet ?mnxm ?chem_1 ?chem_2.
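
A sketch of how this workaround might be written, following the description above literally (this is an interpretation, not a query posted in the thread): ?mnet is dropped from the projection of the innermost SELECT DISTINCT, and the middle level binds it again through the path to ?chem_2.

```sparql
PREFIX reconx: <https://reconx.vital-it.ch/kg/>
PREFIX mnx: <https://rdf.metanetx.org/schema/>

SELECT
    ?mnet
    ?chem_1
    ?chem_1_label
    ?chem_2
    ?chem_2_label
WHERE{
    {
        # Middle level: restores the overall DISTINCT over all four variables
        SELECT DISTINCT ?mnet ?mnxm ?chem_1 ?chem_2
        WHERE{
            {
                # Innermost level: ?mnet is no longer projected
                SELECT DISTINCT ?mnxm ?chem_1 ?chem_2
                WHERE{
                    BIND( reconx:vmh_Recon AS ?mnet ) # fix a model here to focus the results
                    ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
                    ?mnxm mnx:chemXref ?chem_1, ?chem_2
                    FILTER( STR( ?chem_1 ) < STR( ?chem_2 ))
                }
            }
            # ?mnet is a fresh variable here and is bound by this pattern
            ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_2
        }
    }
    ?chem_1 reconx:label ?chem_1_label .
    ?chem_2 reconx:label ?chem_2_label .
}
```

Because the innermost SELECT no longer projects ?mnet, the BIND there does not constrain ?mnet at the middle level; the path pattern to ?chem_2 can match any model. Whether the results stay identical to the original query depends on the data and was not checked here. If ?mnet must stay fixed, it would need to be constrained again at the middle level.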

mpagni12 added the question label Feb 19, 2025
