BIND clause slows down some queries #3021

Open
mpagni12 opened this issue Feb 19, 2025 · 12 comments

@mpagni12

Version

5.3.0

Question

Dear fuseki community,

I have observed several times, with different queries, that the presence of a simple BIND clause can drastically slow down execution.

For example, the following query

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX reconx: <https://reconx.vital-it.ch/kg/>
PREFIX mnx: <https://rdf.metanetx.org/schema/>

SELECT 
    ?mnet
    ?chem_1
    ?chem_1_label
    ?chem_2
    ?chem_2_label
WHERE{
    {
        SELECT DISTINCT ?mnet ?mnxm ?chem_1 ?chem_2
        WHERE{
            {
                SELECT DISTINCT ?mnet ?mnxm ?chem_1 ?chem_2
                WHERE{
                    BIND( reconx:vmh_Recon AS ?mnet ) # fix a model here to focus the results 
                    ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
                    ?mnxm mnx:chemXref ?chem_1, ?chem_2
                    FILTER( STR( ?chem_1 ) < STR( ?chem_2 ))
                }  
            }
            ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_2
        }
    }
    ?chem_1 reconx:label ?chem_1_label .
    ?chem_2 reconx:label ?chem_2_label .
}

takes at least half an hour to execute on my local fuseki instance.

But if I remove the BIND clause from the innermost WHERE clause and inline the subject instead:

...
WHERE{
       # BIND( reconx:vmh_Recon AS ?mnet ) # fix a model here to focus the results
       reconx:vmh_Recon reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
       ?mnxm mnx:chemXref ?chem_1, ?chem_2
       FILTER( STR( ?chem_1 ) < STR( ?chem_2 ))
}
...

the query now executes in a couple of seconds!

Replacing the BIND clause with a VALUES clause also executes very slowly.
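
For reference, a minimal sketch of what the VALUES variant might look like. The thread does not show it, so the exact placement of the VALUES clause (in the same position as the BIND it replaces) is an assumption; everything else is copied from the query above.

```sparql
PREFIX reconx: <https://reconx.vital-it.ch/kg/>
PREFIX mnx: <https://rdf.metanetx.org/schema/>

SELECT ?mnet ?chem_1 ?chem_1_label ?chem_2 ?chem_2_label
WHERE{
    {
        SELECT DISTINCT ?mnet ?mnxm ?chem_1 ?chem_2
        WHERE{
            {
                SELECT DISTINCT ?mnet ?mnxm ?chem_1 ?chem_2
                WHERE{
                    # Assumed placement: VALUES takes the place of the BIND clause.
                    VALUES ?mnet { reconx:vmh_Recon }
                    ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
                    ?mnxm mnx:chemXref ?chem_1, ?chem_2
                    FILTER( STR( ?chem_1 ) < STR( ?chem_2 ))
                }
            }
            ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_2
        }
    }
    ?chem_1 reconx:label ?chem_1_label .
    ?chem_2 reconx:label ?chem_2_label .
}
```

Like BIND, this keeps the "input parameter" on a line of its own, which is the readability benefit that motivates the report.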

The same query executed on GraphDB, populated with the same dataset, takes a couple of seconds, with no significant difference between the three variants (inline, BIND, VALUES).

I tend to prefer the syntax with an explicit BIND or VALUES clause, because in a complex query it makes the "input parameter" syntactically visible. But currently the price is too high. I wonder whether it has to do with the query optimiser.

This being reported, thanks a lot for maintaining fuseki which is a great open-source tool.

Marco

@afs
Member

afs commented Feb 19, 2025

Hi @mpagni12

Looks like the optimizer is missing the best filter placement.

Could you try the following 2 patterns which add { } to hint to the optimizer:

                . . .
                WHERE{
                  {
                    BIND( reconx:vmh_Recon AS ?mnet ) # fix a model here to focus the results 
                    ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
                  }
                    ?mnxm mnx:chemXref ?chem_1, ?chem_2
                    FILTER( STR( ?chem_1 ) < STR( ?chem_2 ))
                }  
                . . .

and also

                . . .
                WHERE{
                    BIND( reconx:vmh_Recon AS ?mnet ) # fix a model here to focus the results 
                    ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
                  {
                    ?mnxm mnx:chemXref ?chem_1, ?chem_2
                    FILTER( STR( ?chem_1 ) < STR( ?chem_2 ))
                  }
                }  
                . . .

Andy

@mpagni12
Author

mpagni12 commented Feb 20, 2025

None of these help :-(

I have also tried, without success:

                 . . .
                WHERE{
                    {
                        SELECT ?mnet WHERE{ BIND( reconx:vmh_Recon AS ?mnet ) } # fix a model here to focus the results 
                    }
                    ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
                    ?mnxm mnx:chemXref ?chem_1, ?chem_2
                    FILTER( STR( ?chem_1 ) < STR( ?chem_2 ))
                }  
                . . .

I wonder whether the property path reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem is executed in the wrong direction unless ?mnet is explicitly specified.

For information, in my dataset there are currently three instances for ?mnet and about 30'000 for ?chem_1
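
A sketch of how such counts could be obtained. It assumes the figures refer to the distinct subjects and objects reachable through the property path, which the thread does not spell out; the aliases ?n_mnet and ?n_chem_1 are names chosen here for illustration.

```sparql
PREFIX reconx: <https://reconx.vital-it.ch/kg/>

# Count the distinct endpoints on each side of the property path.
# Note: this walks the whole path with both ends unbound, so it may
# itself be slow on a large dataset.
SELECT
    (COUNT(DISTINCT ?mnet)   AS ?n_mnet)
    (COUNT(DISTINCT ?chem_1) AS ?n_chem_1)
WHERE{
    ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
}
```

With very few values on the ?mnet side and many on the ?chem_1 side, starting the path from a fixed ?mnet should touch far less data than starting from ?chem_1, which is why the evaluation direction matters here.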

@afs
Member

afs commented Feb 20, 2025

You can unbundle the path (although the query execution does that anyway)

    ##?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
    ?mnet reconx:reac ?V1 .
    ?V1 reconx:equaSource ?V2 .
    ?V2 reconx:part ?V3 .
    ?V3  reconx:spec ?V4 .
    ?V4 reconx:chem  ?chem_1 .

Is the data publicly available?

Does the inner part:

                SELECT DISTINCT ?mnet ?mnxm ?chem_1 ?chem_2
                WHERE{
                    BIND( reconx:vmh_Recon AS ?mnet ) # fix a model here to focus the results 
                    ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
                    ?mnxm mnx:chemXref ?chem_1, ?chem_2
                    FILTER( STR( ?chem_1 ) < STR( ?chem_2 ))
                } 

behave the same way?

> For information, in my dataset there are currently three instances for ?mnet and about 30'000 for ?chem_1

Is the data stored in TDB2?

@mpagni12
Author

mpagni12 commented Feb 20, 2025

I have already attempted to unbundle the path, with no improvement in execution time. I can mention that the data structure behind the long property path is a DAG, not a tree.

IMPORTANT: The isolated inner part, executed as a stand-alone query, is fast! Hence the problem I report seems to be linked to the graph pattern being executed inside a sub-query. This greatly clarifies the problem, I think.

The data are not yet officially released, but I can supply them to you by private email. However I guess that the problem should be easy to reproduce by introducing BIND clauses in the innermost graph pattern of nested sub-queries.

The data are stored in TDB2

@afs
Member

afs commented Feb 20, 2025

> However I guess that the problem should be easy to reproduce by introducing BIND clause in the innermost graph pattern of nested sub-queries.

It's proving to be quite difficult to set up a simulation and be confident it illustrates the issue at your end. The optimizer plan doesn't look bad but there is a level below that which is more data shape sensitive. So the shape of the data appears to be a factor.

@mpagni12
Author

mpagni12 commented Feb 20, 2025

Send me an email at Marco.Pagni@sib.swiss

@afs
Member

afs commented Feb 20, 2025

> The data are not yet officially released, but I can supply them to you by private email.

Sorry, I can't work with private data. I try to treat all bug reports the same. If I did this for one user, it would suggest it could be done for other users.

@mpagni12
Author

I understand, it makes sense. I tend to be very cautious by default, as I am often working with sensitive or unpublished data from other researchers. But it is not the case here.

Please find the dump of the dataset I have used in my above testing.

@afs
Member

afs commented Feb 21, 2025

> Please find the dump of the dataset I have used in my above testing.

Got it! 66,860,461 triples.

@afs
Member

afs commented Feb 22, 2025

And there are 288 results?

@mpagni12
Author

yes

@afs
Member

afs commented Feb 22, 2025

As a temporary workaround:

Removing ?mnet from the innermost SELECT DISTINCT is faster, because the set of unique ?mnxm ?chem_1 ?chem_2 is smaller. The query then finds ?mnet again.

The middle SELECT DISTINCT (the next level out) still gives the overall SELECT DISTINCT ?mnet ?mnxm ?chem_1 ?chem_2.
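
One possible reading of this workaround, as a sketch only: the thread does not show the rewritten query, so the exact form below is an assumption. The only change from the original query is the projection of the innermost SELECT DISTINCT.

```sparql
PREFIX reconx: <https://reconx.vital-it.ch/kg/>
PREFIX mnx: <https://rdf.metanetx.org/schema/>

SELECT ?mnet ?chem_1 ?chem_1_label ?chem_2 ?chem_2_label
WHERE{
    {
        # Middle level: unchanged, still projects ?mnet.
        SELECT DISTINCT ?mnet ?mnxm ?chem_1 ?chem_2
        WHERE{
            {
                # Innermost level: ?mnet removed from the projection (assumed form).
                SELECT DISTINCT ?mnxm ?chem_1 ?chem_2
                WHERE{
                    BIND( reconx:vmh_Recon AS ?mnet ) # fix a model here to focus the results
                    ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_1 .
                    ?mnxm mnx:chemXref ?chem_1, ?chem_2
                    FILTER( STR( ?chem_1 ) < STR( ?chem_2 ))
                }
            }
            # ?mnet is no longer projected by the sub-query above,
            # so this pattern is what binds it at this level.
            ?mnet reconx:reac/reconx:equaSource/reconx:part/reconx:spec/reconx:chem ?chem_2
        }
    }
    ?chem_1 reconx:label ?chem_1_label .
    ?chem_2 reconx:label ?chem_2_label .
}
```

Because a variable that a sub-query does not project is not visible outside it, ?mnet at the middle level is no longer tied to reconx:vmh_Recon in this form and can match any model containing ?chem_2. Whether it needs to be restricted again there, and how, is not stated in the thread, so the result set should be checked against the expected 288 rows.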
