Slow INSERT DATA #1683

Open
nicolano opened this issue Dec 16, 2024 · 6 comments

Comments

@nicolano

I use the following query to insert data into a graph:

INSERT DATA {
  GRAPH <http://example.com> {
     ...Triples...
  } 
}

Depending on the size of the database, this can be very slow. For a SPARQL endpoint with the OSM dataset for Bremen (~30 million triples), it takes about 70 ms per triple. Is this behaviour expected with the current version of QLever, or is there a bug here (or is the query I am using possibly not optimal)?
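For reference, here is a minimal Python sketch of how an update like the one above can be assembled from a list of triples and split into fixed-size batches (1024 is the batch size mentioned later in this thread). The helper names `build_insert_data` and `batched` are hypothetical, the triples are assumed to be already serialized as N-Triples terms, and the code is not taken from OLU or QLever.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

# Assumption: each term is already serialized as an N-Triples term,
# e.g. "<http://example.com/s>" or "\"42\"".
Triple = Tuple[str, str, str]


def build_insert_data(graph_iri: str, triples: Sequence[Triple]) -> str:
    """Build one SPARQL 1.1 INSERT DATA update adding `triples` to `graph_iri`."""
    body = "\n".join(f"    {s} {p} {o} ." for s, p, o in triples)
    return f"INSERT DATA {{\n  GRAPH <{graph_iri}> {{\n{body}\n  }}\n}}"


def batched(triples: Iterable[Triple], batch_size: int) -> Iterator[List[Triple]]:
    """Yield consecutive chunks of at most `batch_size` triples."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[Triple] = []
    for t in triples:
        batch.append(t)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch


# Example with synthetic data: 2,500 triples split into batches of 1024,
# i.e. two full batches and one batch of 452 triples.
example = [
    (f"<http://example.com/s{i}>", "<http://example.com/p>", f"\"{i}\"")
    for i in range(2500)
]
queries = [build_insert_data("http://example.com", b) for b in batched(example, 1024)]
```

Each element of `queries` is a complete `INSERT DATA { GRAPH <…> { … } }` update that could be sent to an endpoint on its own.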

@hannahbast
Member

hannahbast commented Dec 16, 2024

@nicolano Can you please post a link to the full dataset and the query? A local link to our file system is also fine if that is easier for you. Here is the expected performance for the current version of the code: https://github.com/ad-freiburg/qlever/wiki/First-tests-with-SPARQL-1.1-Update . That's 2 µs / triple, not 70 ms / triple.
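To put the two per-triple figures quoted so far side by side, a quick back-of-the-envelope calculation in plain Python, using only the numbers from the two comments above:

```python
# Per-triple timings quoted in this thread (seconds per triple).
expected_per_triple = 2e-6   # ~2 µs, from the SPARQL 1.1 Update wiki page
reported_per_triple = 70e-3  # ~70 ms, reported for the Bremen dataset

# How much slower the reported rate is than the expected one.
slowdown = reported_per_triple / expected_per_triple

# What those rates would mean for a single batch of 1024 triples.
batch_size = 1024
expected_batch_s = batch_size * expected_per_triple
reported_batch_s = batch_size * reported_per_triple

print(f"slowdown: ~{slowdown:,.0f}x")
print(f"1024 triples: expected ~{expected_batch_s * 1e3:.1f} ms, "
      f"reported ~{reported_batch_s:.1f} s")
```

At the reported rate, one batch of 1024 triples would take about 72 s, compared with about 2 ms at the expected rate, a gap of roughly 35,000×.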

@nicolano
Author

I think you can use any dataset here, as this happened with every dataset I used.
You can see an example query with 1024 triples in the attached file.

For a freshly started QLever endpoint, I get the expected results. The slow queries only occur:

  • when I use large batches (like the example above with 1024 triples) for update queries
  • gradually, when I use one triple per update query: the insertion time increases with each query processed (e.g., it starts at 1 ms per triple and rises to 10 ms per triple after 10,000 triples have been inserted).
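The gradual slowdown in the second bullet can be measured with a small timing loop that sends one-triple updates and records each latency. This is a sketch under assumptions: both function names are hypothetical, `make_http_sender` assumes the endpoint accepts a SPARQL 1.1 Protocol update sent via POST with `Content-Type: application/sparql-update`, and any access token the server may require for updates is not handled. The measured times are client-side latencies, so they also include network overhead, unlike the processing times in the QLever server log.

```python
import time
import urllib.request
from typing import Callable, List


def make_http_sender(endpoint: str, timeout_s: float = 60.0) -> Callable[[str], None]:
    """Return a function that POSTs a SPARQL update to `endpoint`.

    Assumption: the endpoint accepts SPARQL 1.1 Protocol updates as
    POST + application/sparql-update; authentication is not handled.
    """
    def send(update: str) -> None:
        req = urllib.request.Request(
            endpoint,
            data=update.encode("utf-8"),
            headers={"Content-Type": "application/sparql-update"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            resp.read()  # drain the response so the request completes

    return send


def time_single_triple_inserts(
    send: Callable[[str], None], n: int, graph_iri: str = "http://example.com"
) -> List[float]:
    """Send `n` one-triple INSERT DATA updates and return each latency in ms."""
    latencies_ms: List[float] = []
    for i in range(n):
        # Hypothetical synthetic triple; a fresh subject per update.
        update = (
            f"INSERT DATA {{ GRAPH <{graph_iri}> {{ "
            f"<http://example.com/s{i}> <http://example.com/p> \"{i}\" . }} }}"
        )
        start = time.perf_counter()
        send(update)
        latencies_ms.append((time.perf_counter() - start) * 1e3)
    return latencies_ms
```

Passing a stub such as `list.append` exercises the loop without a server. Against a real endpoint, comparing the average latency of the first and the last few hundred updates shows whether the per-query time grows as described above.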

@hannahbast
Member

hannahbast commented Dec 18, 2024

@nicolano I just tested it with your example update on OSM Switzerland (0.7 B triples) and it took 0.1 s.

If you have a use case that is slow, it would be good to specify it exactly so that we can try to reproduce it.

@Qup42
Member

Qup42 commented Jan 7, 2025

I am unable to get OLU to run on an OSM dataset for Bremen. The TTL extracts at https://osm2rdf.cs.uni-freiburg.de are only available for countries. Converting the snapshots from Geofabrik (both external and internal) with osm2rdf (adfreiburg/osm2rdf:latest as of today) and --add-way-node-order results in an error when running OLU.

No such node (sparql.results.result.binding.literal)
Could not fetch latest timestamp of any node from sparql endpoint

Reproduction setup

The tests that I conducted use the dataset for Switzerland as of today or as of 2024-12-17. The QLever instances were running with increased timeouts and memory allocation (Qleverfile). I decreased MAX_VALUES_PER_QUERY to 32 (default 1024) in OLU.

Results

With the 2024-12-17 dataset, the deletion step was slow. The first batch of 32 values resulted in ~30 million deletions and took ~94 s. The number of deletions is high (1/30 of the total ~900 million triples). The duration per triple is comparable to the 2 µs measured by Hannah.
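The per-triple cost of that deletion batch follows directly from the figures in the previous paragraph:

```python
# Figures from the deletion run described above.
deleted_triples = 30e6  # ~30 million deletions in the first batch
elapsed_s = 94.0        # ~94 s for that batch
total_triples = 900e6   # ~900 million triples in the dataset

per_triple_us = elapsed_s / deleted_triples * 1e6  # microseconds per triple
share = deleted_triples / total_triples            # fraction of the dataset

print(f"~{per_triple_us:.1f} us per deleted triple, {share:.1%} of the dataset")
```

That is roughly 3 µs per deleted triple, touching about 3.3% of the dataset, i.e., the same order of magnitude as the 2 µs figure.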

@nicolano
Author

nicolano commented Jan 7, 2025

@Qup42 Could you please check the issue I have opened for your problem with the Bremen dataset?

Let me try to specify my use case exactly:

qlever index 
qlever start
  • I use the following command to run OLU on this instance:
docker run --rm -v `pwd` -it olu http://host.docker.internal:7025/api/osm-planet -f http://download.geofabrik.de/europe/andorra-updates -a and

OLU inserts 320,804 triples in batches of 1024 triples into the QLever instance, which takes 435,321 ms in total (processing time taken from the QLever server log), or about 1.4 ms per triple.
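The same arithmetic for the Andorra run, assuming every batch except the last held exactly 1024 triples:

```python
import math

# Figures reported for the Andorra run.
inserted_triples = 320_804
total_ms = 435_321  # summed processing time from the QLever server log
batch_size = 1_024  # OLU batch size used in this run

per_triple_ms = total_ms / inserted_triples
n_batches = math.ceil(inserted_triples / batch_size)  # last batch is partial
per_batch_ms = total_ms / n_batches
vs_expected = per_triple_ms / 0.002  # 2 µs = 0.002 ms per triple

print(f"{per_triple_ms:.2f} ms/triple, {n_batches} batches, "
      f"~{per_batch_ms:.0f} ms/batch, ~{vs_expected:.0f}x the 2 us figure")
```

This gives about 1.36 ms per triple over 314 batches (roughly 1.4 s per batch), which is on the order of 680 times the 2 µs per triple from the wiki page.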

@nicolano
Author

nicolano commented Jan 7, 2025

Here you can find the QLever server log for reference.
