SPARQL UPDATE error, body limit exceeded. #1756

Open
aindlq opened this issue Feb 4, 2025 · 4 comments

Comments

@aindlq

aindlq commented Feb 4, 2025

Is there a way to increase the request body limit for SPARQL UPDATE requests?

QLever fails to process an INSERT DATA request with ~4000 non-prefixed statements.

To reproduce, download https://gist.githubusercontent.com/aindlq/8b2588b613eddf8e8fe76900e98d863f/raw/ebfd4a20276bccb8c08bbc4f34389b9296688a69/input.sparql and then execute:

curl -X POST -H "Content-Type: application/x-www-form-urlencoded" -H "Authorization: Bearer yourToken" --data-urlencode "update@input.sparql" http://localhost:7020

error:

ERROR: body limit exceeded [beast.http:9 at /usr/include/boost/beast/http/impl/basic_parser.ipp:467:13 in function 'void boost::beast::http::basic_parser::finish_header(boost::beast::error_code&, std::true_type)']: body limit exceeded
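
For reproducing the error without the gist, a payload of similar shape can be generated locally. The following is a minimal sketch: the `http://example.org/` IRIs and the function name `make_insert_data` are placeholders for illustration and are not taken from the gist.

```python
def make_insert_data(num_triples: int) -> str:
    """Build an INSERT DATA request that uses full (non-prefixed) IRIs."""
    lines = ["INSERT DATA {"]
    for i in range(num_triples):
        # Hypothetical IRIs; the real input uses its own vocabulary.
        lines.append(
            f"  <http://example.org/s{i}> "
            f"<http://example.org/p> "
            f"<http://example.org/o{i}> ."
        )
    lines.append("}")
    return "\n".join(lines) + "\n"


update = make_insert_data(4000)
print(f"{len(update.encode('utf-8'))} bytes")
```

Writing `update` to a file named `input.sparql` should allow the curl command above to be reused unchanged.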

@hannahbast
Member

@aindlq We had exactly that problem two weeks ago and discussed solutions.

  1. Increasing the body limit is easy: HOW TO increase the request size limit #1762
  2. The next problem is the limit on the size of the command line; this can be solved using curl's @ feature, like in your example
  3. The next problem is the SPARQL parser: the grammar for parsing a list of triples or quads is recursive (see https://github.com/ad-freiburg/qlever/blob/master/src/parser/sparqlParser/generated/SparqlAutomatic.g4#L194-L198). Each additional triple therefore adds a recursive call in the code, and the call stack overflows when there are very many triples
  4. Even when that is solved (by changing the grammar or the parsing), the SPARQL parser is very slow for many triples
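
Point 3 can be illustrated outside of ANTLR. The sketch below is not QLever's parser; it is a toy Python model with made-up function names. `count_recursive` mirrors a right-recursive rule of the shape "one triple, then optionally `.` and the rest of the list", which needs one stack frame per triple. `count_iterative` accepts the same input with a loop and constant stack depth.

```python
def make_tokens(n):
    """Toy token stream: each triple is one token, followed by a '.' token."""
    tokens = []
    for i in range(n):
        tokens += [f"<s{i}> <p> <o{i}>", "."]
    return tokens


def count_recursive(tokens, pos=0):
    """Recursive rule: a triple, then optionally '.' and the remaining list."""
    if pos >= len(tokens):
        return 0
    # tokens[pos] is a triple, tokens[pos + 1] is the '.' after it.
    return 1 + count_recursive(tokens, pos + 2)


def count_iterative(tokens):
    """Loop-based rule: same input accepted, no growth of the call stack."""
    count, pos = 0, 0
    while pos < len(tokens):
        count += 1
        pos += 2  # skip the triple and its '.'
    return count


small = make_tokens(100)
assert count_recursive(small) == count_iterative(small) == 100

large = make_tokens(100_000)
try:
    count_recursive(large)
    overflowed = False
except RecursionError:
    overflowed = True

print(overflowed, count_iterative(large))
```

Python reports the problem as a `RecursionError` once its recursion limit is reached; a C++ parser has no such guard and overflows the native stack instead.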

The solution we envisioned is to parse large sets of update triples using one of the parsers we use for parsing the input. Those parsers are fast and can also parse in parallel, at a speed of 4 M triples / second and more.

@Qup42 Did I forget anything?

@aindlq
Author

aindlq commented Feb 5, 2025

@hannahbast thanks! In our use case, increasing the request size limit should do the job for now.

@hannahbast
Member

@aindlq OK, let me turn this into a proper PR with an option to make the request size configurable

@Qup42
Member

Qup42 commented Feb 18, 2025

An addition to the curl options: The option --data-binary is the way to go for large payloads. Assuming the SPARQL update or Graph Store RDF payload is in a file foo, use curl --data-binary @foo ....

  • --data-urlencode only works for relatively small payloads (<200k lines, 30M)
  • -d/--data works for larger payloads (<1.8M lines, 323M)
  • --data-binary works for even larger payloads (though the file is loaded into memory, which limits the size). The difference between --data-binary and --data is that the latter strips out carriage returns, newlines and null bytes, while the former changes nothing.
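
The practical consequence of the last point shows up as soon as an update contains a `#` comment. The sketch below does not call curl; it only emulates in Python the stripping described above, and the helper name `strip_like_curl_data` is made up for illustration.

```python
def strip_like_curl_data(payload: bytes) -> bytes:
    """Emulate the stripping described for `--data @file` (CR, LF, NUL removed)."""
    for ch in (b"\r", b"\n", b"\0"):
        payload = payload.replace(ch, b"")
    return payload


update = (
    b"INSERT DATA {\n"
    b"  # first triple\n"
    b"  <http://example.org/s> <http://example.org/p> <http://example.org/o> .\n"
    b"}\n"
)

stripped = strip_like_curl_data(update)
print(stripped.decode())
```

After stripping, the whole update is a single line, so everything following the `#` (including the triple and the closing brace) becomes part of the comment and the request is no longer a valid update. With `--data-binary` the bytes are sent exactly as they are in the file.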

joka921 pushed a commit that referenced this issue Feb 18, 2025
#1816)

In the original `SPARQL` grammar, the `triplesTemplate` rule is defined recursively. This leads to stack overflows in the ANTLR-v4 based parser for large inputs, in particular large `INSERT DATA` requests. This PR changes the definition of `triplesTemplate` to an equivalent formulation that is not recursive. This massively improves the performance of parsing large triple payloads. With this change we can process the whole olympics dataset as a single `UPDATE` request (1.78M triples, 323M).

To test this, `curl` with `--data-binary` can be used. Write the update to a file `foo`, then execute it with `curl --data-binary @foo ...`. [Some more info on the different curl options](#1756 (comment)).
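
For scripted tests, the same kind of request can be sent without curl. The sketch below uses only Python's standard library and reads the file as raw bytes, which corresponds to `--data-binary`. The endpoint `http://localhost:7020` and the `yourToken` placeholder are taken from the example at the top of this issue. The `Content-Type: application/sparql-update` header follows the SPARQL 1.1 Protocol; that the server accepts updates posted this way is an assumption here, and the function names are hypothetical.

```python
import urllib.request


def build_update_request(url: str, token: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a SPARQL update."""
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Assumption: direct POST of the update, per the SPARQL 1.1 Protocol.
            "Content-Type": "application/sparql-update",
            "Authorization": f"Bearer {token}",
        },
    )


def send_update_file(url: str, token: str, path: str) -> int:
    """Send the file unchanged and return the HTTP status code."""
    with open(path, "rb") as f:  # binary mode: no newline translation
        body = f.read()
    with urllib.request.urlopen(build_update_request(url, token, body)) as resp:
        return resp.status
```

A call such as `send_update_file("http://localhost:7020", "yourToken", "foo")` would then play the role of the curl command. Like `--data-binary`, this loads the whole file into memory.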

| Size    | Time before (s) | Time after (s) |
| ------- | --------------- | -------------- |
| 1781625 | N/A             | 65             |
| 800000  | N/A             | 27             |
| 200000  | N/A             | 7.5            |
| 100000  | N/A             | 4.2            |
| 92000   | N/A             | 4.0            |
| 88000   | N/A             | 3.8            |
| 84000   | 69              | 3.7            |
| 75000   | 59              | 3.4            |
| 50000   | 13              | 2.6            |
| 10000   | 1.7             | 1.3            |
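
To put the table in terms of throughput and speedup, the rows can be processed as follows. This is only arithmetic on the numbers above; the helper names are made up, and rows without a "before" time are reported without a speedup.

```python
# (number of triples, seconds before, seconds after); None where the table says N/A.
measurements = [
    (1781625, None, 65.0),
    (800000, None, 27.0),
    (200000, None, 7.5),
    (100000, None, 4.2),
    (92000, None, 4.0),
    (88000, None, 3.8),
    (84000, 69.0, 3.7),
    (75000, 59.0, 3.4),
    (50000, 13.0, 2.6),
    (10000, 1.7, 1.3),
]


def throughput(size, seconds):
    """Triples parsed per second."""
    return size / seconds


def speedup(before, after):
    """How many times faster the 'after' run is; None if no 'before' time exists."""
    return None if before is None else before / after


for size, before, after in measurements:
    s = speedup(before, after)
    s_text = "n/a" if s is None else f"{s:.1f}x"
    print(f"{size:>8} triples  {throughput(size, after):>8.0f} triples/s  speedup {s_text}")
```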