Indexing a Turtle file fails (because of regex for end of statement not being matched in parsed chunk?) #1019
Replies: 4 comments 1 reply
-
Hi Lorenz, can you try And have you checked that
-
Hi Hannah, tried with your suggestion, same error. For completeness, here are the current indexer settings:
Checked in advance already with Jena
which terminates with no parse errors. I guess my issue is rather hard to reproduce, but could it be that your Turtle parser fails because of some unfortunate splits? From the source code I can see that the error is in a parallel parser, and I assume it processes chunks in parallel. Maybe in my file the two ends of a statement fall into different chunks?
Here is also a sample of the data:
<https://data.coypu.org/trade/tiva/origin_of_value_added_in_gross_exports/HRV_D36T39_FIN_D20_2018_9.0> a tiva:ExgrBsci;
  coy:hasValue "9.0"^^xsd:decimal;
  tiva:hasExport <https://data.coypu.org/trade/tiva/exports/HRV_D36T39_FIN_D20_2018_9.0>;
  tiva:hasValueAddedOrigin <https://data.coypu.org/trade/tiva/value_added_origin/HRV_D36T39_FIN_D20_2018_9.0>;
  coy:hasYear "2018"^^xsd:gYear.
I hope your parser doesn't fail because of the dot char in the URI, e.g.
-
Hi Lorenz,
(The sample of your data looks unsuspicious in that respect, but maybe there is something somewhere else.) Does your file use Linux style line breaks ( Note that this is a strict subset of the Turtle standard: most notably, you can write your whole file into a single line and it is still valid Turtle, but this wouldn't work anymore. If you just use a downloaded Docker container, I will set up a fix soon to expose this option, thanks for pointing this out.
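To illustrate the kind of failure discussed here, a minimal Python sketch follows. The pattern and the function name are hypothetical and are not taken from QLever's source code; the sketch only assumes that a chunked parser looks for a "." followed by a line feed to find the end of a statement, and shows why such a search finds nothing in a chunk with CRLF line endings.

```python
import re

# Hypothetical pattern, NOT taken from QLever's source: a statement ends with
# ".", optional spaces or tabs, and then a line feed.
END_OF_STATEMENT = re.compile(rb"\.[ \t]*\n")


def last_statement_end(chunk: bytes) -> int:
    """Return the offset just past the last statement terminator, or -1."""
    end = -1
    for match in END_OF_STATEMENT.finditer(chunk):
        end = match.end()
    return end


# A chunk that ends in the middle of a third statement.
unix_chunk = b'<s> <p> "1" .\n<s> <p> "2" .\n<s> <p'
# The same chunk with DOS/Windows line endings.
dos_chunk = unix_chunk.replace(b"\n", b"\r\n")

print(last_statement_end(unix_chunk))  # offset just past the second statement
print(last_statement_end(dos_chunk))   # -1: "." is followed by "\r", no match
```

With LF line endings the chunk can be cut after the second statement and the remainder carried over to the next chunk. With CRLF this particular pattern never matches, so no boundary is found at all.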
-
Hi Johannes, thanks for the insights. I checked the file
I didn't generate this file from CSV myself, but I know who did, and it looks like it was a Windows user ...
and it turns out that it is indeed DOS/Windows CRLF
So I did a
and now the file can be parsed and indexed. Maybe we could document this somewhere, though I know it's rather hard to find a good place. Maybe as a hint in a section on supported formats or something, so that the rare group of scientific Windows users and also all others are aware of this? Thank you very much for the fast support! Good job @hannahbast and @joka921 !
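For anyone who wants to check a file before indexing, here is a minimal Python sketch of the check and the conversion described above. It is not the command used in this thread, and the function names are made up for illustration. It counts the line endings in a byte string and rewrites a file line by line with CRLF replaced by LF, so the whole file never has to fit into memory.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"crlf": crlf, "lf": lf, "cr": cr}


def to_unix_line(line: bytes) -> bytes:
    """Turn a trailing CRLF into LF; leave every other line untouched."""
    if line.endswith(b"\r\n"):
        return line[:-2] + b"\n"
    return line


def convert_file(src: str, dst: str) -> None:
    """Copy src to dst, converting CRLF line endings to LF."""
    # Iterating a binary file yields lines split after each b"\n",
    # so a CRLF pair can only appear at the end of a line.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:
            fout.write(to_unix_line(line))
```

A call such as `convert_file("data.ttl", "data.unix.ttl")` (both file names are placeholders) writes a converted copy and leaves the original untouched.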
-
Hi QLever folks,
before raising an issue I'm asking here because I might be doing something wrong ...
Setup:
The QLever config file does nothing more than a
cat
of the uncompressed file, using a batch size of 500k.
Output of
qlever index
is
I get the impression that the parser reads the file in chunks (100MB), and that I got unlucky with a chunk whose end falls before the last Turtle statement is closed? If so, that would be rather inconvenient for users, right? If that's not the reason for my problem, then I'm stuck atm.
Any suggestions or hints how to proceed? Anything else I could provide here?
Thanks in advance.
Cheers,
Lorenz