QLever

QLever (pronounced "clever") is an efficient SPARQL engine supporting large datasets including the full Wikidata (7 billion triples). Even on very large datasets QLever uses only about 40 GB RAM, builds indices in less than 12 hours and executes most queries in less than a second.

On top of the standard SPARQL functionality, QLever also supports SPARQL+Text search and SPARQL autocompletion; these are described on the advanced features page.

A demo of QLever on a variety of large datasets, including Wikidata, can be found here.

The basic design behind QLever was described in this CIKM'17 paper. If you use QLever in your work, please cite that paper.

Quickstart

If you want to skip the details and just get a running QLever instance to play around with, follow the quickstart guide.

Alternatively, to get started with a real (and really big) dataset, we have prepared a Wikidata Quickstart Guide. This guide takes you through the entire process of loading the full Wikidata Knowledge Base into QLever, but don't worry, it is pretty simple.

Overview

The rest of this page is organized in the following sections, which take you through the steps necessary to get a QLever instance up and running, starting from a simple Turtle dump of a Knowledge Base.

Further documentation is available on the following topics

Building the QLever Docker Container

We recommend using QLever with docker. If you absolutely want to run QLever directly on your host, see here.

The installation requires a 64-bit system, docker version 18.05 or newer and git.
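A quick way to check these prerequisites before cloning is sketched below. The helper name version_ge is introduced for this example only, and it compares just the major and minor version components, which is an assumption about how strict the check needs to be.

```shell
# version_ge A B: succeed if version A is at least version B.
# Hypothetical helper for this sketch; only major.minor are compared.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    if (x[1] + 0 != y[1] + 0) exit !(x[1] + 0 > y[1] + 0)
    exit !(x[2] + 0 >= y[2] + 0)
  }'
}

# Only query docker when it is installed; an unreachable daemon yields
# an empty version string, in which case the comparison is skipped.
if command -v docker >/dev/null 2>&1; then
  v=$(docker version --format '{{.Server.Version}}' 2>/dev/null || true)
  if [ -n "$v" ] && ! version_ge "$v" 18.05; then
    echo "docker $v is older than 18.05" >&2
  fi
else
  echo "docker not found" >&2
fi
command -v git >/dev/null 2>&1 || echo "git not found" >&2
```

The check only prints warnings, so it can be run safely before the build commands.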

git clone --recursive https://github.com/ad-freiburg/QLever.git qlever
cd qlever
docker build -t qlever .

This creates a docker image named "qlever" which contains everything needed to use QLever. If you want to be sure that everything is working as it should before proceeding, you can run the end-to-end tests.

Creating an Index

Obtaining Data

First make sure that you have your input data ready and accessible on your machine. If you have no input data yet, obtain it from one of our recommended sources or create your own knowledge base in the standard N-Triples or Turtle format and (optionally) add a text corpus.

Note that QLever only accepts UTF-8 encoded input files. Then again, you should be using UTF-8 anyway.
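If you are unsure about the encoding of an input file, one way to test it is to let iconv decode it as UTF-8 and discard the result. This is a minimal sketch; the function name is_utf8 and the file names in the comments are assumptions made for illustration.

```shell
# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
# Hypothetical helper for this sketch; iconv exits non-zero on invalid input.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Example use (paths are placeholders):
#   is_utf8 knowledge_base.ttl || echo "not valid UTF-8" >&2
# A file in another encoding, e.g. ISO-8859-1, can be converted with:
#   iconv -f ISO-8859-1 -t UTF-8 old.ttl > knowledge_base.ttl
```

Running the check before building the index is cheap compared to a failed index build on a large dataset.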

Permissions

By default, and when running docker without user namespaces, the container will use the user ID 1000, which on Linux is almost always the first real user. If the default user does not work, add -u "$(id -u):$(id -g)" to docker run so that QLever executes as the current user.
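The choice of whether to pass -u can be sketched as a small helper. It assumes that a host user ID other than 1000 is the case where the default user "does not work"; the function name docker_user_flag is hypothetical.

```shell
# docker_user_flag: print the -u option only when the current user is not
# the container's default user ID 1000 (assumption of this sketch).
docker_user_flag() {
  if [ "$(id -u)" -ne 1000 ]; then
    printf '%s\n' "-u $(id -u):$(id -g)"
  fi
}

# Example use; the substitution is left unquoted on purpose so that the
# option and its value become two separate words:
#   docker run $(docker_user_flag) -it --rm ... qlever
```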

When running docker with user namespaces, you may need to make the index folder accessible to the user the QLever process is mapped to on the host (e.g. nobody, see /etc/subuid):

chmod -R o+rw ./index

Building the Index

Then proceed with creating an index.

Important: Ensure that you have enough disk space where your ./index folder resides, or see below for using a separate path.
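The free space on the filesystem that holds the index folder can be read with df. The sketch below uses two helper names, free_kib and has_space, that are introduced here for illustration; how much space an index actually needs depends on your data and is not something this check can tell you.

```shell
# free_kib DIR: print the free space (in KiB) of the filesystem holding DIR.
# -P forces the portable one-line-per-filesystem output, -k uses 1024-byte units.
free_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_space DIR KIB: succeed if at least KIB kibibytes are free under DIR.
has_space() {
  [ "$(free_kib "$1")" -ge "$2" ]
}

# Example use; the threshold is a placeholder you have to choose yourself:
#   has_space ./index 100000000 || echo "less than ~100 GB free" >&2
echo "free space here: $(free_kib .) KiB"
```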

To build a new index, run a bash shell inside the QLever container as follows:

docker run -it --rm \
           -v "<absolute_path_to_input>:/input" \
           -v "$(pwd)/index:/index" --entrypoint "bash" qlever

If you want to use a separate path, you MUST change the "$(pwd)/index" part in all docker … commands and replace it with the absolute path to your index.
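One way to avoid editing every command by hand is to compute the mount argument once. This is only a sketch: abs_dir and index_mount are hypothetical helper names, and the directory passed in must already exist.

```shell
# abs_dir DIR: print the absolute path of an existing directory.
abs_dir() {
  (cd "$1" && pwd)
}

# index_mount DIR: print the value for docker's -v option that maps DIR
# on the host to /index inside the container.
index_mount() {
  printf '%s:/index\n' "$(abs_dir "$1")"
}

# Example use (the index path is a placeholder):
#   docker run -it --rm \
#              -v "<absolute_path_to_input>:/input" \
#              -v "$(index_mount /data/qlever-index)" --entrypoint "bash" qlever
```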

From now on we are inside the container. Make sure you follow all the coming instructions for creating an index, and only then proceed to the next section.

If your input knowledge base is in the standard N-Triples or Turtle format, create the index with the following command:

IndexBuilderMain -l -i /index/<prefix> -f /input/knowledge_base.ttl

Here <prefix> is the base name for all index files, and -l externalizes long literals to disk. If you use index as the prefix, you can later skip the -e INDEX_PREFIX=<prefix> flag.

To include a text collection, provide the wordsfile and docsfile (see here for the required format) with the -w and -d flags, respectively.

Then the full command will look like this:

IndexBuilderMain -l -i /index/<prefix> -f /input/knowledge_base.ttl \
  -w /input/wordsfile.tsv -d /input/docsfile.tsv

You can also add a text index to an existing knowledge base index by adding the -A flag and omitting the -f flag.

Running QLever

To run a QLever server container, use the following command:

docker run -it -p 7001:7001 \
  -v "$(pwd)/index:/index" \
  -e INDEX_PREFIX=<prefix> \
  --name qlever \
  qlever

Additional arguments can be added at the end of the command. If you want the container to run in the background and restart automatically, replace -it with -d --restart=unless-stopped.

Executing queries

The quickest way to run queries is to use the minimal web interface, available at the port specified above (7001 in the example). For a more advanced web interface you can use the QLever UI.

Queries can also be executed from the command line using curl:

curl -G 'http://localhost:7001/' --data-urlencode 'query=SELECT ?x WHERE {?x <rel> ?y}'
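For repeated use, the curl call can be wrapped in a small function. Spaces, braces and angle brackets in a raw URL can be misread by curl (braces trigger its URL globbing), so this sketch lets curl percent-encode the query itself. The names qlever_query, QLEVER_URL and DRY_RUN are introduced for this example and are not part of QLever.

```shell
# Base URL of the server; defaults to the port used in the example above.
# QLEVER_URL is a variable name made up for this sketch.
QLEVER_URL="${QLEVER_URL:-http://localhost:7001/}"

# qlever_query QUERY: send a SPARQL query as an HTTP GET request.
# -G moves the data into the URL's query string, and --data-urlencode
# percent-encodes everything after "query=".
# With DRY_RUN set (hypothetical switch), the command is printed instead.
qlever_query() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf 'curl -sS -G %s --data-urlencode query=%s\n' "$QLEVER_URL" "$1"
  else
    curl -sS -G "$QLEVER_URL" --data-urlencode "query=$1"
  fi
}

# Example use:
#   qlever_query 'SELECT ?x WHERE {?x <rel> ?y}'
```

Keeping the query in single quotes stops the shell from interpreting ?, { and < before curl sees them.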