RdfImporter now compiles with v.2.0 of libraries
mzattera committed May 2, 2021
1 parent 96fd136 commit 549279a
Showing 5 changed files with 114 additions and 67 deletions.
Binary file modified Graql/RdfImporter.jar
Binary file not shown.
14 changes: 7 additions & 7 deletions README.md
@@ -22,9 +22,9 @@ The idea around KRAAL is quite simple. The provided Graql scripts define the bel

In addition, the scripts create rules that represent the semantics of predicates, so that triples are inferred and materialized according to existing facts.

Lastly some Java code is provided to import RDF triples from various file format into Grakn, using the above concepts.
Lastly, some Java code is provided to import RDF triples from various file formats into Grakn, using the above concepts.

In the end, once that the above schema and rules are created, Grakn can be used as an RDF triple store (with triples stored as `rdf-triple` relations) with an integrated reasoner (provided by the semantic rules).
In the end, once the above schema and rules are imported, Grakn can be used as an RDF triple store (with triples stored as `rdf-triple` relations) with an integrated reasoner (provided by the semantic rules).


## Prerequisites
@@ -58,12 +58,12 @@ on the Grakn website to familiarize a bit with Grakn and its console.

## RdfImporter

This is a small Java utility to import RDF files into Grakn. It is provided as a `.jar` file under the `Graql` folder.
This is a small Java utility to import RDF files into Grakn. It is provided as `RdfImporter.jar` under the `Graql` folder.
Notice that, in order to use the tool, you must have created the required RDF support (that is schema and rules)
as explained below.
as explained in the next section below.

```
java -jar RdfImport.jar io.github.mzattera.semanticweb.kraal.RdfImporter -k <arg> -f <arg> [-u <arg>] [-s <arg>] file1 [file2] ...
java -jar RdfImporter.jar io.github.mzattera.semanticweb.kraal.RdfImporter -k <arg> -f <arg> [-u <arg>] [-s <arg>] file1 [file2] ...
-f <arg> Format of input file.
-k <arg> Key space to use for importing.
@@ -120,11 +120,11 @@ named `rdf`and import the schema there. Please refer to Grankn console docmentat
Transaction changes committed
```
2. Then you must import RDF and RDF Schema vocabularies using `RdfImport` import utility as shown below.
2. Then you must import the RDF and RDF Schema vocabularies into the `rdf` database using the `RdfImporter` utility, as shown below.
In this example we assume the `.jar` and the vocabularies are in the folder from which you
run the command.

```java -jar RdfImport.jar io.github.mzattera.semanticweb.kraal.RdfImporter -k <keyspace> -f TTL 22-rdf-syntax-ns.ttl rdf-schema.ttl```
```java -jar RdfImporter.jar io.github.mzattera.semanticweb.kraal.RdfImporter -k rdf -f TTL 22-rdf-syntax-ns.ttl rdf-schema.ttl```


## Graql Editor
2 changes: 1 addition & 1 deletion eclipse/kraal/pom.xml
@@ -39,7 +39,7 @@
<dependency>
<groupId>io.grakn.client</groupId>
<artifactId>grakn-client</artifactId>
<version>${grkn-client-version}</version>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.eclipse.rdf4j</groupId>
@@ -4,6 +4,7 @@
package io.github.mzattera.semanticweb.kraal;

import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
@@ -13,6 +14,7 @@
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Stream;

import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.DefaultParser;
@@ -27,14 +29,16 @@
import org.eclipse.rdf4j.rio.Rio;
import org.eclipse.rdf4j.rio.helpers.StatementCollector;

import grakn.client.GraknClient;
import grakn.client.GraknClient.Session;
import grakn.client.GraknClient.Transaction;
import grakn.client.GraknClient.Transaction.QueryOptions;
import grakn.client.answer.ConceptMap;
import grakn.client.Grakn;
import grakn.client.api.GraknClient;
import grakn.client.api.GraknOptions;
import grakn.client.api.GraknSession;
import grakn.client.api.GraknTransaction;
import grakn.client.api.answer.ConceptMap;
import graql.lang.Graql;
import graql.lang.pattern.variable.ThingVariable.Attribute;
import graql.lang.query.GraqlCompute.Path;
import graql.lang.query.GraqlInsert;
import graql.lang.statement.StatementInstance;
import io.github.mzattera.semanticweb.util.Utils;

/**
@@ -51,48 +55,36 @@ public final class RdfImporter implements Closeable {
// We insert at most this number of triples before performing a commit()
private static final int DEFAULT_TRIPLES_PER_TRANSACTION = 1500;

private static final QueryOptions QUERY_OPTIONS = Transaction.Options.infer(false).explain(false).batchSize(1);
// TODO check if it is OK to get core.
private static final GraknOptions QUERY_OPTIONS = GraknOptions.core().infer(false).explain(false).batchSize(1);

// Client connected to the host
private final GraknClient client;

// Opened session to the host
private final GraknClient.Session session;
private final GraknSession session;

// Resources being created, so we can create them only once.
private final Set<String> createdResources = new HashSet<>();

/**
* Creates a client to default (local) host.
*
* @param keySpace The Key space to connect to.
* @param db The database to connect to.
*/
public RdfImporter(String keySpace) {
this(GraknClient.DEFAULT_URI, keySpace);
public RdfImporter(String db) {
this(Grakn.DEFAULT_ADDRESS, db);
}

/**
* Creates a client to given host (and port).
*
* @param host The Grakn host:port to connect to.
* @param keySpace The Key space to connect to.
* @param host The Grakn host:port to connect to.
* @param db The database to connect to.
*/
public RdfImporter(String host, String keySpace) {
client = new GraknClient(host);
session = client.session(keySpace);
}

/**
* Creates a client to given host (and port) using given credentials.
*
* @param host The Grakn host:port to connect to.
* @param keySpace The Key space to connect to.
* @param username User Name to use for authentication.
* @param password Password to use for authentication.
*/
public RdfImporter(String host, String keySpace, String username, String password) {
client = new GraknClient(host, username, password);
session = client.session(keySpace);
public RdfImporter(String host, String db) {
client = Grakn.coreClient(host);
session = client.session(db, GraknSession.Type.DATA);
}

/**
@@ -118,6 +110,9 @@ public void importFile(String fileName, RDFFormat format, String baseUri, int ba
List<Statement> statements = new ArrayList<>();
rdfParser.setRDFHandler(new StatementCollector(statements));

// TODO Encoding?
File in = new File(fileName);
if (!in.canRead()) throw new FileNotFoundException("Cannot access file: " + in.getCanonicalPath());
try (InputStream is = new FileInputStream(fileName)) {
rdfParser.parse(is, baseUri);
} // close input stream
@@ -126,7 +121,7 @@ public void importFile(String fileName, RDFFormat format, String baseUri, int ba
int tot = 0;

// TODO can we reuse transaction? probably so....
Transaction writeTransaction = commitAndReopenTransaction(session, null);
GraknTransaction writeTransaction = commitAndReopenTransaction(session, null);

for (Statement s : statements) {

@@ -145,19 +140,19 @@ public void importFile(String fileName, RDFFormat format, String baseUri, int ba
// On the other hand, RDF resources are attributes and therefore never
// duplicated.
GraqlInsert query = Graql
.match(Graql.var("s").isa("rdf-non-literal").val(sbj.stringValue()),
Graql.var("p").isa("rdf-uri-reference").val(pred.stringValue()),
Graql.var("o").isa("rdf-node").val(Utils.toString(obj))) // This must be escaped
.insert(Graql.var("t").isa("rdf-triple").rel("rdf-subject", "s").rel("rdf-predicate", "p")
.rel("rdf-object", "o"));

List<ConceptMap> inserted = writeTransaction.execute(query, QUERY_OPTIONS).get();
if (inserted.size() != 1) {
.match(Graql.var("s").eq(sbj.stringValue()).isa("rdf-non-literal"),
Graql.var("p").eq(pred.stringValue()).isa("rdf-IRI"),
Graql.var("o").eq(Utils.toString(obj)).isa("rdf-node")) // This must be escaped
.insert(Graql.var("t").rel("rdf-subject", "s").rel("rdf-predicate", "p").rel("rdf-object", "o")
.isa("rdf-triple"));

// A Stream can be consumed only once, so the count is read a single time.
long inserted = writeTransaction.query().insert(query, QUERY_OPTIONS).count();
if (inserted != 1) {
// TODO proper logging and handling
System.out.println("\tS:\t" + Utils.toString(sbj) + "\t" + sbj.getClass().getName());
System.out.println("\tP:\t" + Utils.toString(pred) + "\t" + pred.getClass().getName());
System.out.println("\tO:\t" + Utils.toString(obj) + "\t" + obj.getClass().getName());
throw new RuntimeException(inserted.size() + " rdf-triple were inserted.");
throw new RuntimeException(inserted + " rdf-triple were inserted.");
}

++tot;
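
A note on the `Stream<ConceptMap>` that `query().insert(...)` returns in this hunk: a Java stream supports only one terminal operation, so a count that is needed both for the check and for the error message is best read once into a local variable. The sketch below uses only the JDK to show that behaviour. `StreamCountOnce` and its methods are hypothetical names for illustration and do not touch the Grakn client.

```java
import java.util.List;
import java.util.stream.Stream;

final class StreamCountOnce {

    // Consumes the stream exactly once and returns how many elements it held.
    static long countOnce(Stream<String> answers) {
        return answers.count();
    }

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondCountFails(Stream<String> answers) {
        answers.count(); // the first terminal operation consumes the stream
        try {
            answers.count(); // a second terminal operation is not allowed
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> data = List.of("a");
        long inserted = countOnce(data.stream());
        System.out.println(inserted + " answer(s)");
        System.out.println("second count fails: " + secondCountFails(data.stream()));
    }
}
```

Keeping the count in a `long` lets the same value be used in both the comparison and the exception message.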
@@ -198,18 +193,18 @@ public void close() {
*
* @param v The value to insert.
*/
private void insert(Value v, Transaction writeTransaction) {
private void insert(Value v, GraknTransaction writeTransaction) {
if (createdResources.contains(v.stringValue()))
return;

GraqlInsert query = null;
if (v.isBNode()) {
query = Graql.insert(Graql.var("x").isa("rdf-blank-node").val(v.stringValue()));
query = Graql.insert(Graql.var("x").eq(v.stringValue()).isa("rdf-blank-node"));
} else if (v.isIRI()) {
query = Graql.insert(Graql.var("x").isa("rdf-uri-reference").val(v.stringValue()));
query = Graql.insert(Graql.var("x").eq(v.stringValue()).isa("rdf-IRI"));
} else if (v.isLiteral()) {
Literal l = (Literal) v;
StatementInstance stmt = Graql.var("l").isa("rdf-literal").val(Utils.toString(l)).has("rdf-datatype",
Attribute stmt = Graql.var("l").eq(Utils.toString(l)).isa("rdf-literal").has("rdf-datatype",
l.getDatatype().stringValue());
if (l.getLanguage().isPresent()) {
stmt = stmt.has("rdf-language-tag", l.getLanguage().get());
@@ -220,7 +215,7 @@ private void insert(Value v, Transaction writeTransaction) {
throw new UnsupportedOperationException();
}

if (writeTransaction.execute(query, QUERY_OPTIONS).get().size() != 1)
if (writeTransaction.query().insert(query, QUERY_OPTIONS).count() != 1)
throw new RuntimeException(v.stringValue() + " not inserted.");

createdResources.add(v.stringValue());
@@ -237,14 +232,15 @@ private void insert(Value v, Transaction writeTransaction) {
// This means it is an instance of rdf:member.
// We need to specify this, as there is no other way to
// easily implement semantic of rdf:_nnn properties.
query = Graql.match(Graql.var("s").isa("rdf-uri-reference").val(v.stringValue()),
Graql.var("p").isa("rdf-uri-reference")
.val("http://www.w3.org/2000/01/rdf-schema#subPropertyOf"),
Graql.var("o").isa("rdf-uri-reference").val("http://www.w3.org/2000/01/rdf-schema#member"))
.insert(Graql.var("t").isa("rdf-triple").rel("rdf-subject", "s").rel("rdf-predicate", "p")
.rel("rdf-object", "o"));

if (writeTransaction.execute(query, QUERY_OPTIONS).get().size() != 1)
query = Graql
.match(Graql.var("s").eq(v.stringValue()).isa("rdf-IRI"),
Graql.var("p").eq("http://www.w3.org/2000/01/rdf-schema#subPropertyOf")
.isa("rdf-IRI"),
Graql.var("o").eq("http://www.w3.org/2000/01/rdf-schema#member").isa("rdf-IRI"))
.insert(Graql.var("t").rel("rdf-subject", "s").rel("rdf-predicate", "p")
.rel("rdf-object", "o").isa("rdf-triple"));

if (writeTransaction.query().insert(query, QUERY_OPTIONS).count() != 1)
throw new RuntimeException(v.stringValue() + " not properly marked as rdfs:member.");
} catch (NumberFormatException e) {
// Not numeric, so not an rdf:_nnn property; nothing to do.
}
@@ -255,34 +251,35 @@ private void insert(Value v, Transaction writeTransaction) {
/**
* Commits current transaction (if any) and returns a new one to be used.
*
* @param session Open session (keyspace)
* @param session Open session (db)
* @param t Current transaction, if any, or null.
* @return A new write transaction on the given session.
*/
private Transaction commitAndReopenTransaction(Session session, Transaction t) {
private GraknTransaction commitAndReopenTransaction(GraknSession session, GraknTransaction t) {
if (t != null) {
t.commit();
t.close();
}
return session.transaction().write();

return session.transaction(GraknTransaction.Type.WRITE);
}
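
The commit-and-reopen pattern above, combined with `DEFAULT_TRIPLES_PER_TRANSACTION`, amounts to committing in batches. The import loop itself is not fully visible in this diff, so the following is only a minimal sketch under stated assumptions: `FakeTransaction` and `FakeSession` are hypothetical in-memory stand-ins (not the Grakn API), and the loop commits after each full batch and once more for any remainder.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchCommitSketch {

    // Hypothetical stand-in for a write transaction; this is not the Grakn API.
    static final class FakeTransaction {
        final List<String> pending = new ArrayList<>();

        void insert(String item) {
            pending.add(item);
        }
    }

    // Hypothetical stand-in for a session; it counts commits and stored items.
    static final class FakeSession {
        int commits = 0;
        int stored = 0;

        // Commits the current transaction (if any) and returns a fresh one.
        FakeTransaction commitAndReopen(FakeTransaction t) {
            if (t != null) {
                commits++;
                stored += t.pending.size();
            }
            return new FakeTransaction();
        }
    }

    // Inserts every item, committing after each full batch and once more for any remainder.
    static FakeSession importAll(List<String> items, int batchSize) {
        FakeSession session = new FakeSession();
        FakeTransaction tx = session.commitAndReopen(null);
        int inBatch = 0;
        for (String item : items) {
            tx.insert(item);
            if (++inBatch == batchSize) {
                tx = session.commitAndReopen(tx);
                inBatch = 0;
            }
        }
        if (inBatch > 0) {
            session.commitAndReopen(tx); // final partial batch
        }
        return session;
    }

    public static void main(String[] args) {
        List<String> triples = new ArrayList<>();
        for (int i = 0; i < 3500; i++) {
            triples.add("triple-" + i);
        }
        FakeSession s = importAll(triples, 1500);
        System.out.println(s.commits + " commits, " + s.stored + " items stored");
    }
}
```

With 3500 items and a batch size of 1500, this sketch performs three commits (two full batches and one remainder). Larger batches mean fewer commits at the cost of more uncommitted data held per transaction, which matches the trade-off described in the `-s` option help text.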

public static void main(String[] args) {

try {
// Read parameters from command line
Options cliOpt = new Options();
cliOpt.addOption("k", true, "Key space to use for importing.");
cliOpt.addOption("k", true, "Database to use for importing.");
cliOpt.addOption("f", true, "Format of input file.");
cliOpt.addOption("u", true, "Base URI to use for nodes without a base.");
cliOpt.addOption("s", true,
"\"Batch\" size; perform this many insertions before committing a transaction. Higher values might speed up execution, as long as you have enough memory.");
CommandLine cli = (new DefaultParser()).parse(cliOpt, args);

String keySpace = cli.getOptionValue("k");
String db = cli.getOptionValue("k");
RDFFormat format = parseFormat(cli.getOptionValue("f"));
List<String> files = cli.getArgList();
if (keySpace == null || format == null || files.size() < 1) {
if (db == null || format == null || files.size() < 1) {
printUsage(cliOpt);
System.exit(-1);
}
@@ -299,7 +296,7 @@ public static void main(String[] args) {
}

// Import required files
try (RdfImporter in = new RdfImporter(keySpace)) {
try (RdfImporter in = new RdfImporter(db)) {
for (String file : files)
in.importFile(file, format, baseUri, batchSize);
}
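
Because `RdfImporter` implements `Closeable`, the try-with-resources block above releases its client and session even if an import fails. The body of `close()` is not shown in this diff, so the sketch below is an assumption-laden illustration only: `Resource` and `Importer` are hypothetical classes, and closing the session before the client is a guess at a sensible order, not the author's confirmed implementation.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

final class CloseOrderSketch {

    // Hypothetical resource that records when it is closed.
    static final class Resource implements Closeable {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Hypothetical importer that owns a client and a session.
    static final class Importer implements Closeable {
        private final List<String> log;
        private final Resource client;
        private final Resource session;

        Importer(List<String> log) {
            this.log = log;
            this.client = new Resource("client", log);
            this.session = new Resource("session", log);
        }

        void importFile(String file) {
            log.add("import " + file);
        }

        // Assumption: the session is closed before the client that created it.
        @Override
        public void close() {
            session.close();
            client.close();
        }
    }

    // Imports all files inside try-with-resources and returns the event log.
    static List<String> run(List<String> files) {
        List<String> log = new ArrayList<>();
        try (Importer in = new Importer(log)) {
            for (String file : files) {
                in.importFile(file);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a.ttl", "b.ttl")));
    }
}
```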
50 changes: 50 additions & 0 deletions eclipse/kraal/src/main/java/logback.xml
@@ -0,0 +1,50 @@
<!-- turn debug=true on for logback-test.xml to help debug logging configurations. -->
<configuration debug="false">

<!--
We prefer logging to console instead of a File. It's very easy
to pipe console output to a file and most organizations already
have a log rotation setup in place. It can also be faster to use this
approach vs using a FileAppender directly
-->
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<!-- encoders are by default assigned the type
ch.qos.logback.classic.encoder.PatternLayoutEncoder -->
<encoder>
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
</encoder>
</appender>

<!--
Async appenders can drastically speed up logging as well as your application's
response time but with some potential drawbacks. Read more at.
https://logback.qos.ch/manual/appenders.html#AsyncAppender
http://blog.takipi.com/how-to-instantly-improve-your-java-logging-with-7-logback-tweaks/
Always be sure to test different configurations for yourself. Every
application has different requirements.
-->
<appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
<appender-ref ref="STDOUT" />
<queueSize>1000</queueSize>
</appender>

<!--
We prefer a default setting of WARN and turn on logging explicitly for
any packages we care about. INFO is also a good choice. Going lower than INFO
may log sensitive data such as passwords or api tokens via HTTP or networking
libraries. Remember these defaults impact third party libraries as well.
Often times the cost of logging is overlooked. Try a simple benchmark of
logging in a tight loop a few million iterations vs not logging and see the difference.
There are a few ways you can change logging levels on the fly in a running app.
This could be a better solution than over logging.
-->
<root level="ERROR">
<!--
If you want async logging just use ref="ASYNC" instead.
We will favor synchronous logging for simplicity. -->
<appender-ref ref="STDOUT" />
</root>

</configuration>
