[NOID] Fixes #4004: Given (a set of) queries return the schema + explanation of the subgraph (#4065) (#4185)

* [NOID] Fixes #4004: Given (a set of) queries return the schema + explanation of the subgraph (#4065)

* Fixes #4004: Given (a set of) queries return the schema + explanation of the subgraph

* updated extended.txt

* [NOID] changes for 4.4

* [NOID] formatting and licence changes
vga91 authored Nov 19, 2024
1 parent f6aaa00 commit 6db3315
Showing 3 changed files with 164 additions and 6 deletions.
77 changes: 77 additions & 0 deletions docs/asciidoc/modules/ROOT/pages/ml/openai.adoc
@@ -331,3 +331,80 @@ RETURN DISTINCT a.name
| name | description
| value | the description of the dataset
|===

== Create explanation of the subgraph from a set of queries

The `apoc.ml.fromQueries` procedure returns a natural-language explanation of the subgraph matched by the given set of queries.

It uses the `chat/completions` API, which is https://platform.openai.com/docs/api-reference/chat/create[documented here^].

.Query call
[source,cypher]
----
CALL apoc.ml.fromQueries(['MATCH (n:Movie) RETURN n', 'MATCH (n:Person) RETURN n'],
{apiKey: <apiKey>})
YIELD value
RETURN *
----

.Example response
[source, bash]
----
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| value |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "The database represents movies and people, like in a movie database or social network.
There are no defined relationships between nodes, allowing flexibility for future connections.
The Movie node includes properties like title, tagline, and release year." |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
----

.Query call with path
[source,cypher]
----
CALL apoc.ml.fromQueries(['MATCH (n:Movie) RETURN n', 'MATCH p=(n:Movie)--() RETURN p'],
{apiKey: <apiKey>})
YIELD value
RETURN *
----

.Example response
[source, bash]
----
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| value |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "models relationships in the movie industry, connecting :Person nodes to :Movie nodes.
It represents actors, directors, writers, producers, and reviewers connected to movies they are involved with.
Similar to a social network graph but specialized for the entertainment industry.
Each relationship type corresponds to common roles in movie production and reviewing.
Allows for querying and analyzing connections and collaborations within the movie business." |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
----


.Input Parameters
[%autowidth, opts=header]
|===
| name | description
| queries | The list of queries
| conf | An optional configuration map (see the next section)
|===

.Configuration map
[%autowidth, opts=header]
|===
| name | description | mandatory
| apiKey | OpenAI API key | only if `apoc.openai.key` is not defined
| model | The OpenAI model | no, default `gpt-3.5-turbo`
| sample | The number of nodes to skip, e.g. a sample of 1000 reads every 1000th node. It is passed as a parameter to the `apoc.meta.data` procedure, which computes the schema | no, default is a random number
|===

.Results
[%autowidth, opts=header]
|===
| name | description
| value | the description of the dataset
|===
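
As a reading aid for the `sample` option documented above ("a sample of 1000 will read every 1000th node"), here is a minimal Java sketch of every-Nth selection over an in-memory list. It only illustrates the documented wording; it is not how `apoc.meta.data` is implemented. `SampleSketch` and `everyNth` are hypothetical names, and starting the count at position `sample` (1-based) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of APOC.
final class SampleSketch {
    private SampleSketch() {}

    /** Keeps the elements at 1-based positions sample, 2*sample, 3*sample, ... */
    static <T> List<T> everyNth(List<T> nodes, int sample) {
        if (sample < 1) {
            throw new IllegalArgumentException("sample must be >= 1");
        }
        List<T> picked = new ArrayList<>();
        // Start at index sample - 1 so that the first kept element is the sample-th one.
        for (int i = sample - 1; i < nodes.size(); i += sample) {
            picked.add(nodes.get(i));
        }
        return picked;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("n1", "n2", "n3", "n4", "n5", "n6", "n7");
        System.out.println(everyNth(nodes, 3)); // prints [n3, n6]
    }
}
```

A larger `sample` therefore inspects fewer nodes, which trades schema completeness for speed.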
2 changes: 2 additions & 0 deletions full/src/main/resources/extended.txt
@@ -98,6 +98,8 @@ apoc.metrics.get
apoc.metrics.list
apoc.metrics.storage
apoc.ml.cypher
apoc.ml.fromCypher
apoc.ml.fromQueries
apoc.ml.query
apoc.ml.schema
apoc.ml.openai.chat
91 changes: 85 additions & 6 deletions full/src/test/java/apoc/ml/PromptIT.java
@@ -1,6 +1,9 @@
package apoc.ml;

import static apoc.util.TestUtil.testCall;
import static apoc.util.TestUtil.testResult;
import static org.assertj.core.api.Assertions.assertThat;
import static org.junit.Assert.fail;

import apoc.coll.Coll;
import apoc.meta.Meta;
@@ -12,7 +15,6 @@
import java.util.Objects;
import java.util.stream.Collectors;
import org.apache.commons.lang3.StringUtils;
- import org.assertj.core.api.Assertions;
import org.junit.After;
import org.junit.Assume;
import org.junit.Before;
@@ -58,8 +60,8 @@ public void testQuery() {
Map.of("query", "What movies did Tom Hanks play in?", "retries", 2L, "apiKey", OPENAI_KEY),
(r) -> {
List<Map<String, Object>> list = r.stream().collect(Collectors.toList());
- Assertions.assertThat(list).hasSize(12);
- Assertions.assertThat(list.stream()
+ assertThat(list).hasSize(12);
+ assertThat(list.stream()
.map(m -> m.get("query"))
.filter(Objects::nonNull)
.map(Object::toString)
@@ -72,7 +74,7 @@ public void testQuery() {
public void testSchema() {
testResult(db, "CALL apoc.ml.schema({apiKey: $apiKey})", Map.of("apiKey", OPENAI_KEY), (r) -> {
List<Map<String, Object>> list = r.stream().collect(Collectors.toList());
- Assertions.assertThat(list).hasSize(1);
+ assertThat(list).hasSize(1);
});
}

@@ -88,13 +90,90 @@ public void testCypher() {
"apiKey", OPENAI_KEY),
(r) -> {
List<Map<String, Object>> list = r.stream().collect(Collectors.toList());
- Assertions.assertThat(list).hasSize((int) numOfQueries);
- Assertions.assertThat(list.stream()
+ assertThat(list).hasSize((int) numOfQueries);
+ assertThat(list.stream()
.map(m -> m.get("query"))
.filter(Objects::nonNull)
.map(Object::toString)
.filter(StringUtils::isNotEmpty))
.hasSize((int) numOfQueries);
});
}

@Test
public void testSchemaFromQueries() {
List<String> queries = List.of(
"MATCH p=(n:Movie)--() RETURN p",
"MATCH (n:Person) RETURN n",
"MATCH (n:Movie) RETURN n",
"MATCH p=(n)-[r]->() RETURN r");

testCall(
db,
"CALL apoc.ml.fromQueries($queries, {apiKey: $apiKey})",
Map.of(
"queries", queries,
"apiKey", OPENAI_KEY),
(r) -> {
String value = ((String) r.get("value")).toLowerCase();
assertThat(value).containsIgnoringCase("movie");
assertThat(value).satisfiesAnyOf(s -> assertThat(s).contains("person"), s -> assertThat(s)
.contains("people"));
});
}

@Test
public void testSchemaFromQueriesWithSingleQuery() {
List<String> queries = List.of("MATCH (n:Movie) RETURN n");

testCall(
db,
"CALL apoc.ml.fromQueries($queries, {apiKey: $apiKey})",
Map.of(
"queries", queries,
"apiKey", OPENAI_KEY),
(r) -> {
String value = ((String) r.get("value")).toLowerCase();
assertThat(value).containsIgnoringCase("movie");
assertThat(value).doesNotContainIgnoringCase("person", "people");
});
}

@Test
public void testSchemaFromQueriesWithWrongQuery() {
List<String> queries = List.of("MATCH (n:Movie) RETURN a");
try {
testCall(
db,
"CALL apoc.ml.fromQueries($queries, {apiKey: $apiKey})",
Map.of(
"queries", queries,
"apiKey", OPENAI_KEY),
(r) -> fail());
} catch (Exception e) {
assertThat(e.getMessage()).contains(" Variable `a` not defined");
}
}

@Test
public void testSchemaFromEmptyQueries() {
List<String> queries = List.of("MATCH (n:Movie) RETURN 1");

testCall(
db,
"CALL apoc.ml.fromQueries($queries, {apiKey: $apiKey})",
Map.of(
"queries", queries,
"apiKey", OPENAI_KEY),
(r) -> {
String value = ((String) r.get("value")).toLowerCase();

assertThat(value)
.satisfiesAnyOf(
s -> assertThat(s).contains("does not contain"),
s -> assertThat(s).contains("empty"),
s -> assertThat(s).contains("undefined"),
s -> assertThat(s).contains("doesn't have"));
});
}
}
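
The new tests check the model's answer with case-insensitive, any-of keyword matches rather than exact strings, presumably because the generated text varies between runs. A minimal plain-Java sketch of that idea, without AssertJ or a database, might look like the following. `KeywordCheck` and `mentionsAny` are hypothetical names and are not part of APOC or `PromptIT`.

```java
import java.util.Locale;

// Hypothetical helper for illustration only: not part of the commit.
final class KeywordCheck {
    private KeywordCheck() {}

    /** Returns true if the text contains at least one of the keywords, ignoring case. */
    static boolean mentionsAny(String text, String... keywords) {
        String lower = text.toLowerCase(Locale.ROOT);
        for (String keyword : keywords) {
            if (lower.contains(keyword.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String answer = "The database represents movies and people.";
        System.out.println(mentionsAny(answer, "person", "people"));   // prints true
        System.out.println(mentionsAny(answer, "empty", "undefined")); // prints false
    }
}
```

Accepting any of several synonyms ("person" or "people", "empty" or "undefined") keeps such a check tolerant of rephrasing while still failing when the answer is about something else entirely.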
