
Grammar

Matt edited this page Nov 30, 2021 · 2 revisions

Let's say you have a data model that tracks donations to local political candidates. Being the adept data modeler you are, you represent each donation as a node, with one relationship from the donor to the donation and another from the recipient to the donation.

(:Person)-[:GAVE_DONATION]->(:Donation)<-[:RECEIVED_DONATION]-(:Person)

To write this in vanilla Aspen, you'd write:

default:
  label: Person
  attributes:
    Donation: amount
----
(Chuck) [gave donation] (Donation, 20.00).
(Alexandria) [received donation] (Donation, 20.00).

You may already be seeing some issues with this, like:

  • How does Aspen know whether those two donations are the same donation, or different donations of the same amount?
  • What currency is that? We can assume it's USD from context, but it's not self-evident.
  • It takes two lines to express a single concept, which is counter to Aspen's goal of producing data efficiently.

To solve all of these issues, we can define custom grammars.

Custom grammars define sentence patterns and map each one to a Cypher template. Once we define a custom grammar, we can write a simple sentence (with no parentheses or brackets!) and Aspen will turn it into the corresponding Cypher statement.

NOTE: At the moment, custom grammars ONLY render when transpiling to Cypher. If you transpile to any other format, sentences matched by custom grammars will be missing from the output. See the related GitHub issue.

NOTE: Protections don't cover Custom Grammar—only "vanilla" Aspen. We figure, if you're writing a custom grammar, you're being careful about the Cypher template you're writing.

Let's say we want to be able to write "Chuck donated $20 to Alexandria.", and have it map to a Cypher statement.

# Discourse
grammar:
  -
    match:
      - Chuck donated $20 to Alexandria.
    template: |
      (:Person { name: "Chuck" })-[:GAVE_DONATION]->(:Donation { amount: 20 })<-[:RECEIVED_DONATION]-(:Person { name: "Alexandria" })
----
# Narrative
Chuck donated $20 to Alexandria.

Adding matchers for nodes

The above grammar gives us the exact statement we want, but next we want to generalize it so we can write "Sarah donated $30 to Sumbul." To do that, we need to introduce variables in both the match and template sections.

First, let's replace the donor and recipient. These will both be Persons, so we'll write:

grammar:
  -
    match:
      - (Person a) donated $20 to (Person b).
    template: |
      {{{a}}}-[:GAVE_DONATION]->(:Donation { amount: 20 })<-[:RECEIVED_DONATION]-{{{b}}}
----
Chuck donated $20 to Alexandria.
Sarah donated $20 to Alexandria.

We've changed two things here. (We still need to handle the dollar amount: this grammar only matches $20 donations. We'll cover that in a later section.)

First, we replaced the literal names "Chuck" and "Alexandria" with matchers that capture that part of the sentence and assign it to the variables a and b.

Second, we used the variables a and b in the template. To do this, we replaced each literal Cypher node (parentheses included) with its variable, surrounded by triple curly braces {{{}}} to indicate that it is a variable whose value should be inserted as-is.

The templates (in the template section) use a templating language called Mustache. If you've used Mustache before, you've probably used double braces like {{variable}}. Aspen needs triple braces because Mustache's double braces HTML-escape the inserted value, which is a problem because Cypher needs those characters intact. If you accidentally use double braces, you'll see nodes like:

(:Person { name: &quot;Chuck&quot; })

Not ideal! So, triple braces it is.
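Here is a minimal Python sketch of the difference between the two tag styles. The `render` function and its regex are hypothetical stand-ins for a real Mustache library, written only to show that `{{name}}` passes the value through HTML escaping (here, Python's `html.escape`) while `{{{name}}}` inserts it verbatim.

```python
import html
import re

# Hypothetical mini-renderer illustrating Mustache's two tag styles.
# This is not Aspen's code; Aspen relies on a real Mustache implementation.
TAG = re.compile(r"\{\{\{(\w+)\}\}\}|\{\{(\w+)\}\}")


def render(template, data):
    def substitute(m):
        raw_name, escaped_name = m.group(1), m.group(2)
        if raw_name is not None:
            return data[raw_name]  # {{{name}}}: inserted verbatim
        return html.escape(data[escaped_name])  # {{name}}: HTML-escaped

    return TAG.sub(substitute, template)


data = {"a": '(:Person { name: "Chuck" })'}
print(render("{{a}}", data))    # (:Person { name: &quot;Chuck&quot; })
print(render("{{{a}}}", data))  # (:Person { name: "Chuck" })
```

The double quotes around "Chuck" are exactly the characters Cypher needs, which is why the escaped form breaks the statement.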

When we feed this custom grammar with the sentence "Chuck donated $20 to Alexandria.", the data behind the scenes looks sort of like this:

{
  "a" => (:Person { name: "Chuck" }),
  "b" => (:Person { name: "Alexandria" }),
}

When we take this template

  # ...
  template: |
    {{{a}}}-[:GAVE_DONATION]->(:Donation { amount: 20 })<-[:RECEIVED_DONATION]-{{{b}}}

and populate it with the above data, we get the Cypher we're aiming for:

/* Simplified slightly for demonstration purposes */

(:Person { name: "Chuck" })-[:GAVE_DONATION]->(:Donation { amount: 20 })<-[:RECEIVED_DONATION]-(:Person { name: "Alexandria" })
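The whole match-then-fill process can be sketched in a few lines of Python. This is a hypothetical illustration, not Aspen's implementation: `compile_pattern` and `run_grammar` are invented names, each node matcher is assumed to capture the text in its slot as the node's `name`, and names containing double quotes are not escaped.

```python
import re

# Hypothetical sketch of the match-then-fill pipeline; not Aspen's actual code.
# A node matcher looks like "(Person a)": a label followed by a variable name.
NODE_MATCHER = re.compile(r"\((\w+) (\w+)\)")


def compile_pattern(pattern):
    """Turn '(Person a) donated ...' into a regex plus a {variable: label} map."""
    labels, parts, pos = {}, [], 0
    for m in NODE_MATCHER.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text between matchers
        label, var = m.group(1), m.group(2)
        labels[var] = label
        parts.append(f"(?P<{var}>.+?)")  # capture whatever text fills this slot
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts)), labels


def run_grammar(pattern, template, sentence):
    """Return the filled-in Cypher, or None if the sentence doesn't match."""
    regex, labels = compile_pattern(pattern)
    m = regex.fullmatch(sentence)
    if m is None:
        return None
    cypher = template
    for var, label in labels.items():
        node = f'(:{label} {{ name: "{m.group(var)}" }})'  # e.g. (:Person { name: "Chuck" })
        cypher = cypher.replace("{{{" + var + "}}}", node)
    return cypher


PATTERN = "(Person a) donated $20 to (Person b)."
TEMPLATE = "{{{a}}}-[:GAVE_DONATION]->(:Donation { amount: 20 })<-[:RECEIVED_DONATION]-{{{b}}}"
print(run_grammar(PATTERN, TEMPLATE, "Chuck donated $20 to Alexandria."))
```

A sentence with a different amount, such as "Sarah donated $30 to Sumbul.", returns `None` here, which is the limitation the following sections address.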

Adding unique slugs

There's a less obvious problem here. With this Cypher, if multiple people each give a $20 donation, only one $20 donation node will be created.

MERGE (:Person { name: "Chuck" })-[:GAVE_DONATION]->(:Donation { amount: 20 })<-[:RECEIVED_DONATION]-(:Person { name: "Alexandria" })
MERGE (:Person { name: "Joe" })-[:GAVE_DONATION]->(:Donation { amount: 20 })<-[:RECEIVED_DONATION]-(:Person { name: "Alexandria" })

To ensure that each donation is created as a unique node, we give it a nickname that is unique within the evaluation, using a special tag: {{{your_name_here:uniq}}}.

So, let's add a nickname to the donation with this syntax, to let Cypher know that each donation is a distinct, unique object.

  # ...
  template: |
    {{{a}}}-[:GAVE_DONATION]->({{{donation:uniq}}}:Donation { amount: 20 })<-[:RECEIVED_DONATION]-{{{b}}}

This will produce Cypher like:

MERGE (:Person { name: "Chuck" })-[:GAVE_DONATION]->(donation_1:Donation { amount: 20 })<-[:RECEIVED_DONATION]-(:Person { name: "Alexandria" })
MERGE (:Person { name: "Joe" })-[:GAVE_DONATION]->(donation_2:Donation { amount: 20 })<-[:RECEIVED_DONATION]-(:Person { name: "Alexandria" })
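A minimal sketch of how `:uniq` tags could be expanded, assuming a per-evaluation counter for each base name. `UniqNamer` is a hypothetical helper, and reusing one nickname when the same tag appears twice in a single statement is an assumption of this sketch, not documented Aspen behavior.

```python
import re

# Hypothetical sketch of {{{name:uniq}}} expansion; Aspen's real numbering may differ.
UNIQ_TAG = re.compile(r"\{\{\{(\w+):uniq\}\}\}")


class UniqNamer:
    """Numbers each base name separately across one evaluation: donation_1, donation_2, ..."""

    def __init__(self):
        self.counts = {}

    def expand(self, template):
        assigned = {}  # assumption: a repeated tag within one statement reuses one nickname

        def substitute(m):
            base = m.group(1)
            if base not in assigned:
                self.counts[base] = self.counts.get(base, 0) + 1
                assigned[base] = f"{base}_{self.counts[base]}"
            return assigned[base]

        return UNIQ_TAG.sub(substitute, template)


namer = UniqNamer()
template = "({{{donation:uniq}}}:Donation { amount: 20 })"
print(namer.expand(template))  # (donation_1:Donation { amount: 20 })
print(namer.expand(template))  # (donation_2:Donation { amount: 20 })
```

Because the counter lives on the namer rather than in the template, every sentence processed in the same evaluation gets a fresh number.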

Adding matchers for other information

We still have to set the amount of the donation as a variable. If we left it as is, every donation would be $20!

grammar:
  -
    match:
      - (Person a) donated $(numeric dollar_amount) to (Person b).
    template: |
      {{{a}}}-[:GAVE_DONATION]->({{{donation:uniq}}}:Donation { amount: {{{dollar_amount}}} })<-[:RECEIVED_DONATION]-{{{b}}}

Okay, so we've added the matcher (numeric dollar_amount), and used the variable in the template.

Types of matchers

Aspen accepts three types of matchers: numeric, string, and node.

We've already used node matchers, which take the form (Label variable_name). The word starting with an uppercase letter is the label applied to the node.

Numeric matchers will match typical (US) formats of numbers, including:

  • 1 (integer)
  • 0.000001 (float)
  • 100,000,000.00 (float)

For convenience, any matched number (even one with commas!) is converted to a numeric value. Whole numbers become integers, and anything with a decimal point becomes a float.

You can specify integer (whole numbers) or float (decimals) instead of numeric, but then the narrative must use a whole number or a decimal, respectively. The numeric type accepts both.
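As a rough sketch of these rules, the Python below validates a matched value against the chosen type and then converts it. The regexes and `parse_number` are hypothetical; they cover only the unsigned US formats listed above (no negative numbers or currency symbols).

```python
import re

# Hypothetical regexes for the numeric matcher types (US formatting, unsigned).
# Not Aspen's actual implementation.
INTEGER = r"\d{1,3}(?:,\d{3})+|\d+"
FLOAT = r"(?:\d{1,3}(?:,\d{3})+|\d+)\.\d+"
PATTERNS = {
    "integer": INTEGER,
    "float": FLOAT,
    "numeric": FLOAT + "|" + INTEGER,  # numeric accepts either form
}


def parse_number(text, kind="numeric"):
    """Validate `text` against the matcher type, then strip commas and convert."""
    if re.fullmatch(PATTERNS[kind], text) is None:
        raise ValueError(f"{text!r} does not match a {kind} matcher")
    cleaned = text.replace(",", "")
    # A decimal point means float; otherwise it's a whole number.
    return float(cleaned) if "." in cleaned else int(cleaned)


print(parse_number("1"))               # 1
print(parse_number("100,000,000.00"))  # 100000000.0
```

Calling `parse_number("20", "float")` raises `ValueError`, mirroring how a float matcher refuses a whole number.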

String matchers will match anything in double-quotes. (Please don't use single quotes, as Aspen doesn't support them yet. Help fix this issue.)

At the moment, if you have a string matcher like:

Chuck works as a (string job_position) at Kabletown.

then make sure to write the value for job_position in quotes in the narrative, like:

Chuck works as a "research assistant" at Kabletown.

If you don't, it won't match!

The quotes read as sarcastic, so we want to change this soon! (Help fix this issue.)
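A hypothetical sketch of why the quotes matter: if a (string job_position) slot compiles to a regex that requires surrounding double quotes, an unquoted (or single-quoted) value simply fails to match. `compile_string_pattern` is an invented name for illustration, not part of Aspen.

```python
import re


def compile_string_pattern(pattern):
    """Compile '(string name)' slots into groups that require double quotes (hypothetical)."""
    parts, pos = [], 0
    for m in re.finditer(r"\(string (\w+)\)", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        # The quotes are part of the pattern; only the text between them is captured.
        parts.append(f'"(?P<{m.group(1)}>[^"]*)"')
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


JOB = compile_string_pattern("Chuck works as a (string job_position) at Kabletown.")
result = JOB.fullmatch('Chuck works as a "research assistant" at Kabletown.')
print(result["job_position"])  # research assistant
print(JOB.fullmatch("Chuck works as a research assistant at Kabletown."))  # None
```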

Finishing our custom grammar

Let's see the whole file and add some more lines that this grammar can match, as well as some vanilla Aspen.

Statements that match a custom grammar don't need brackets or parentheses, but vanilla Aspen statements (those that don't match any custom grammar) always do.

default:
  label: Person
reciprocal: knows

grammar:
  -
    match:
      - (Person a) donated $(numeric dollar_amount) to (Person b).
      - (Person a) gave (Person b) $(numeric dollar_amount).
      - (Person a) gave a $(numeric dollar_amount) donation to (Person b).
    template: |
      {{{a}}}-[:GAVE_DONATION]->({{{donation:uniq}}}:Donation { amount: {{{dollar_amount}}} })<-[:RECEIVED_DONATION]-{{{b}}}
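All three match patterns feed the same template, so a sentence only has to fit one of them; anything that fits none must be written as vanilla Aspen. The Python below sketches that dispatch with hand-written regexes standing in for the three patterns. The regexes, `classify`, and the first-match-wins ordering are assumptions for illustration, and the captured names are returned as plain strings rather than rendered nodes.

```python
import re

# Hypothetical regex equivalents of the three `match` patterns above; Aspen
# builds its own matchers, so treat these as illustrations only.
AMOUNT = r"(?P<dollar_amount>\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)"
PATTERNS = [
    re.compile(rf"(?P<a>.+?) donated \${AMOUNT} to (?P<b>.+?)\."),
    re.compile(rf"(?P<a>.+?) gave (?P<b>.+?) \${AMOUNT}\."),
    re.compile(rf"(?P<a>.+?) gave a \${AMOUNT} donation to (?P<b>.+?)\."),
]


def classify(sentence):
    """Return ("custom", bindings) for the first pattern that fits, else ("vanilla", None)."""
    for regex in PATTERNS:
        m = regex.fullmatch(sentence)
        if m is not None:
            amount = m["dollar_amount"].replace(",", "")
            return "custom", {
                "a": m["a"],
                "b": m["b"],
                "dollar_amount": float(amount) if "." in amount else int(amount),
            }
    # No custom grammar matched, so the sentence must be written as vanilla Aspen.
    return "vanilla", None


print(classify("Sarah gave Sumbul $30."))
print(classify("(Chuck) [knows] (Sarah)."))
```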
