A basic left-corner parser for minimalist grammars, implemented in Python. The parsing algorithm is based on the following paper.
The main function is parse
, given in the module lc_parser.py
.
It takes an input sentence (list of words) and returns a list of all possible parses.
Each parse is a final, successful configuration the parser has reached, alongside the history of applied rules.
Here is a basic template for using the parser:
g1 = MG('input/g1.json')
parser = LCParser(g1)
results = parser.parse(['Aca', 'knows', 'what', 'Bibi', 'likes'])
- Load the grammar from a JSON file.
- Create a parser object with the loaded grammar.
- Parse a sentence:
- Create a stack with the initial config.
- while (stack != empty):
- Pop a configuration from the stack.
- Apply all possible rules to the configuration.
- If the rule produced a result, create a new configuration and push it to the stack.
- If the configuration is successful, add it to the results list.
Note that the default parsing behaviour is loading the rules from the grammar.
Regarding empty-shift rules (where the shifted lexical item is not consumed from the remaining input):
for each empty lexical item in the grammar ('': ["=v,c", "=v,+wh,c"]
), the appropriate empty-shift rules
are created and added to the parsing rules list.
Another working assumption is that shift rules are never executed consecutively.
That is, if a shift rule is applied, the next rule cannot be another shift rule, and
is therefore skipped with the current log:
Skipping rule: {rule} because it follows a shift rule!
The parse
function supports two more modes of running:
rules
: A list of rules to be used in the parsing process. This replaces the default list of rules loaded from the grammar.manual
: If set to true, alongside a list of rules, the parser will apply them in the order given (for directly testing the correct parsing process).
In either case, when a rule's condition is not met, or we tried to apply it and got nothing new (it's result will be None
),
we can except a log message ending in returning same config
.
- Current use of log levels are to show the parsing process in detail and display with color (that's why rule application are logged as "warnings")
- Based on the original implementation in prolog by Miloš Stanojević.