thenumbernine/lua-parser

Lua Parser in Lua

Parses to an abstract syntax tree representation. Call tostring() on the AST to get equivalent Lua code.

Works for versions 5.1, 5.2, 5.3, 5.4 and LuaJIT. I broke <=5.2 compatibility when I resorted to throwing objects for parse error reporting.

AST also contains some functions like flatten() for use with optimizing / auto-inlining Lua.

See the tests folder for example usage.

Reference

Parser = require 'parser'
This returns the parser class.

result, msg = Parser.parse(data[, source, version, useluajit])
This parses the code in data and returns an ast._block object. It is shorthand for Parser(data, source, version, useluajit).tree. The Parser object has a few more functions, which are used internally while parsing.
source = a description of the source, e.g. a filename, which is included in some nodes (functions) to record where they are declared.
version = a string such as '5.3' or '5.4', corresponding to your Lua version.
On success, returns result. On a parse error, returns false and msg describing what went wrong.
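
As a minimal sketch of the call and its error path described above: the helper name roundtrip and the sample source string are made up for illustration, and the Parser class is passed in as an argument so the helper can also be tried with a stand-in. The require is wrapped in pcall so the snippet still loads where the library is not installed.

```lua
-- Round-trip a chunk of Lua source through the parser.
-- `roundtrip` is a hypothetical helper, not part of the library.
local function roundtrip(Parser, code, source)
  local tree, msg = Parser.parse(code, source)
  if not tree then
    -- msg may be an error object rather than a string, so stringify it.
    return nil, tostring(msg)
  end
  -- tostring() on the AST gives back equivalent Lua code.
  return tostring(tree)
end

-- Guarded usage: only runs when the library is on package.path.
local ok, Parser = pcall(require, 'parser')
if ok then
  local out, err = roundtrip(Parser, 'local x = 1 + 2', 'example.lua')
  print(out or ('parse failed: ' .. err))
end
```

The exact formatting of the regenerated code depends on the serializer, so no output is shown here.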

ast = require 'parser.lua.ast'
This is the AST (abstract syntax tree) library. It holds a collection of AST classes, each representing a different token in the Lua syntax.

n = ast.node() = the superclass of all AST classes.

Each node has the following properties:

n.type = returns the type of the node, coinciding with the classname in the ast library with underscore removed.

n.span = source code span information (from and to subtables each with source, line and col fields)

n:copy() = returns a copy of the node.

n:flatten(func, varmap) = flattens / inlines the contents of all function calls of this function. Used for performance optimizations.

n:toLua() = generate Lua code. same as the node's __tostring.

n:serialize(apply) = apply a to-string serialization function to the AST.
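
To illustrate the type and span fields listed above, here is a small sketch that formats a node's location. The helper name describe is made up, and the node is a plain table shaped like the documented fields rather than a real AST node, so this runs without the library.

```lua
-- Format a node as "type @ source:line:col-line:col".
-- Nodes built by hand may have no span, so fall back to the type alone.
local function describe(n)
  local span = n.span
  if not (span and span.from and span.to) then
    return tostring(n.type)
  end
  return string.format('%s @ %s:%d:%d-%d:%d',
    tostring(n.type), tostring(span.from.source),
    span.from.line, span.from.col, span.to.line, span.to.col)
end

-- Stand-in node: a plain table, not something produced by the parser.
local fake = {
  type = 'assign',
  span = {
    from = { source = 'example.lua', line = 3, col = 1 },
    to   = { source = 'example.lua', line = 3, col = 10 },
  },
}
print(describe(fake))  -- prints: assign @ example.lua:3:1-3:10
```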

ast.node subclasses:

n = ast._block(...) = a block of code in Lua.
... is a list of initial child stmt nodes to populate the block node with.
n.type == 'block'.
n[1] ... n[#n] = nodes of statements within the block.
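
Since a block keeps its statements in n[1] ... n[#n], walking it is an ordinary array loop. The sketch below tallies statement types; count_types is a made-up helper and the block is a hand-built table standing in for a parsed ast._block.

```lua
-- Count how many statements of each type a block contains.
local function count_types(block)
  local counts = {}
  for i = 1, #block do
    local t = block[i].type
    counts[t] = (counts[t] or 0) + 1
  end
  return counts
end

-- Stand-in block: plain tables carrying only the .type field.
local fake_block = {
  type = 'block',
  { type = 'local' },
  { type = 'assign' },
  { type = 'local' },
}
local counts = count_types(fake_block)
print(counts['local'], counts['assign'])  -- 2 and 1
```

The same loop should apply to the other nodes documented as holding statements in n[1] ... n[#n], such as _do or _while.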

n = ast._stmt() = a statement-node parent-class.

n = ast._assign(vars, exprs) =
An assignment operation.
Subclass of _stmt.
n.type == 'assign'.
Represents the assignment of n.vars to n.exprs.

n = ast._do(...) =
A do ... end block.
Subclass of _stmt.
n.type == 'do'.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._while(cond, ...) =
A while cond do ... end block.
Subclass of _stmt.
n.type == 'while'.
n.cond holds the condition expression.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._repeat(cond, ...) =
A repeat ... until cond block.
Subclass of _stmt.
n.type == 'repeat'.
n.cond holds the condition expression.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._if(cond, ...) =
An if cond then ... elseif ... else ... end block.
Subclass of _stmt.
n.type == 'if'.
n.cond holds the condition expression of the first if statement.
All subsequent arguments must be ast._elseif objects, optionally with a final ast._else object.
n.elseifs holds the ast._elseif objects.
n.elsestmt optionally holds the final ast._else.

n = ast._elseif(cond, ...) =
An elseif cond then ... block.
Subclass of _stmt.
n.type == 'elseif'.
n.cond holds the condition expression of the elseif statement.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._else(...) =
An else ... block.
n.type == 'else'.
n[1] ... n[#n] = nodes of statements within the block.
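
The elseifs and elsestmt fields make it easy to see how many branches an if statement has. A rough sketch, with branch_count as a made-up helper and a hand-built table in place of a real ast._if node:

```lua
-- Number of branches in an if-node: the leading `if`,
-- every `elseif`, and the optional trailing `else`.
local function branch_count(ifnode)
  local n = 1 + #(ifnode.elseifs or {})
  if ifnode.elsestmt then
    n = n + 1
  end
  return n
end

-- Stand-in for: if a then ... elseif b then ... elseif c then ... else ... end
local fake_if = {
  type = 'if',
  cond = { type = 'var', name = 'a' },
  elseifs = {
    { type = 'elseif', cond = { type = 'var', name = 'b' } },
    { type = 'elseif', cond = { type = 'var', name = 'c' } },
  },
  elsestmt = { type = 'else' },
}
print(branch_count(fake_if))  -- prints: 4
```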

n = ast._foreq(var, min, max, step, ...) =
A for var=min,max[,step] do ... end block.
Subclass of _stmt.
n.type == 'foreq'.
n.var = the variable node.
n.min = the min expression.
n.max = the max expression.
n.step = the optional step expression.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._forin(vars, iterexprs, ...)
A for var1,...varN in expr1,...exprN do ... end block.
Subclass of _stmt.
n.type == 'forin'.
n.vars = table of variables of the for-in loop.
n.iterexprs = table of iterator expressions of the for-in loop.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._function(name, args, ...)
A function [name](arg1, ...argN) ... end block.
Subclass of _stmt.
n.type == 'function'.
n.name = the function name. This is optional. Omit name for this to represent lambda function. (Which technically becomes an expression and not a statement...)
n.args = table of arguments. This does get modified: each argument gets assigned an .param = true, and an .index = for which index it is in the argument list.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._local(exprs)
A local ... statement.
Subclass of _stmt.
n.type == 'local'
n.exprs = list of expressions to be declared as locals.
Expects its member-expressions to be either functions or assigns.

n = ast._return(...)
A return ... statement.
Subclass of _stmt.
n.type == 'return'
n.exprs = list of expressions to return.

n = ast._break(...)
A break statement.
Subclass of _stmt.
n.type == 'break'

n = ast._call(func, ...)
A func(...) function-call expression.
n.type == 'call'
n.func = expression of the function to call.
n.args = list of argument expressions to pass into the function-call.

n = ast._nil()
A nil literal expression.
n.type == 'nil'.
n.const == true.

n = ast._boolean()
The parent class of the true/false AST nodes.

n = ast._true()
A true boolean literal expression
n.type == 'true'.
n.const == true.
n.value == true.
ast._boolean:isa(n) evaluates to true

n = ast._false()
A false boolean literal expression
n.type == 'false'.
n.const == true.
n.value == false.
ast._boolean:isa(n) evaluates to true

n = ast._number(value)
A numeric literal expression.
n.type == 'number'.
n.value = the numerical value.

n = ast._string(value)
A string literal expression.
n.type == 'string'.
n.value = the string value.

n = ast._vararg()
A vararg ... expression.
n.type == 'vararg'.
For use within function arguments, assignment expressions, function calls, etc.

n = ast._table(...)
A table { ... } expression.
n.type == 'table'.
n[1] ... n[#n] = expressions of the table.
If the expression in n[i] is an ast._assign then an entry is added into the table as key = value. If it is not an ast._assign then it is inserted as a sequenced entry.
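
The rule above (an ast._assign entry is key = value, anything else is a sequenced entry) can be sketched as a simple partition. split_entries is a made-up name, and the table node is a plain-table stand-in, so treat this as an illustration rather than library code.

```lua
-- Split a table-constructor node's entries into keyed and sequenced ones.
local function split_entries(tbl)
  local keyed, seq = {}, {}
  for i = 1, #tbl do
    local entry = tbl[i]
    if entry.type == 'assign' then
      keyed[#keyed + 1] = entry
    else
      seq[#seq + 1] = entry
    end
  end
  return keyed, seq
end

-- Stand-in for the constructor { 1, k = 2, 'a' }.
local fake_table = {
  type = 'table',
  { type = 'number', value = 1 },
  { type = 'assign' },
  { type = 'string', value = 'a' },
}
local keyed, seq = split_entries(fake_table)
print(#keyed, #seq)  -- 1 and 2
```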

n = ast._var(name)
A variable reference expression.
n.type == 'var'
n.name = the variable name.

n = ast._par(expr)
A ( ... ) parenthesis expression.
n.type == 'par'.
n.expr = the expression within the parenthesis.

n = ast._index(expr, key)
An expr[key] expression, i.e. an __index-metatable operation.
n.type == 'index'.
n.expr = the expression to be indexed.
n.key = the expression of the index key.

n = ast._indexself(expr, key)
An expr:key expression, to be used as the expression of an ast._call node for member-function-calls. These are Lua's shorthand insertion of self as the first argument.
n.type == 'indexself'.
n.expr = the expression to be indexed.
n.key = the key to index. Must only be a Lua string (not an ast._string, but a real Lua string).

Binary operations:

node type   Lua operator   Lua version
_add        +
_sub        -
_mul        *
_div        /
_mod        %
_concat     ..
_lt         <
_le         <=
_gt         >
_ge         >=
_eq         ==
_ne         ~=
_and        and
_or         or
_idiv       //             5.3+
_band       &              5.3+
_bxor       ~              5.3+
_bor        |              5.3+
_shl        <<             5.3+
_shr        >>             5.3+

n[1] ... n[#n] = a table of the arguments of the operation.

Unary operations:

node type   Lua operator   Lua version
_unm        -
_not        not
_len        #
_bnot       ~              5.3+

n[1] = the single argument of the operation.
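
Because operator nodes keep their operands in n[1] ... n[#n], a tiny evaluator over constant arithmetic can be written as a recursive walk. This is only a sketch: eval and the binary lookup table are made-up names, the nodes are hand-built plain tables, and tonumber() is applied to n.value on the assumption that the value might be stored as a string.

```lua
-- Arithmetic handlers keyed by node type (underscore removed).
local binary = {
  add = function(a, b) return a + b end,
  sub = function(a, b) return a - b end,
  mul = function(a, b) return a * b end,
}

-- Evaluate a constant arithmetic expression; returns nil for anything
-- that is not a number, a parenthesis, a negation, or + - *.
local function eval(n)
  if n.type == 'number' then
    return tonumber(n.value)
  elseif n.type == 'par' then
    return eval(n.expr)
  elseif n.type == 'unm' then
    local v = eval(n[1])
    return v and -v
  elseif binary[n.type] then
    local acc = eval(n[1])
    for i = 2, #n do
      local v = eval(n[i])
      if not (acc and v) then return nil end
      acc = binary[n.type](acc, v)
    end
    return acc
  end
  return nil
end

-- Stand-in for the expression (1 + 2) * -3.
local function num(v) return { type = 'number', value = v } end
local expr = {
  type = 'mul',
  { type = 'par', expr = { type = 'add', num(1), num(2) } },
  { type = 'unm', num(3) },
}
print(eval(expr))  -- prints: -9
```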

Extra functions:

Some more useful functions in AST:

  • ast.copy(node) = equivalent of node:copy()
  • ast.flatten(node, func, varmap) = equivalent of node:flatten(func, varmap)
  • ast.refreshparents
  • ast.traverse
  • ast.nodeclass(type, parent, args)
  • ast.tostringmethod = this specifies the serialization method. It is used to look up the serializer stored in ast.tostringmethods

TODO:

  • Option for parsing LuaJIT -i number suffixes.
  • Speaking of LuaJIT, it has different edge case syntax for 2.0.5, 2.1.0, and whether 5.2-compat is enabled or not. It isn't passing the minify_tests.lua.
  • How about flags to turn off and on each feature, then a function for auto-detect flag sets based on Lua VERSION string or by running some local load() tests
  • Make all node allocation routed through Parser:node to give the node a .parser field to point back to the parser - necessary for certain AST nodes that need to tell what parser keywords are allowed. I do this where necessary but I should do it always.
    • I've also made this keyword test optional since in some rare projects (vec-lua for one) I am inserting AST nodes for the sake of a portable AST that I can inject as inline'd code, but without a parser, so I don't have a proper enumeration of keywords. So for now I'm making ast node .parser optional and the keyword test bypassed if .parser isn't present. I'll probably make it a hard constraint later when I rework vec-lua.
    • It seems like a quick fix to just convert all a.bs into a['b']s ... but Lua for some reason doesn't support a['b']:c() as an equivalent of a.b:c() ... so converting everything from dot to bracket index could break some regenerated Lua scripts.
  • To preserve spacing and comments (useful for my langfix transpiler), instead of using ast fields which are tokens, I should use token-references as fields and allow them to be replaced ... maybe ...
  • I'm very tempted to switch the AST index names to remove the preceding underscore. Pro of keeping it: the keywords become valid Lua names. Pro of removing it: the AST index matches the keyword that the AST node represents ...

Dependencies:

While I was at it, I added a require() replacement for parsing Lua scripts and registering callbacks, so any other script can say "require 'parser.load_xform':insert(function(tree) ... modify the parse tree ... end)" and voila, Lua preprocessor in Lua!
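
A rough sketch of registering such a callback. It assumes the tree handed to the callback is the same kind of ast._block that Parser.parse returns, so #tree is the number of top-level statements; that is a guess, as are the names seen and record_chunk. The require is wrapped in pcall so the snippet loads without the library.

```lua
-- Remember how many top-level statements each transformed chunk had.
local seen = {}

-- Callback shaped like the one in the text: it receives the parse tree.
-- Assumption: the tree is indexable like a block (n[1] ... n[#n]).
local function record_chunk(tree)
  seen[#seen + 1] = #tree
end

-- Guarded registration: only happens when the library is installed.
local ok, xform = pcall(require, 'parser.load_xform')
if ok then
  xform:insert(record_chunk)
end
```

A callback that edits the tree in place would go where record_chunk is; this one only observes it.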

minify_tests.txt taken from the tests at https://github.com/stravant/LuaMinify

I tested this by parsing itself, then using the parsed & reconstructed version to parse itself, then using the parsed & reconstructed version to parse the parsed & reconstructed version, then using the 2x parsed & reconstructed version to parse itself.
