Parses to an abstract syntax tree representation. Call tostring() on the AST to get equivalent Lua code.
Works for Lua versions 5.1, 5.2, 5.3, 5.4 and LuaJIT. I broke <=5.2 compatibility when I resorted to throwing objects for parse error reporting.
AST also contains some functions like flatten() for use with optimizing / auto-inlining Lua.
See the tests folder for example usage.
Parser = require 'parser'
This will return the parser class.
result, msg = Parser.parse(data[, source, version, useluajit])
This parses the code in data and returns an ast._block object.
This is shorthand for Parser(data, source, version, useluajit).tree
version is a string such as '5.3' or '5.4', corresponding to your Lua version.
The Parser object has a few more functions, used internally while parsing.
source is a description of the source, e.g. the filename, which is included in some nodes (functions) for information on where they are declared.
Returns result in case of success. If it encounters a parse error, it returns false and msg describing what went wrong.
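A minimal usage sketch of the call described above, assuming the library is reachable as require 'parser'. The helper name regenerate and the sample inputs are hypothetical; only Parser.parse, its false-plus-msg failure convention, and tostring() on the tree come from this README.

```lua
-- Hypothetical helper (not part of the library): parse `code` and return
-- the regenerated Lua source, or nil plus the reported error.
local function regenerate(Parser, code, source)
    local tree, msg = Parser.parse(code, source)
    if not tree then
        return nil, msg
    end
    -- tostring() on the AST yields equivalent Lua code.
    return tostring(tree)
end

-- The demo only runs when the library is actually installed.
local found, Parser = pcall(require, 'parser')
if found then
    print(regenerate(Parser, 'local x = 1 + 2', 'example.lua'))
    print(regenerate(Parser, 'local = 1', 'broken.lua')) -- expected to fail to parse
else
    print('parser module not found; skipping demo')
end
```

Passing the Parser table in as an argument keeps the helper easy to exercise with a stand-in when the real module is not on package.path.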
ast = require 'parser.lua.ast'
This is the AST (abstract syntax tree) library. It holds a collection of AST classes, each representing a different token in the Lua syntax.
n = ast.node() = This is the superclass of all AST classes.
Each has the following properties:
n.type = returns the type of the node, coinciding with the classname in the ast library with the underscore removed.
n.span = source code span information (from and to subtables, each with source, line and col fields)
n:copy() = returns a copy of the node.
n:flatten(func, varmap) = flattens / inlines the contents of all function calls of this function. Used for performance optimizations.
n:toLua() = generates Lua code. Same as the node's __tostring.
n:serialize(apply) = applies a to-string serialization function to the AST.
n = ast._block(...) = a block of code in Lua.
... is a list of initial child stmt nodes to populate the block node with.
n.type == 'block'.
n[1] ... n[#n] = nodes of statements within the block.
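As a small illustration of the fields described so far, the sketch below walks the array part of a block and reports each statement's type, plus its starting line and column when span information is present. The function name describe is hypothetical, and the hand-written tables in the usage lines merely stand in for real nodes.

```lua
-- Hypothetical helper: list "type @ line:col" for every statement in a block.
-- Relies only on n[1] ... n[#n], n.type and n.span.from as described above.
local function describe(block)
    local out = {}
    for i, stmt in ipairs(block) do
        local where = ''
        local span = stmt.span
        if span and span.from then
            where = (' @ %s:%s'):format(tostring(span.from.line), tostring(span.from.col))
        end
        out[i] = stmt.type .. where
    end
    return out
end

-- Stand-in data shaped like a parsed block (not produced by the parser).
local fake = {
    {type = 'local', span = {from = {line = 1, col = 1}, to = {line = 1, col = 12}}},
    {type = 'return'},
}
for _, line in ipairs(describe(fake)) do
    print(line)
end
```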
n = ast._stmt() = a statement-node parent-class.
n = ast._assign(vars, exprs) = An assignment operation. Subclass of _stmt.
n.type == 'assign'.
Represents the assignment of n.exprs to n.vars.
n = ast._do(...) = A do ... end block. Subclass of _stmt.
n.type == 'do'.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._while(cond, ...) = A while cond do ... end block. Subclass of _stmt.
n.type == 'while'.
n.cond holds the condition expression.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._repeat(cond, ...) = A repeat ... until cond block. Subclass of _stmt.
n.type == 'repeat'.
n.cond holds the condition expression.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._if(cond, ...) = An if cond then ... elseif ... else ... end block. Subclass of _stmt.
n.type == 'if'.
n.cond holds the condition expression of the first if statement.
All subsequent arguments must be ast._elseif objects, optionally with a final ast._else object.
n.elseifs holds the ast._elseif objects.
n.elsestmt optionally holds the final ast._else.
n = ast._elseif(cond, ...) = An elseif cond then ... block. Subclass of _stmt.
n.type == 'elseif'.
n.cond holds the condition expression of the elseif statement.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._else(...) = An else ... block.
n.type == 'else'.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._foreq(var, min, max, step, ...) = A for var=min,max[,step] do ... end block. Subclass of _stmt.
n.type == 'foreq'.
n.var = the variable node.
n.min = the min expression.
n.max = the max expression.
n.step = the optional step expression.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._forin(vars, iterexprs, ...) = A for var1,...varN in expr1,...exprN do ... end block. Subclass of _stmt.
n.type == 'forin'.
n.vars = table of variables of the for-in loop.
n.iterexprs = table of iterator expressions of the for-in loop.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._function(name, args, ...) = A function [name](arg1, ...argN) ... end block. Subclass of _stmt.
n.type == 'function'.
n.name = the function name. This is optional. Omit name for this to represent a lambda function (which technically becomes an expression and not a statement...).
n.args = table of arguments. This does get modified: each argument gets assigned .param = true, and .index = its index in the argument list.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._local(exprs) = A local ... statement. Subclass of _stmt.
n.type == 'local'.
n.exprs = list of expressions to be declared as locals.
Expects its member-expressions to be either functions or assigns.
n = ast._return(...) = A return ... statement. Subclass of _stmt.
n.type == 'return'.
n.exprs = list of expressions to return.
n = ast._break(...) = A break statement. Subclass of _stmt.
n.type == 'break'.
n = ast._call(func, ...) = A func(...) function-call expression.
n.type == 'call'.
n.func = expression of the function to call.
n.args = list of argument expressions to pass into the function-call.
n = ast._nil() = A nil literal expression.
n.type == 'nil'.
n.const == true.
n = ast._boolean() = The parent class of the true/false AST nodes.
n = ast._true() = A true boolean literal expression.
n.type == 'true'.
n.const == true.
n.value == true.
ast._boolean:isa(n) evaluates to true.
n = ast._false() = A false boolean literal expression.
n.type == 'false'.
n.const == true.
n.value == false.
ast._boolean:isa(n) evaluates to true.
n = ast._number(value) = A numeric literal expression.
n.type == 'number'.
n.value = the numerical value.
n = ast._string(value) = A string literal expression.
n.type == 'string'.
n.value = the string value.
n = ast._vararg() = A vararg ... expression.
n.type == 'vararg'.
For use within function arguments, assignment expressions, function calls, etc.
n = ast._table(...) = A table { ... } expression.
n.type == 'table'.
n[1] ... n[#n] = expressions of the table.
If the expression in n[i] is an ast._assign then an entry is added into the table as key = value. If it is not an ast._assign then it is inserted as a sequenced entry.
n = ast._var(name) = A variable reference expression.
n.type == 'var'.
n.name = the variable name.
n = ast._par(expr) = A ( ... ) parenthesis expression.
n.type == 'par'.
n.expr = the expression within the parenthesis.
n = ast._index(expr, key) = An expr[key] expression, i.e. an __index-metatable operation.
n.type == 'index'.
n.expr = the expression to be indexed.
n.key = the expression of the index key.
n = ast._indexself(expr, key) = An expr:key expression, to be used as the expression of an ast._call node for member-function-calls. These are Lua's shorthand insertion of self as the first argument.
n.type == 'indexself'.
n.expr = the expression to be indexed.
n.key = the key to index. Must only be a Lua string (not an ast._string, but a real Lua string).
Binary operations:
node type | Lua operator | Lua version |
---|---|---|
_add | + | |
_sub | - | |
_mul | * | |
_div | / | |
_mod | % | |
_concat | .. | |
_lt | < | |
_le | <= | |
_gt | > | |
_ge | >= | |
_eq | == | |
_ne | ~= | |
_and | and | |
_or | or | |
_idiv | // | 5.3+ |
_band | & | 5.3+ |
_bxor | ~ | 5.3+ |
_bor | \| | 5.3+ |
_shl | << | 5.3+ |
_shr | >> | 5.3+ |
n[1] ... n[#n] = a table of the arguments of the operation.
Unary operations:
node type | Lua operator | Lua version |
---|---|---|
_unm | - | |
_not | not | |
_len | # | |
_bnot | ~ | 5.3+ |
n[1] = the single argument of the operation.
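As a rough illustration of how the constructors listed above might be combined by hand, the sketch below builds a tree for local x = 1 + 2 followed by return x. It assumes that list-style parameters (vars, exprs) are passed as plain Lua tables and that operator nodes take their operands as positional arguments. The README does not spell either of these out, so check the tests folder before relying on it. buildExample is a hypothetical name.

```lua
-- Hypothetical builder: `ast` is whatever require 'parser.lua.ast' returns.
local function buildExample(ast)
    return ast._block(
        -- Assumption: _local takes a table of assigns, and _assign takes
        -- two tables (vars, exprs).
        ast._local{
            ast._assign(
                {ast._var'x'},
                {ast._add(ast._number(1), ast._number(2))}
            ),
        },
        ast._return(ast._var'x')
    )
end

-- The demo only runs when the library is actually installed.
local found, ast = pcall(require, 'parser.lua.ast')
if found then
    -- print() goes through __tostring, i.e. the generated Lua code.
    print(buildExample(ast))
end
```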
Some more useful functions in AST:
ast.copy(node) = equivalent of node:copy()
ast.flatten(node, func, varmap) = equivalent of node:flatten(func, varmap)
ast.refreshparents
ast.traverse
ast.nodeclass(type, parent, args)
ast.tostringmethod = this specifies the serialization method. It is used to look up the serializer stored in ast.tostringmethods
TODO:
- Option for parsing LuaJIT -i number suffixes.
- Speaking of LuaJIT, it has different edge case syntax for 2.0.5, 2.1.0, and whether 5.2-compat is enabled or not. It isn't passing the minify_tests.lua.
- How about flags to turn off and on each feature, then a function for auto-detect flag sets based on Lua VERSION string or by running some local load() tests.
- Make all node allocation routed through Parser:node to give the node a .parser field to point back to the parser - necessary for certain AST nodes that need to tell what parser keywords are allowed. I do this where necessary but I should do it always.
  - I've also made this keyword test optional since in some rare projects (vec-lua for one) I am inserting AST nodes for the sake of a portable AST that I can inject as inline'd code, but without a parser, so I don't have a proper enumeration of keywords. So for now I'm making the ast node's .parser optional and the keyword test bypassed if .parser isn't present. I'll probably make it a hard constraint later when I rework vec-lua.
  - It seems like a quick fix to just convert all a.b's into a['b']'s ... but Lua for some reason doesn't support a['b']:c() as an equivalent of a.b:c() everywhere (a function name such as function a.b:c() ... end has no bracket form) ... so converting everything from dot to bracket index could break some regenerated Lua scripts.
- To preserve spacing and comments (useful for my langfix transpiler), instead of using ast fields which are tokens, I should use token-references as fields and allow them to be replaced ... maybe ...
- I'm very tempted to switch the AST index names to remove the preceding underscore. Pro of keeping it: the keywords become valid Lua names. Pro of removing it: the AST index matches the keyword that the AST node represents ...
While I was at it, I added a require() replacement for parsing Lua scripts and registering callbacks,
so any other script can say "require 'parser.load_xform':insert(function(tree) ... modify the parse tree ... end)"
and voila, Lua preprocessor in Lua!
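A hedged sketch of registering such a callback, based only on the one-liner quoted above. The callback name logStatements is hypothetical, and whether the callback's return value is used is not stated in this README; the sketch returns the tree unchanged on the assumption that this is harmless.

```lua
-- Hypothetical callback: log the type of every top-level statement.
local function logStatements(tree)
    for _, stmt in ipairs(tree) do
        io.stderr:write('stmt: ', tostring(stmt.type), '\n')
    end
    -- Assumption: returning the tree is harmless; the README only says
    -- the callback may modify the parse tree.
    return tree
end

-- The registration only runs when the library is actually installed.
local found, xforms = pcall(require, 'parser.load_xform')
if found then
    xforms:insert(logStatements)
    -- Presumably, scripts loaded through require() after this point are
    -- parsed and handed to logStatements before they run.
end
```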
minify_tests.txt is taken from the tests at https://github.com/stravant/LuaMinify
I tested this by parsing itself, then using the parsed & reconstructed version to parse itself, then using the parsed & reconstructed version to parse the parsed & reconstructed version, then using the 2x parsed & reconstructed version to parse itself.
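A simpler, related check can be sketched as a fixed-point test: parse some code, regenerate it, parse the regenerated text, and confirm the second regeneration matches the first. This is not the self-hosting procedure described above, only a smaller check in the same spirit. The function name roundtrip and the sample input are hypothetical; the parse argument is expected to behave like Parser.parse.

```lua
-- Hypothetical check: regenerated code should survive a second round trip.
-- `parse` must return a tree (with __tostring) or false plus a message.
local function roundtrip(parse, code)
    local first = assert(parse(code))
    local regenerated = tostring(first)
    local second = assert(parse(regenerated))
    return tostring(second) == regenerated, regenerated
end

-- The demo only runs when the library is actually installed.
local found, Parser = pcall(require, 'parser')
if found then
    print(roundtrip(Parser.parse, 'local t = {1, 2, a = 3} return #t'))
end
```

Comparing regenerated text against itself, instead of against the original input, sidesteps differences in whitespace and comments that the serializer is not expected to preserve.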