MikhailProg/json.awk

JSON AWK beautifier

The beautifier does not rely on gawk's true multidimensional array extension to represent the JSON tree. Instead it uses "classic" awk multidimensional subscripts such as array[expr1,expr2], which are syntactic sugar for the one-dimensional array[expr1 SUBSEP expr2], where expr1 SUBSEP expr2 is a string concatenation.
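As a small illustration of the equivalence described above (this is not code from the beautifier; the array name tree and the keys are made up for the example), the following sketch stores two elements with comma subscripts and reads one of them back through explicit SUBSEP concatenation:

```shell
#!/bin/sh
# Illustrative only: a[i, j] and a[i SUBSEP j] name the same element.
# The array name "tree" and its keys are hypothetical.
subsep_demo() {
    awk 'BEGIN {
        tree[1, "sz"] = 2              # stored under the key 1 SUBSEP "sz"
        tree[1, 1]    = "firstName"

        # Same element, addressed through explicit concatenation.
        print tree[1 SUBSEP "sz"]

        # Membership test with a parenthesised subscript list.
        if ((1, 1) in tree)
            print tree[1, 1]

        # A combined key can be taken apart again with split().
        key = 1 SUBSEP "sz"
        n = split(key, part, SUBSEP)
        print n, part[1], part[2]
    }'
}

subsep_demo
```

Because only standard awk features are involved, this form should behave the same under gawk, mawk and nawk.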

Although gawk is available almost everywhere these days, the challenge was to make the beautifier run under any awk implementation. I even tried nawk from The Heirloom Toolchest: it works fine, though not very fast.

The beautifier does not use recursion: both the parser and the printer are iterative (a recursive version lives in the recursive branch). The parser is line-oriented: it reads the input line by line and preserves its state between adjacent lines.
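To give a feel for the iterative, line-oriented approach, here is a deliberately tiny sketch. It is not the beautifier's parser and it does not validate JSON; it only tracks bracket nesting with an explicit stack, keeping its state (stack, in_str, esc, all hypothetical names) in ordinary awk variables that survive from one input line to the next:

```shell
#!/bin/sh
# Sketch only: checks bracket nesting, not JSON grammar.
# State lives in plain awk variables, so it carries over between lines.
nesting_check() {
    awk '
    {
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (in_str) {
                if (esc)            esc = 0       # character after a backslash
                else if (c == "\\") esc = 1
                else if (c == "\"") in_str = 0
                continue
            }
            if (c == "\"")
                in_str = 1
            else if (c == "{" || c == "[")
                stack[++top] = c                  # explicit stack, no recursion
            else if (c == "}" || c == "]") {
                want = (c == "}") ? "{" : "["
                if (top == 0 || stack[top] != want) { bad = 1; exit }
                top--
            }
            if (top > deepest) deepest = top
        }
    }
    END {
        if (bad || top != 0 || in_str) { print "unbalanced"; exit 1 }
        print "balanced, max depth " deepest
    }'
}

# The object is split across two lines on purpose.
printf '{"a": [1,\n {"b": "]"}]}\n' | nesting_check
```

The closing bracket inside the string "]" is ignored because the in_str flag is set at that point, and the array opened on the first line is closed on the second one without any special handling.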

The beautifier also acts as a verifier, since it follows the JSON RFC.

x.sh is an online (streaming) beautifier: it checks JSON strings, numbers and grammar rules, and prints the content immediately.

z.sh is also a beautifier, but it stores the parsed object using a flat scheme (see the FLAT option) and prints it only once the file has been fully parsed.

Run beautifiers:

$ ./x.awk < ./z.json
...
$ ./z.awk < ./z.json
...

Tested awks

mawk 1.3.4

gawk 4.1.4

nawk The Heirloom Toolchest

All awks produce the same output:

$ xawk -f ./z.awk <./large-file.json >/tmp/xawk.log

The following command was used to time each awk:

$ time xawk -f ./z.awk <./large-file.json >/dev/null

CPU: Intel i7-7700

Result for x.sh:

|      | gawk | mawk | nawk   |
|------|------|------|--------|
| Time | 15s  | 7.5s | 4m 47s |

Result for z.sh:

|      | gawk   | mawk   | nawk   |
|------|--------|--------|--------|
| Time | 19s    | 15s    | 4m 54s |
| RSS  | 871 MB | 367 MB | 536 MB |

nawk executes the program directly from the AST, so it is not surprising that it is the slowest. gawk and mawk both run the program on a VM, but mawk is faster, especially in the online version. gawk's main drawback is its high memory consumption.

Options

By default, each nesting level is indented by 4 spaces. The formatting can be changed by setting the INDENT environment variable to the desired number of spaces:

$ INDENT=1 ./z.awk < ./z.json
{
 "firstName": "John",
 "lastName": "Smith",
 "isAlive": true,
 "age": 27,
 "address": {
  "streetAddress": "21 2nd Street",
  "city": "New York",
  ...
 "children": [
 ],
 "spouse": null
}
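How the scripts read INDENT internally is not shown here. One portable way to do it, sketched below as an assumption rather than the actual implementation, is to look the variable up in awk's ENVIRON array and fall back to 4 when it is unset (the function name indent_demo and the variables width and unit are made up for the example):

```shell
#!/bin/sh
# Sketch: take the indent width from the environment, defaulting to 4.
# ENVIRON is standard awk; the names below are hypothetical.
indent_demo() {
    awk 'BEGIN {
        width = ("INDENT" in ENVIRON) ? ENVIRON["INDENT"] + 0 : 4

        # Build one indentation unit of "width" spaces.
        unit = ""
        for (i = 0; i < width; i++)
            unit = unit " "

        print "{"
        print unit "\"address\": {"
        print unit unit "\"city\": \"New York\""
        print unit "}"
        print "}"
    }'
}

# Subshells keep the environment changes local to each call.
( unset INDENT; indent_demo )              # 4 spaces per level
( INDENT=1; export INDENT; indent_demo )   # 1 space per level
```

Repeating the unit string once per nesting level keeps the printer independent of the chosen width.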

Set the FLAT environment variable (z.sh only) to see how the parsed JSON object is stored in a flat structure:

$ FLAT=1 ./z.awk < ./z.json
["id"] = 22
[1] = obj
[1, "sz"] = 8
[1, 1] = "firstName"
...
[20, ""] = "123 456-7890"
[21] = arr
[22] = null
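The dump above suggests one array keyed by a node id, optionally combined with a second subscript. The sketch below fills a tiny store of that shape by hand and prints it in the same notation. The meaning given to each key (["id"] as the last node id, [id] as the node type, [id, "sz"] as the member count, [id, i] as the i-th member name) is a guess made from the dump, not something taken from z.sh, and the array name node is hypothetical:

```shell
#!/bin/sh
# Sketch of a flat store; the layout is inferred from the FLAT dump
# and may differ from what the real script does.
flat_demo() {
    awk 'BEGIN {
        node["id"]    = 1                  # assumed: id of the last node
        node[1]       = "obj"              # assumed: node type
        node[1, "sz"] = 2                  # assumed: number of members
        node[1, 1]    = "\"firstName\""    # assumed: name of member 1
        node[1, 2]    = "\"lastName\""     # assumed: name of member 2

        # Walk the ids in order instead of using for-in,
        # whose iteration order is unspecified in awk.
        printf "[\"id\"] = %s\n", node["id"]
        for (id = 1; id <= node["id"]; id++) {
            printf "[%d] = %s\n", id, node[id]
            if ((id, "sz") in node) {
                printf "[%d, \"sz\"] = %s\n", id, node[id, "sz"]
                for (i = 1; i <= node[id, "sz"]; i++)
                    printf "[%d, %d] = %s\n", id, i, node[id, i]
            }
        }
    }'
}

flat_demo
```

Every element lives in a single one-dimensional awk array, which is what allows the same storage to work without gawk's nested arrays.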
