Parsers
Uri edited this page Mar 12, 2022 · 5 revisions
Ol inherited from its parent, Owl, excellent facilities for writing simple and efficient parsers. All parsers are lazy (so you do not need to load the whole stream before parsing starts) and purely functional (really, despite the fact that file I/O operations are not pure functions).
Every parser is a `(lambda (l r p ok) ...)`. Ol provides the `let-parse*` macro, which helps organize a parser's structure like regular `let*` syntax instead of writing a lot of lambdas.
For example, consider this gzip file parser.
(define gzip-parser
   (let-parse* (
         (ID1 byte)
         (ID2 byte)
         (verify (and (eq? ID1 #x1F) (eq? ID2 #x8B)) `not-a-gzip)
         (CM byte)
         (verify (eq? CM 8) `not-a-deflate)
         (FLG byte)
         (MTIME (times 4 byte))
         (XFL byte)
         (OS byte)
         (FNAME (if (zero? (band FLG #b1000))
                  (epsilon #f)
                  (let-parse* (
                        (filename (greedy* (byte-if (lambda (x) (less? 0 x)))))
                        (zt (imm 0)))
                     (bytes->string filename))))
         ; header done
         ; return stream
         (data (lambda (l r p ok)
                  (ok '() r p r)))
      )
      {
         'FLG FLG
         'FNAME FNAME
         'OS (case OS
                (0 "FAT filesystem (MS-DOS, OS/2, NT/Win32)")
                ;; ...
                (255 "unknown"))
         'stream (inflate data)
      }))
;; usage:
(import (owl parse))
(define labels-file (try-parse gzip-parser (file->bytestream "mnist/train-labels-idx1-ubyte.gz") #f))
(print labels-file)
;; output:
'(#ff((FLG . 8) (OS . #false) (FNAME . "train-labels.idx1-ubyte") (stream 0 . #function)) 125 157 9 162 36 185 ....... 234 0 0 . #function))
This is a basic parser library. Every parser in the `(owl parse)` library has an alternative name with the prefix `get-` (`either` and `get-either`, `byte` and `get-byte`, etc.).
The list of parsers:
- `(verify x val)` - does nothing if `x` is true, otherwise produces a parser error with `val`.
- `(epsilon x)`, aka ε - just returns `x`; no actual stream parsing is done.
- `(either a b)` - uses parser `a`; if `a` fails, uses `b`; fails if `b` fails too.
- `(maybe a x)` - uses parser `a`; if `a` fails, returns `x`.
- `(any-of a...)` - like `either`, but with any number of parsers inside.
- `(greedy* a)` - returns a list of zero or more consecutive parsed `a` elements.
- `(greedy+ a)` - returns a list of one or more consecutive parsed `a` elements.
- `(byte)` - parses one byte into a number (enum).
- `(byte-if pred)` - parses one byte into a number (enum) if `(pred number)` is true.
- `(byte-between below above)` - parses one byte into a number (enum) if `below < number < above` (in the sense of `less?`).
- `(imm x)` - parses one byte into a number (enum) if `(eq? number x)`.
- `(word str v)` - parses the string `str` and returns `v` if the parsed string equals `str` (in the sense of `equal?` for bytevectors).
- `(rune)` - parses one Unicode character into a number (enum).
- `(rune-if pred)` - parses one Unicode character into a number (enum) if `(pred number)` is true.
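As a rough illustration of how the parsers listed above compose, the sketch below reads two unsigned decimal numbers separated by `=`. The names `digit-parser`, `decimal-parser` and `pair-parser` are made up for this example, and it assumes that a plain list of bytes is accepted as a data stream.

```scheme
(import (owl parse))

; one ASCII digit; byte-between is exclusive on both ends,
; so the bounds are 47 (one below #\0) and 58 (one above #\9)
(define digit-parser
   (let-parse* (
         (b (byte-between 47 58)))
      (- b 48)))

; one or more digits, folded into a number
(define decimal-parser
   (let-parse* (
         (digits (greedy+ digit-parser)))
      (fold (lambda (n d) (+ (* n 10) d)) 0 digits)))

; <number> '=' <number>, returned as a pair (hypothetical format)
(define pair-parser
   (let-parse* (
         (left decimal-parser)
         (eq-sign (imm 61)) ; 61 is ASCII '='
         (right decimal-parser))
      (cons left right)))

; the list holds the bytes of "12=7"
(print (try-parse pair-parser '(49 50 61 55) #f))
```

If the sketch behaves as intended, the printed value pairs the parsed result with whatever is left of the stream, in the same shape as the gzip example output.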
The list of parse functions:
- `(try-parse parser data show-error)` - tries to parse the `data` stream with `parser`; returns `'(parsed-data . stream-right-part)`, or `#false` on error. Prints the error if `show-error` is true.
- `(parse parser data path errmsg fail-val)` - tries to parse the whole `data` stream with `parser`; returns parsed-data if parsing succeeded and the stream has ended. Otherwise prints `path` and `errmsg` and returns `fail-val`.
- `(sexp-parser)` - parser for S-expressions. It can parse numbers, symbols, regexps, lists, strings, vectors, ffs, special words, etc. A parsed S-expression is a regular Lisp value, so it can be tested with `integer?`, `string?`, `bytevector?`, and similar functions.
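The return value of `try-parse` described above can be handled roughly as follows. This is a minimal sketch: `tagged-byte` and `report` are hypothetical names, and a plain list of bytes is assumed to be a valid data stream.

```scheme
(import (owl parse))

; hypothetical parser: a byte equal to 1 followed by any byte
(define tagged-byte
   (let-parse* (
         (tag (imm 1))
         (value byte))
      value))

; helper (not part of the library): report what try-parse returned
(define (report data)
   (let ((result (try-parse tagged-byte data #f)))
      (if result
         ; on success the result is '(parsed-data . stream-right-part)
         (print "value: " (car result) ", rest: " (cdr result))
         (print "no match"))))

(report '(1 42 7 7)) ; matches, the last two bytes stay unread
(report '(9 42))     ; first byte is not 1, so no match is expected
```

Checking for `#false` before taking `car` and `cdr` keeps the failure case explicit. When the data comes from a file stream, the unread rest may be lazy, as the `#function` tail in the gzip example output suggests.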
Every parser is a `(lambda (l r p ok) ...)`.
- l - left part of the stream (already parsed)
- r - right part of the stream (not yet parsed)
- p - position in the stream (only used to report where an error occurred, if any)
- ok - `(lambda (l r p result) ...)`
   - l - left part of the stream
   - r - right part of the stream
   - p - position in the stream
   - result - parsed part of the stream
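To make the signature above concrete, here is a minimal sketch of a hand-written parser that consumes nothing and succeeds with a fixed value, which is the same idea as `epsilon`. The names `const-parser` and `tagged-pair` are hypothetical, and the sketch assumes a plain list of bytes works as a data stream.

```scheme
(import (owl parse))

; hand-written parser: passes l, r and p through unchanged
; and hands x to the continuation as the parsed result
(define (const-parser x)
   (lambda (l r p ok)
      (ok l r p x)))

; it can be mixed with library parsers inside let-parse*
(define tagged-pair
   (let-parse* (
         (a byte)
         (b byte)
         (tag (const-parser 'pair)))
      (list tag a b)))

(print (try-parse tagged-pair '(1 2 3) #f))
```

Because a parser is just a function of `l`, `r`, `p` and `ok`, anything that calls `ok` with an updated state and a result can be used wherever a library parser is expected, as the `data` binding in the gzip example does.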