Parsers

Uri edited this page Mar 12, 2022 · 5 revisions

Ol inherited from its parent, Owl, excellent facilities for writing simple and efficient parsers. All parsers are lazy (so you do not need to load the whole stream before you start parsing) and purely functional (really, even though the underlying file I/O operations are not pure functions).

Every parser is a (lambda (l r p ok) ...). Ol provides the let-parse* macro, which lets you organize a parser like regular let* syntax instead of writing a lot of nested lambdas.

For example, consider this gzip file parser. It reads the header fields and then returns the rest of the stream as the compressed data.

   (define gzip-parser
      (let-parse* (
            (ID1 byte)
            (ID2 byte)
            (verify (and (eq? ID1 #x1F) (eq? ID2 #x8B)) `not-a-gzip)

            (CM byte)
            (verify (eq? CM 8) `not-a-deflate)
            (FLG byte)
            (MTIME (times 4 byte))
            (XFL byte)
            (OS byte)

            (FNAME (if (zero? (band FLG #b1000))
               (epsilon #f)
               (let-parse* (
                     (filename (greedy* (byte-if (lambda (x) (less? 0 x)))))
                     (zt (imm 0)))
                  (bytes->string filename))))

            ; header done
            ; consume nothing and return the rest of the stream (r) as the result
            (data (lambda (l r p ok)
                     (ok '() r p r)))
            )
         {
            'FLG FLG
            'FNAME FNAME
            'OS (case OS
                  (0 "FAT filesystem (MS-DOS, OS/2, NT/Win32)")
                  ;; ...
                  (255 "unknown"))
            'stream (inflate data)
         }))

;; usage:
(import (owl parse))
(define labels-file (try-parse gzip-parser (file->bytestream "mnist/train-labels-idx1-ubyte.gz") #f))
(print labels-file)

;; output:
'(#ff((FLG . 8) (OS . #false) (FNAME . "train-labels.idx1-ubyte") (stream 0 . #function)) 125 157 9 162 36 185 ....... 234 0 0 . #function))

Libraries

(owl parse)

This is the basic parser library. Every parser in the (owl parse) library has an alternative name with the prefix get- (either and get-either, byte and get-byte, etc.).

The list of parsers:

  • (verify x val) - a let-parse* clause: does nothing if x is true, otherwise produces a parser error with val.
  • (epsilon x), aka ε - just returns x without consuming any of the stream.
  • (either a b) - use parser a; if a fails, use b; if b also fails, fail.
  • (maybe a x) - use parser a; if a fails, return x.
  • (any-of a...) - like either, but with any number of parsers.
  • (greedy* a) - returns a list with zero or more consecutive parsed a elements.
  • (greedy+ a) - returns a list with one or more consecutive parsed a elements.
  • (byte) - parse one byte into a number (enum).
  • (byte-if pred) - parse one byte into a number (enum) if (pred number) is true.
  • (byte-between below above) - parse one byte into a number (enum) if below < number < above (in the sense of less?).
  • (imm x) - parse one byte into a number (enum) if (eq? number x).
  • (word str v) - parse the string str and return v if the parsed bytes match str (in the sense of equal? for bytevectors).
  • (rune) - parse one Unicode character into a number (enum).
  • (rune-if pred) - parse one Unicode character into a number (enum) if (pred number) is true.
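
As a rough sketch of how these combinators compose, the following parser reads two runs of ASCII digits separated by a comma. The names digits and number-pair are made up for this illustration. Passing try-parse a plain list of byte values (rather than the result of file->bytestream) is an assumption.

```scheme
(import (owl parse))

; one or more ASCII digits: 47 < byte < 58, i.e. "0".."9"
(define digits
   (greedy+ (byte-between 47 58)))

; hypothetical example parser: digits, a comma (byte 44), digits
(define number-pair
   (let-parse* (
         (a digits)
         (comma (imm 44))
         (b digits))
      (cons a b)))

; "12,345" as byte values; assumes a plain list is accepted as a stream
(print (try-parse number-pair '(49 50 44 51 52 53) #f))
```

Each digit run comes back as a list of byte values, so a real parser would likely convert it to a number or string in the let-parse* body, as the gzip example does with bytes->string.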

The list of parse functions:

  • (try-parse parser data show-error) - tries to parse the data stream with parser; returns '(parsed-data . stream-right-part), or #false on error. Prints the error if show-error is true.
  • (parse parser data path errmsg fail-val) - tries to parse the entire data stream with parser; returns parsed-data if parsing succeeds and the stream is exhausted. Otherwise prints path and errmsg and returns fail-val.
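
The difference between the two functions can be sketched as follows, based only on the signatures listed above. The parser name two-bytes, the "inline" path, the error message and the 'failed value are illustrative choices. Using a plain list as the data stream is an assumption.

```scheme
(import (owl parse))

; hypothetical parser: exactly two bytes, returned as a pair
(define two-bytes
   (let-parse* (
         (a byte)
         (b byte))
      (cons a b)))

; try-parse tolerates trailing input and returns it next to the result
(print (try-parse two-bytes '(1 2 3) #f))

; parse expects the stream to end where the parser stops;
; path and errmsg are used only for the error report
(print (parse two-bytes '(1 2) "inline" "cannot parse two bytes" 'failed))
(print (parse two-bytes '(1 2 3) "inline" "cannot parse two bytes" 'failed))
```

Going by the descriptions above, the last call should fall back to fail-val, because one byte is left unparsed.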

(lang sexp)

  • (sexp-parser) - parser for s-expressions. It can parse numbers, symbols, regexps, lists, strings, vectors, ffs, special words, etc. A parsed s-expression is an ordinary Lisp value, so it can be tested with integer?, string?, bytevector?, and similar functions.

Internals

Every parser is a (lambda (l r p ok) ...).

  • l - left part of stream
  • r - right part of stream
  • p - position in the stream (used only to report where in the stream an error occurred)
  • ok - the success continuation, a (lambda (l r p result) ...)
    • l - left part of stream
    • r - right part of stream
    • p - position in the stream
    • result - parsed part of stream
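
Given this protocol, a parser can also be written by hand, as the data clause of the gzip example does. A minimal sketch: the names position and byte-with-position are hypothetical, and the plain list passed to try-parse is an assumed stand-in for a byte stream.

```scheme
(import (owl parse))

; hypothetical hand-written parser: consumes nothing and
; returns the current stream position p as its result
(define position
   (lambda (l r p ok)
      (ok l r p p)))

; it composes with library parsers inside let-parse*
(define byte-with-position
   (let-parse* (
         (first byte)
         (at position)
         (second byte))
      (list first at second)))

(print (try-parse byte-with-position '(65 66) #f))
```

Because position passes l, r and p to ok unchanged, it behaves like epsilon, except that the value it returns is taken from the parser state.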