Skip to content

Metrics

jrte edited this page Oct 15, 2023 · 23 revisions

Winding down

The throughput and trap charts below present benchmarking metrics with (ribose@7ac8939) after some recent (Fall 2023) performance tuning. This work was done with Java 17 and was started after finding no significant changes in ribose runtime characteristics migrating from Java 11 to 17. The transducer matrix compression and heap metrics presented in the other charts below are largely unaffected by throughput metrics and haven't changed significantly since Spring 2023 when benchmarking data were collected for Optimize looping in self-referencing states. These earlier benchmarks were run with Java 11 (ribose@6fa9a23).

Transduction throughput

throughput

Benchmarking focusses on two data sets. One is a Linux kernel log file with mixed entries, including iptables entries which are the features of interest. The other is an IBM Java GC log with mixed XML stanzas including generational collection start stanzas. The former is dense with features of interest, and each feature is dense with data (38% of input is extracted). The latter is more sparsely populated with features of interest, and each feature contains only one field of interest (<1% of input is extracted). Both input files are ~20MB in size.

For the Linux kernel log there are 4 ribose transducers and one Java regex, each producing identical output.

  • LinuxKernel: single-pass transduction extracting several fields from each iptables entry.
  • LinuxKernelLoose: partial lookahead to identify iptables entry followed by single-pass transduction.
  • LinuxKernelStrict: full lookahead to identify iptables entry followed by single-pass transduction.
  • LinuxKernelNil: single-pass transduction locating fields of interest without extraction.
  • Regex.LinuxKernel: Java regex equivalent to LinuxKernel*.

For the GC log there are 2 ribose transducers and one Java regex, each producing identical output.

  • Sintervals: coarse-grained identification of GC cycle start stanzas and extraction of one attribute value.
  • Tintervals: fine-grained identification of GC cycle start stanzas and extraction of one attribute value.
  • Regex.VerboseGC: Java regex equivalent of *intervals.

Throughput benchmarking involves running each transducer 20 times in series with input data in RAM and output muted. The first 10 runs are warmup runs, the last 10 run times are used to calculate the average throughput, proportions of input caught in sum, product or scan traps, and number of nul events per kilobyte of input. A nul signal is injected when no transition is defined for current state and input token, giving the current transducer an opportunity to enter a synchronizing pattern that consumes input with nil effect up to the end of the innermost enclosing loop. The ribose transducers involved here are conditioned to accept nul anywhere and synchronize at the head of the next potential feature, so any input from byte* is acceptable but only recognizable features will produce any effect.

Input traps

traps

There are 3 input traps, which bypass the transition matrix for certain subsequences of input.

  • sum trap: emulates a state with many idempotent transitions ([input, state] -> [state, nil]) until an input byte forces state change or non-nil effect.
  • product trap: emulates an acyclic series of >3 states each with only one nil exit transition (a non-branching input vector with nil effect).
  • scan trap: specializes the sum trap in cases where there is only one input that escapes the state.

The sum, product and scan traps are controlled by private effectors inserted into the transducer matrix and effect vectors by the model compiler. The sum and scan traps are particularly useful in error recovery contexts. The relatively high throughput seen with the *intervals transducers is obtained in scan traps searching for the next opening '<'. The product trap eliminates intermediate states and reduces the granularity of the input equivalence relation and so compresses the transition matrix in both dimensions (reduced number of states and input equivalence classes).

Transducer matrix compression

compression

The automata that ginr produces from ribose transducer patterns are state-minimized deterministic finite automata with 3 tapes (input, effector, parameter) representing subsequential transducers. Each transition is associated with an effect, which may be a scalar simple or parameterized effector or a nul-terminated vector of effectors and associated parameters. The latter are represented in ginr automata as chains of non-branching state transitions that the ribose model compiler extracts as a single action referring to an effect vector. This is the first stage of matrix compression. Some groups of byte and signal inputs may behave identically in all states; for some transducers a digit is just a digit, everywhere. So the transition matrix induces an equivalence relation on byte|signal and equivalence class ordinals can stand in for inputs in the transducer matrix. This is the second stage of compression.

compression

The ribose model compiler enumerates all maximal non-branching chains of input state transitions in the transducer matrix, similarly as for effect vectors. These chains are diverted to product traps, which eliminate the intermediate states and erase the unique character of the involved bytes, leading to a more coarse-grained input equivalence relation. This final stage of compression reduces the transducer matrix in both dimensions and improves transition density (transitions / (states * input equivalents)), thereby minimizing the representation of the transducer matrix in RAM and reducing the likelihood of L1 cache misses while running transductions. The net result is lossless decomposition of the transducer matrix to an integer array mapping byte|signal → input equivalent, a compressed state transition matrix mapping input equivalent → effect and an integer array mapping effect → effector vector (an indexed concatenation of nul-terminated effect vectors).

Heap usage

heap

Garbage collection metrics are logged for output equivalence tests. Output is written through a buffered output stream to System.out and tested for binary equivalence. Java regex requires that the entire input corpus be presented as a single unit so there is an up front 60MB heap charge to read the raw UTF8 input into a 20MB byte array and decode it into a 40MB char array to serve to the regex API. The ribose output equivalence transductions incur the 60MB charge identically as above but do not use these arrays; ribose input is presented in 64kb segments as read from the raw input file and transduced as a UTF8 byte stream without decoding to UNICODE. In all cases this 60MB charge persists for the duration of the benchmarking run. The above charts reflect the volume of transient objects instantiated and discarded during benchmark runs.

Heap usage is a strong differentiator for the ribose runtime. Data are extracted to a finite set of byte accumulators. These are analogous to registers in a von Neumann CPU - they are persistent while their values are transient. Extracted data must be output or assimilated into the target model immediately as it becomes available. As a consequence, ribose transductions typically operate with very low heap resources although ribose byte accumulators are allowed to grow without limit when required. For the kernel log runs the ribose heap grows from an initial heap size of 2MB to a maximal extent of 64MB while the regex heap grows to 76MB. In both cases 7.4MB of identical data are extracted to the output stream. The verbose GC log runs (regex and ribose) extract only 61kb to output and run in a 62MB heap.

Clone this wiki locally