- Packaging / infrastructure improvements:
  - npm package `kaitai-struct-compiler` now returns the compiler object itself instead of a constructor function (called `KaitaiStructCompiler`). Make sure to adapt your code: replace `(new KaitaiStructCompiler()).compile(...)` with `KaitaiStructCompiler.compile(...)` (#222)
- General compilation improvements:
  - Prevent referring to non-existent enum members as `my_enum::unknown_member` (8dcd1be)
  - Prevent duplicate member names in enum definition (1cbaff9) - they're incompatible with the concept of enum in all target languages
  - Ensure that IDs of `params` are unique and don't collide with `seq` fields or `instances` within a type (#923)
  - Allow whitespace in type invocation: even `type: ' nested :: type ( 1 + 2 , data ) '` now works (#792)
  - Add style warnings reporting non-standard names for size fields (should use `len_` + subject) and repeat count fields (should use `num_` + subject) - see style guide
    - they are only recommendations and don't prevent compilation
    - only available in the command-line `kaitai-struct-compiler` on the JVM platform (not in the Web IDE or in the JavaScript build at npm)
  - Add the ability to report multiple problems at once instead of stopping after the first error - used for "type validation" errors and style warnings for now (only on JVM compiler builds, not JS builds)
  - Improve readability of problems listed in the compiler output
  - Force UTF-8 as output encoding in generated files (don't rely on system defaults)
  - `--ksc-json-output`: add `warnings` at the same level as `errors`, don't use octal escapes (e.g. `"\274"` ⟶ `"\u00bc"`) in string values (invalid in JSON)
  - Upgrade SnakeYAML (the YAML parser used by JVM compiler builds) from 1.25 to 1.28, which no longer contains the DoS vulnerability allowing a "billion laughs" attack (50f80d7)
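The `--ksc-json-output` entry above notes that octal escapes are invalid in JSON. A minimal Python sketch of why, using only the standard `json` module; the helper name `is_valid_json` is made up for illustration and is not part of the compiler:

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# A JSON string literal that contains the octal-style escape \274.
octal_escaped = '"\\274"'
# The same character written with a \u escape, which JSON allows.
unicode_escaped = '"\\u00bc"'

print(is_valid_json(octal_escaped))    # octal form: rejected by the parser
print(is_valid_json(unicode_escaped))  # \u form: accepted by the parser
```

Any strict JSON consumer should behave the same way, which is presumably why the output switched to `\u` escapes.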
- Runtime API changes:
  - C++: `kstream::to_string` now works for all integer types up to 64 bits (not just `int` as before), has better performance and portability (cpp_stl#50)
  - Go: `ReadBitsInt{Be,Le}` now accept the number of bits as `int` instead of `uint8` (go@a5c5c1e)
  - Java: `readBytesTerm`, `processXor` now accept a single byte value as `byte` instead of `int`
  - JavaScript: update UMD envelopes to support Web Workers and modules (in the runtime library, generated parsers and JS compiler builds)
  - JavaScript: `readBitsInt{Be,Le}` now throw `RangeError` instead of `Error` when trying to read more than 32 bits
  - Lua: add zzlib as a submodule to support `process: zlib`
  - Python: validation errors now extend `Exception` instead of `BaseException` for easier catching (python#53)
  - Python: add `API_VERSION` tuple used by generated modules to check their compatibility with the runtime library (python#49)
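To illustrate why the Python base class change makes validation errors easier to catch, here is a small sketch. Both error classes are hypothetical stand-ins defined only for this example; they are not the runtime library's classes.

```python
class OldStyleValidationError(BaseException):
    """Hypothetical stand-in for an error deriving from BaseException."""


class NewStyleValidationError(Exception):
    """Hypothetical stand-in for an error deriving from Exception."""


def caught_by_except_exception(error_class):
    """Return True if a plain `except Exception` handler catches error_class."""
    try:
        try:
            raise error_class("validation failed")
        except Exception:
            return True
    except BaseException:
        # Reached only when the inner handler did not match.
        return False


print(caught_by_except_exception(OldStyleValidationError))
print(caught_by_except_exception(NewStyleValidationError))
```

Code that wraps parsing in the usual `except Exception` block therefore sees errors of the second kind but lets the first kind propagate.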
- Notable improvements:
  - Make methods `read_bits_int_{be,le}` for reading bit integers reliable (fix all bugs) and faster (#949)
  - No longer preallocating arrays to the capacity of `repeat-expr` entries, which could cause excessive memory allocations in invalid files (f5fe28e)
  - Fix `valid` (and `contents`) on unnamed `seq` fields (for `contents`, this was a 0.9 regression: #825)
  - Construct: add support for enums
  - Go: implement `encoding: UTF-16{BE,LE}`
  - Go, Lua: implement `valid/expr` (#435)
  - Java: fix broken parse `instances` on Java 7 and 8 when using prebuilt `io.kaitai:kaitai-struct-runtime:0.9` from Maven Central (java#34)
  - Java: fix `terminator` values from `0x80` to `0xff` (java#35)
  - Lua: map 1-bit `type: b1` to boolean to match Kaitai Struct design (see docs)
  - Lua: fix undecided calculated endianness incorrectly treated as big-endian
  - Lua: implement `process: zlib` (see Installation section of Lua runtime for how to enable `zlib` support)
  - Nim: fix `encoding: ASCII` on Windows (#960)
  - Perl: fix array literals, implement all byte array operations, `substring` and `str.to_i(2)` methods
  - PHP: support PHP 8 (php#8)
  - Python: generated parsers no longer import `pkg_resources`, which caused performance and usability issues (#804) - the runtime library API version check now compares tuples instead
  - Python: `read_bytes` checks if a large read request (8 MiB or more) can be satisfied, even before any bytes are read (python#61)
  - Ruby: validation error messages now display byte arrays as hex dumps, similar to Java (ruby#4)
  - (Java - already in 0.9), Lua, PHP: fix translation of unsigned 64-bit integer literals - i.e. from `2**63 = 0x8000_0000_0000_0000` to `2**64 - 1 = 0xffff_ffff_ffff_ffff` (fd7f308, Lua: #837)
    - these languages don't have actual 64-bit unsigned integers, but they do have 64-bit signed integers, so the result will be negative, but all 64 bits of precision will be preserved
  - Fix translation of integer `-2**63 = -0x8000_0000_0000_0000` (e33828a)
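The note on unsigned 64-bit literals says the value becomes negative in languages with only signed 64-bit integers while all 64 bits are preserved. A minimal sketch of that two's complement reinterpretation; the function names are illustrative and do not come from the compiler:

```python
MASK_64 = 0xFFFF_FFFF_FFFF_FFFF


def to_signed_64(value):
    """Reinterpret the low 64 bits of value as a two's complement signed integer."""
    value &= MASK_64
    return value - (1 << 64) if value >= (1 << 63) else value


def to_unsigned_64(value):
    """Recover the unsigned 64-bit value from its signed representation."""
    return value & MASK_64


literal = 0xFFFF_FFFF_FFFF_FFFF
stored = to_signed_64(literal)            # negative in a signed 64-bit type
print(stored)
print(to_unsigned_64(stored) == literal)  # no bits were lost
```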
- Generated code style improvements:
  - Go: change header comment to match Go conventions for generated sources (#847)
  - Lua: fix broken indentation after a `repeat: until` field
  - Python: simpler `return` statements in instance getters
- Infrastructure updates:
  - Bintray was sunset on 2021-05-02: move stable compiler artifacts to GitHub Releases in the kaitai_struct_compiler repo
  - Web IDE: improve error reporting (no more useless stack traces)
  - https://formats.kaitai.io/: add pointers to runtime installation (#571)
  - https://ci.kaitai.io/: group columns by language for better usability (#823)
- New targets support:
  - Python with Construct library
  - HTML - intended for documentation, preliminary support
  - Nim - entry-level support (51% tests pass score)
- New KSY language features:
  - `doc-ref` supports list of references (#269)
  - `meta/tags` allows specification of multiple tags to allow better navigation in the format gallery (#572)
  - Allow accessing nested types using `::` syntax: `foo::bar` (#275)
  - Implement parsed data validations using `valid` key (#435)
  - Implement compile-time `sizeof` and `bitsizeof` operators (#84)
    - Type-based: `sizeof<u4>`, `bitsizeof<b13>`, `sizeof<user_type>`
    - Value-based: `file_header._sizeof` (`file_header` is a field defined in the current type)
  - Implement little-endian bit-sized integers (docs)
    - Support choosing endianness using `le` / `be` suffix: `type: b12le`, `type: b1be`
    - Add `meta/bit-endian` key for selecting default bit endianness (`le` / `be`)
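To make the difference between big-endian and little-endian bit-sized integers concrete, here is a rough sketch of both bit orders over a byte string. It is a deliberately slow, bit-by-bit illustration under the usual reading of the two orders (most significant bit first vs. least significant bit first); the function names are invented and this is not the runtime library's implementation.

```python
def read_bits_be(data, bit_offset, n):
    """Read n bits, taking the most significant bit of each byte first."""
    value = 0
    for i in range(bit_offset, bit_offset + n):
        bit = (data[i // 8] >> (7 - i % 8)) & 1
        value = (value << 1) | bit  # earlier bits end up more significant
    return value


def read_bits_le(data, bit_offset, n):
    """Read n bits, taking the least significant bit of each byte first."""
    value = 0
    for k, i in enumerate(range(bit_offset, bit_offset + n)):
        bit = (data[i // 8] >> (i % 8)) & 1
        value |= bit << k  # earlier bits end up less significant
    return value


data = bytes([0xAB, 0xCD])
print(hex(read_bits_be(data, 0, 12)))  # 12 bits in big-endian bit order
print(hex(read_bits_le(data, 0, 12)))  # 12 bits in little-endian bit order
```

The same two bytes yield different 12-bit values depending on the bit order, which is why the order has to be chosen per field or through a default.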
- Expression language:
- General compilation improvements:
  - Support Maven-like directory trees by no longer adding subdir `src` for outputs of Go+Java, see #287. While this most likely breaks existing builds, it brings those languages in line with all the others, and adding subdirs is easier for the user than removing ones added by Kaitai automatically.
  - Better error messages (#488)
  - Support for .ksy files with UTF-8 BOM (#499)
  - Error messages are routed to stderr rather than stdout (#509)
  - `--debug` mode split into `--no-auto-read` and `--read-pos` (#332)
  - C++: add C++11 mode
    - Add `--cpp-standard` CLI option: pass `--cpp-standard 11` to enable C++11 mode (`98` is default)
    - C++11 target:
      - uses `#pragma once` (instead of `#ifndef FOO_H_` header guards)
      - uses `std::unique_ptr<foo>` for owning pointers, raw pointers `foo*` for non-owning
      - supports array literals
  - `--no-auto-read` implemented for C++
  - C++: official Windows and Visual C++ support
  - Fix case conversions to be locale-independent (#708)
- Runtime API changes:
  - Add exceptions `Validation{Not{Equal,AnyOf},{Less,Greater}Than,Expr}Error` inheriting from common ancestor `ValidationFailedError` - thrown on failed validations defined with `valid` or `contents` key (#435)
  - Add method `read_bits_int_le` for parsing little-endian bit-sized integers (docs)
  - Deprecated classes and methods:
    - `ensure_fixed_contents` ⟶ explicit `if` that asserts `readBytes(n)` to be equal to the expected `n`-byte array (throwing `ValidationNotEqualError` if it fails)
    - `UnexpectedDataError` ⟶ `ValidationNotEqualError`
    - `read_bits_int` ⟶ `read_bits_int_be`
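The replacement for `ensure_fixed_contents` described above (read `n` bytes, then compare them in an explicit `if`) could look roughly like this. The exception class here is a local stand-in that only borrows the name `ValidationNotEqualError`; its constructor signature and the `read_magic` helper are assumptions made for the example, not the runtime library's API.

```python
import io


class ValidationNotEqualError(Exception):
    """Local stand-in for illustration; not the runtime library's class."""

    def __init__(self, expected, actual):
        super().__init__("expected %r, but got %r" % (expected, actual))
        self.expected = expected
        self.actual = actual


def read_magic(stream, expected):
    """Read len(expected) bytes and check them with an explicit `if`."""
    actual = stream.read(len(expected))
    if actual != expected:
        raise ValidationNotEqualError(expected, actual)
    return actual


print(read_magic(io.BytesIO(b"MZ\x90\x00"), b"MZ"))
try:
    read_magic(io.BytesIO(b"PK\x03\x04"), b"MZ")
except ValidationNotEqualError as error:
    print("validation failed:", error)
```

Keeping both the expected and the actual bytes on the exception lets callers report exactly what was found.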
- Major bugfixes:
  - `params/type` - add support for:
    - specific user types
    - `enum` types (#413)
    - byte arrays (`bytes`)
    - arrays (`u2[]`, `struct[]`, etc.)
  - `enum` with undefined values in enum list never crashes a parser (#523 for Python, #300 for Java)
  - Fix coercing different string/bytearray/enum/boolean types (e.g. parsed from stream and created from literal value) in conditional op (`? :`) or array literal
  - Substring `not` cannot be used in expressions (#556)
  - Bit-sized integers were not accounted for properly in `repeat: eos` (#548)
  - Fix switching with else case (`_: foo`) only (#595)
  - C++: fix all known memory leaks
  - C++: fix absolute imports (#794)
  - Java: more consistent closure of underlying IO streams on forced `close()` (#497)
  - Java: fix reading user types in type-switching in `--no-auto-read` mode (#204)
  - Python: work around circular dependencies generation
  - PHP: fix invalid `namespace` declarations when no `--php-namespace` specified (#637)
- Tooling around the compiler updates:
  - Kaitai Struct compiler available as Maven plugin and as Gradle plugin
- Infrastructure updates:
  - Unstable binary builds are available for all platforms after every CI build at Bintray (#63)
  - KSY language reference replaced with documentation generated from JSON schema
  - https://formats.kaitai.io/ is rebuilt automatically with CI/CD
  - Brand new modular CI/CD system for compiler, underlying CI-agnostic, working on multiple different OSes in parallel (Linux, Windows, macOS) and showing status at https://ci.kaitai.io/
  - Generate test assertion specs from language-agnostic KST specs
- New target languages:
  - Lua (96% tests pass score)
  - initial support for Go (15% tests pass score)
- New ksy features:
  - Switchable default endianness: `meta/endian` can now contain a switch-like structure (with `switch-on` and `cases`), akin to switchable types (docs).
  - Parametric user-defined types: one can use `type: my_type(arg1, arg2, arg3)` to pass arguments into user type (docs).
  - Custom processing types: one can use `process: my_process_name(arg1, arg2, arg3)` to invoke custom processing routine, implemented in imperative language (docs).
  - In repetitions, index of current repetition can be accessed using `_index` in expressions (docs).
  - Verbose enums: now one can specify documentation and other useful information relevant to enums using verbose enum declaration format (docs).
  - `meta/xref` key can be used for adding cross-references for a format specification (like relevant RFC entries, Wikidata entries, ISO / IEEE / JIS / DIN / GOST standard numbers, PRONOM identifiers, etc).
- General compilation improvements:
  - Imports/includes for all languages are now managed properly, no duplicate / unnecessary imports should be added
  - Python: basic docstring support
  - More strict ksy precompile checks (less likely to accept ksy that will result in non-compilable code), better error messages
  - CLI options:
    - Python target now allows specifying a package with `--python-package`
    - Java target now allows custom KaitaiStream implementations and thus allows specifying the default implementation for `fromFile(...)` using `--java-from-file-class`.
- Expression language:
  - New methods:
    - floats: `to_i`
    - arrays: `min`, `max`
  - Added byte array comparison
- Packaging / infrastructure improvements:
  - ksc is now available as an npm package, which is now a build dependency of the Web IDE
- Runtime API changes:
  - C++: now requires `KS_STR_ENCODING_ICONV` or `KS_STR_ENCODING_NONE` to be defined to specify how to handle string encodings
  - Java: `KaitaiStream` is now an interface, and there are two distinct classes which implement it:
    - `ByteBufferKaitaiStream` provides KaitaiStream backed by `ByteBuffer` (and thus using memory-mapped files)
    - `RandomAccessFileKaitaiStream` provides KaitaiStream backed by `RandomAccessFile` (and thus uses normal OS read calls, as it was done in older KaitaiStruct circa v0.5)
  - JavaScript: Error classes are now subclasses of `KaitaiStream` and were renamed in the following way: `KaitaiUnexpectedDataError` -> `KaitaiStream.UnexpectedDataError`
- Major bugfixes:
  - C++: adjusted to be compatible with OS X and Windows MSVC builds
  - Fixed broken generation of byte array literals with the high (8th) bit set in some targets
  - Fixed float literals parsing, fixed larger integer keys YAML parsing
  - Fixed inconsistency of debug mode vs non-debug mode behavior for `repeat-*`
  - Fixed chain of relative imports bug: now all relative imports always work relative to the file being processed, not to current compiler's dir
  - Many problems with switching: invalid common type inferring, invalid code being generated, added failsafe `if`-based implementations for languages which do not support switching over all possible types.
  - Fixed most memory leaks in C++ (only exception-related leaks are left now)
- New ksy features:
  - Type importing system: `meta/imports` can be used to import other types as first-class citizens in current compilation unit; "opaque types" are now disabled by default (see below)
  - Byte-terminated notation (`terminator`, `include` and `consume`) can now be used not only for strings, but also for any byte types and user types
  - `pad-right` to remove excess right padding (usually with 0s)
  - User types can now use `parent: expression` to enforce a specific parent for an object, or `parent: false` to disable parenting at all (and, subsequently, remove it from parent type inferring process)
  - Type inferring: value instances are now allowed to use `_parent`
  - `doc-ref` to add references to external documentation for types / attributes
- Improved compilation process:
  - Compilation is now clearly separated into 3 phases: YAML parsing, precompilation, compilation. Phases 1 and 2 are language-agnostic, and "precompilation" now performs all possible sanity checks up front, making sure that language-specific "compilation" doesn't have to deal with invalid data.
  - Improved compilation results reporting: now all error messages reported by compiler have file / code location and proper user-readable text. Added more than 50 tests for erroneous input files. Exceptions thrown directly are considered a compiler bug from now on.
  - Generated code now checks for runtime library version compatibility and fails to compile / run with non-compliant runtime
  - Command-line compiler options:
    - `--opaque-types=true` to enable opaque types (disabled by default, i.e. using unknown type would be treated as error)
    - `--verbose` now allows fine-tuned verbose logging for various compiler's subsystems; using `--verbose=all` exposes a lot of internal logic.
    - `--ksc-json-output` to dump compilation results in machine-readable JSON format (simplifies ksc integration in other tools, like visualizers)
  - Console visualizer: faster loading, automatic handling of imports (no more need to specify all .ksy files manually on invocation)
- Expression language:
  - Two string types: single quotes (verbatim), double quotes (interpolating with escape characters)
  - New type casting operator: `.as<foo>`
  - New methods:
    - arrays: `size`
    - booleans: `to_i`
    - byte arrays: `to_s(encoding)`
    - enums: `to_i`
    - strings: `reverse`
- Runtime API changes:
  - All bytearray to string functions are named `bytes_to_str` in all languages
  - Added `read_bytes_term` (akin to what `read_str_term` did previously to strings)
  - Removed `read_str_*` methods, they are to be replaced now with combination of `read_bytes_*` + `bytes_to_str`
  - Added `bytes_strip_right` and `bytes_terminate`
  - Perl module now uses `IO::KaitaiStruct` package name (instead of `Kaitai`)
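As a rough illustration of how reading bytes and converting them to a string compose once the `read_str_*` methods are gone, the sketch below defines standalone helpers that borrow the names `bytes_strip_right`, `bytes_terminate` and `bytes_to_str`. Their parameter lists and exact behaviour are assumptions made for this example, not the runtime library's documented API.

```python
def bytes_strip_right(data, pad_byte):
    """Drop trailing occurrences of pad_byte (assumed semantics)."""
    return data.rstrip(bytes([pad_byte]))


def bytes_terminate(data, term, include_term):
    """Cut data at the first terminator byte, optionally keeping it (assumed semantics)."""
    index = data.find(bytes([term]))
    if index == -1:
        return data
    return data[:index + (1 if include_term else 0)]


def bytes_to_str(data, encoding):
    """Decode a byte array into a string."""
    return data.decode(encoding)


raw = b"hello\x00junk\x00\x00\x00"
print(bytes_strip_right(raw, 0))
print(bytes_to_str(bytes_terminate(raw, 0, False), "ASCII"))
```

Splitting byte-level trimming from decoding means the same trimming logic can serve plain byte arrays and strings alike.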
- Major bugfixes:
  - Recursive top-level types
  - Unaligned bits reading with enums on top of bit-level integers
  - `repeat-until` handling with substreams
- Unaligned bit parsing support
  - Use `type: b12` to parse 12 bits as integer from a stream (obviously, one can use `b1`, `b2`, `b3`, etc)
  - `b1` is parsed as a boolean value
  - Several `bXX` fields chained in a sequence can be used to parse bit masks/fields
  - Using regular types (e.g. `u1`, `s4`, `str`, etc) resumes normal parsing, aligning to the next byte
- More meta information, documentation and non-standard keys usage:
  - `doc` for docstrings is allowed on type level
  - `meta` can now include:
    - `title` (to give proper full title for type)
    - `license` (to specify work licensing)
    - `ks-version` (to specify minimal version of Kaitai Struct compiler that must be used to process a .ksy - e.g. `0.6`)
    - `ks-debug` (to enforce generation of classes as if `--debug` mode was specified in command line)
  - `meta` is non-global now, but can be used on multiple levels and inherited from closest one
  - Non-(yet)-standard keys can be used everywhere now using `-key` syntax: for example, Web IDE uses `-webide-representation` key which is ignored by the compiler, but useful for clearer debugging
- Enums are proper first-class citizens now: `enum: XXX` specifications are not just strings, but proper references to declared enums, thus they're checked for validity, can reference upper level nested enums from lower levels, etc - this fixes majority of existing enum namespacing problems in JavaScript, Python, PHP and Perl
- `id` in `seq` elements is now optional: it can be useful for quick exploration mapping (one can always assign identifiers later), or for unused ("reserved for later use") attributes - such attributes would be assigned numbered IDs automatically
- Allow value instances to use `if` and `enum`
- Proper support for "opaque" external types: one can use an undeclared data type, it's expected to be declared in some other .ksy file and it will be properly imported/included in current file
- Expression language:
  - Support for integer literals with underscores for readability: one can use stuff like `123_456_789` or `0b0101_0011` now
  - `to_s` method for integer types to convert them to strings
- Language-specific improvements:
  - C++: clearly separated "null" (no result, for example, due to failed `if` condition) and "not yet calculated" results - introduced `_is_null_XXX()` method to check for a true null result in generated API
  - JavaScript: generated enums can be queried for both ID => name and name => ID
  - PHP: dropped type generation for now due to nullable types - one day they might return strictly for PHP 7.1+
  - GraphViz: major compatibility fixes, diagram readability improvements, support for switch types
- Runtime API changes:
  - `ensure_fixed_contents` no longer requires both expected byte array and its length, only array is required
  - Java: all methods no longer use checked exceptions, i.e. `IOException`
- Bugfixes:
  - Type derivation of parent types when using switched `type`, array types, and type combining on switching / ternary operators
  - Multiple translator fixes: type derivation, parenthesis generation
  - Assorted code generation bugfixes in C++, Python, Ruby
- Refactorings and optimizations:
  - Type derivation engine
  - Parse instances use more optimal order of conditionals / debug / IO management applications
  - Improved error messages
- Target languages support:
  - C++/STL - fully supported, all tests pass
  - Python - made compiled code and runtime compatible with both Python 2 and 3, enforced by CI
  - PHP7 - new target language, 98% supported
  - Perl - new target language, 85% supported
  - Graphviz - allows generation of visual diagrams of data formats, to be laid out with GraphViz (`.dot` format)
- New KSY language features:
  - Switch-like conditional structure to determine `type` based on value of expression (instead of tons of `if`s)
  - Attribute field `doc` to annotate fields - will generate docstrings relevant to language (e.g. JavaDoc, JSDoc, YARD/RDoc, etc)
  - `repeat-until` allows repetition of a field until a condition is met
  - Boolean type support
- Expression language:
  - `_io.eof` returns boolean value - whether the end of stream was reached or not
  - `_io.pos` returns current position in the stream
  - `_io.size` returns size of the stream
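For readers unfamiliar with these three stream properties, the sketch below shows one plausible way to derive them from an ordinary seekable Python stream. The helper names are invented for the example and say nothing about how any runtime library actually implements `_io.eof`, `_io.pos` or `_io.size`.

```python
import io


def stream_pos(stream):
    """Current position in the stream."""
    return stream.tell()


def stream_size(stream):
    """Total size of the stream, restoring the position afterwards."""
    saved = stream.tell()
    size = stream.seek(0, io.SEEK_END)
    stream.seek(saved)
    return size


def stream_eof(stream):
    """True once the position has reached the end of the stream."""
    return stream_pos(stream) >= stream_size(stream)


stream = io.BytesIO(b"abcd")
stream.read(1)
print(stream_pos(stream), stream_size(stream), stream_eof(stream))
stream.read()
print(stream_eof(stream))
```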
- .ksy parsing improvements:
  - New unified type derivation engine allows compile-time type error checks and full support of target languages which require absolute type designations (like C++, Python, Perl or PHP)
  - Same YAML parsing code is now used for both JVM and JS platforms
  - Stricter checks on all parsing stages: lots of invalid combinations are now prohibited (instead of choosing one of variants)
  - Better error messages: now in most cases compiler clearly indicates source of the problem
- Build and release process:
  - Compiler: added building as pom module
  - Java runtime: added building as pom module
  - Python runtime: added building as pip module
  - Windows CI: now all commits are built also on Windows, with .msi package available for download
- Debug mode:
  - Support implemented for Java and JavaScript (to allow creation of visualizer tools in these languages - see Java GUI for Kaitai Struct and Web IDE for Kaitai Struct)
  - Added generation of `SEQ_FIELDS` helper const array that allows clear separation of sequence attributes vs instances without guesswork
  - Exception in debug mode now tries to save as much parsed data as possible (to aid diagnosing the error)
- Incompatible changes:
  - Identifiers are now strictly checked to conform to `lower_underscore_case` pattern (that would be converted to language-specific style on output)
  - Java:
    - `_parse` method renamed to `_read`
    - `process*` methods are now static
  - JavaScript: `position` in runtime is renamed to `pos` (to conform to general KS API spec)
  - Compiler API: now all compilers accept unified `RuntimeConfig` for configuration instead of individual options
- Bugfixes:
  - Java:
    - having `if` on a sequence attribute now makes it automatically boxed (to allow it to be `null`)
    - work around some `int` vs `long` incompatibilities
    - proper boxing of floating types
  - Integer modulo (`%`) operation now behaves exactly the same in all languages, always returning a non-negative result (as opposed to remainder operation `%` in languages like C++ or Java)
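The difference between a remainder that follows the sign of the dividend (as `%` does in C++ or Java) and a modulo that never goes negative can be sketched as follows. Both functions are illustrative, assume a positive divisor, and are not taken from the compiler or any runtime library.

```python
def truncated_remainder(a, b):
    """Remainder with the sign of the dividend, like `%` in C++ or Java."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def non_negative_modulo(a, b):
    """Modulo that is never negative for a positive divisor b."""
    r = truncated_remainder(a, b)
    return r + b if r < 0 else r


print(truncated_remainder(-5, 3))  # negative, follows the dividend
print(non_negative_modulo(-5, 3))  # shifted into the range 0..b-1
```

For non-negative operands the two functions agree, so the change only affects expressions with negative dividends.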
- Languages support:
  - New target language, fully supported: C# (modules should be usable all across the .NET platform, i.e. from C++/CLI, VB.NET, F#, etc.)
  - Preliminary support for C++ (with STL containers / IO implementation) - note that not all features are implemented.
- Data types:
  - Floating point data types support (available as `f4` and `f8` for single and double precision IEEE754 floats)
  - Separate data type for byte arrays (including support for literal byte arrays)
- Expressions language:
  - Added new testing framework for expression translators
  - Added `.first` and `.last` for arrays (getting first and last element of array)
  - Added `.to_i` for strings (string -> int conversion)
  - Support for accessing `_io` object (IO stream) to access current stream's size (`_io.size`)
- Processing: extended "xor" processing to support XORing with multi-byte keys
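XORing with a multi-byte key usually means repeating the key cyclically over the data. A minimal sketch under that assumption; the function name `xor_with_key` is made up for the example and is not the runtime library's routine:

```python
from itertools import cycle


def xor_with_key(data, key):
    """XOR each data byte with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must contain at least one byte")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


plain = b"\x01\x02\x03"
key = b"\xff\x0f"
scrambled = xor_with_key(plain, key)
print(scrambled.hex())
print(xor_with_key(scrambled, key) == plain)  # applying the key twice restores the data
```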
- Runtime libraries:
  - Lots of cleanup - now all libraries try to follow the same strict standard (with method naming, parameters, order of methods, etc).
  - JavaScript: implemented full streaming API (both signed & unsigned integer, ensuring fixed contents fields, approximated 64-bit integers, etc).
- New process: "ror/rol" (for simple circular bit shift)
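A circular bit shift moves the bits that fall off one end of a byte back in at the other end. The sketch below rotates every byte independently, which is an assumption made for simplicity; the function names are illustrative and not those of any runtime library.

```python
def rotate_left_byte(value, amount):
    """Rotate an 8-bit value left by amount bits."""
    amount %= 8
    return ((value << amount) | (value >> (8 - amount))) & 0xFF


def rotate_left_bytes(data, amount):
    """Apply the same left rotation to every byte of data."""
    return bytes(rotate_left_byte(b, amount) for b in data)


def rotate_right_bytes(data, amount):
    """A right rotation by amount equals a left rotation by 8 - amount."""
    return rotate_left_bytes(data, 8 - amount % 8)


data = bytes([0x81, 0x0F])
rotated = rotate_left_bytes(data, 4)
print(rotated.hex())
print(rotate_right_bytes(rotated, 4) == data)  # rotating back restores the input
```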
- Ruby: runtime classes reside in a proper namespace: `Kaitai::Struct::Struct` and `Kaitai::Struct::Stream`, not just `KaitaiStruct` and `KaitaiStream`
- Scala.js build: fully implemented, now compiler can be called on a web page as a JavaScript library
- Implemented `process:` for pre-processing input buffer of user types
- Translator: allow coercing of different int types into each other
- General code cleanup
- Initial public release