Right now, the state of the stream provided by each of our runtime libraries after a read/write operation fails with an EOF error is effectively implementation-defined. This is bad because it can easily differ between languages and even between stream implementations in one language (e.g. ByteBufferKaitaiStream and RandomAccessFileKaitaiStream in Java), so it cannot be relied upon. Changing what looks like an unimportant detail of some method in the runtime library (for example, reordering operations that don't seem to need a particular order) is enough to change the behavior.
Perhaps users don't usually care about the stream state after an EOF error, because in their application it's game over and they won't touch the stream anymore. But even if they don't intend to do any more read/write operations and only want to check pos() to diagnose where in the stream the error occurred, they should get stable and well-defined readings from pos(), unaffected by implementation details.
I think the most understandable and convenient API is one where, after an EOF error, the state of the stream is the same as it was before the failed read/write operation was called. This means that if the current pos() is 1 and you call read_bytes(4), which fails with an EOF error, pos() will still be 1. Likewise, if you call any of the read_bits_int_*() or write_bits_int_*() methods and it fails with an EOF error, not only will pos() remain the same as before the failed call, but the internal bit buffers (traditionally represented by the bits_left and bits fields of KaitaiStream classes) will also remain untouched.
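To make the proposed contract concrete, here is a minimal sketch of how it could be checked. Both `SketchStream` and `eof_leaves_pos_untouched` are hypothetical names invented for this illustration; they are not part of any Kaitai Struct runtime, and the rollback via `seek()` assumes a seekable underlying stream.

```python
import io


class SketchStream:
    """Hypothetical minimal stream, NOT the real KaitaiStream."""

    def __init__(self, data):
        self._io = io.BytesIO(data)

    def pos(self):
        return self._io.tell()

    def read_bytes(self, n):
        start = self._io.tell()
        r = self._io.read(n)
        if len(r) < n:
            # Roll back to the position before the call, then report EOF.
            self._io.seek(start)
            raise EOFError(
                "requested %d bytes, but only %d bytes available" % (n, len(r))
            )
        return r


def eof_leaves_pos_untouched(stream, operation):
    """Return True if `operation` fails with EOFError and pos() is unchanged."""
    before = stream.pos()
    try:
        operation(stream)
    except EOFError:
        return stream.pos() == before
    raise AssertionError("operation was expected to fail with EOFError")


s = SketchStream(b"\x01\x02\x03")
s.read_bytes(1)  # pos() is now 1
contract_holds = eof_leaves_pos_untouched(s, lambda st: st.read_bytes(4))
```

A check of this shape could in principle be run against every stream implementation in every runtime, which is what would make the behavior something users can rely on.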
Known instances of the implementation-specific behavior include:
At least in Python and Ruby, the read_bytes method first calls the read() method of the underlying I/O stream, which doesn't throw any EOF errors itself - it just returns as many bytes as are available in the stream. If fewer bytes were read than requested, our runtime libraries deliberately throw an EOF error.
This looks fine at first glance, but the problem is that the underlying read() really does read everything up to the end of the stream if you ask for more bytes than are available, so the stream position is implicitly moved to EOF before the EOF error is raised. This means that read_bytes has permanently mutated the stream state even though it failed with an exception.
See Ruby implementation of read_bytes for reference (struct.rb:372-381):
    def read_bytes(n)
      r = @_io.read(n)
      if r
        rl = r.bytesize
      else
        rl = 0
      end
      raise EOFError.new("attempted to read #{n} bytes, got only #{rl}") if rl < n
      r
    end
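The same effect can be reproduced in Python with a plain `io.BytesIO`. In this sketch, `naive_read_bytes` is a hypothetical helper that mirrors the logic of the Ruby method above; it is not the actual Python runtime code.

```python
import io


def naive_read_bytes(stream, n):
    # read() never raises at EOF; it just returns fewer bytes than requested.
    r = stream.read(n)
    if len(r) < n:
        raise EOFError("attempted to read %d bytes, got only %d" % (n, len(r)))
    return r


stream = io.BytesIO(b"\x01\x02\x03")
stream.seek(1)  # position is 1, only 2 bytes remain

try:
    naive_read_bytes(stream, 4)
except EOFError:
    pass

# The failed call consumed the 2 remaining bytes before raising.
pos_after_failure = stream.tell()  # 3, i.e. at EOF, not 1
```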
In Python since 0.10, read_bytes checks whether a large read request (8 MiB or more) can be satisfied by comparing the requested size with size() - pos(). If it finds that the request cannot be satisfied, it doesn't call read() at all and raises an EOF error straight away (see read_bytes(): add extra length check kaitai_struct_python_runtime#61), so pos() remains where it was before read_bytes was called. However, this check is not done for small reads (< 8 MiB), so if a small read fails, the position will be at EOF. This makes the behavior of read_bytes in Python even more unpredictable than in other languages (i.e. dependent on a rather arbitrary implementation-specific limit of 8 MiB).
In fact, all runtime libraries use an identical algorithm for the read_bits_int_{be,le}() methods. At least in the read_bits_int_be variant, the bits_left field is set before read_bytes is called. If read_bytes then fails, bits_left is not restored. The bits field, however, is set only after the call to read_bytes, so the two fields may even be inconsistent with each other after the failed call. See for example the Python implementation of read_bits_int_be (kaitaistruct.py:233-259):
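One way to remove the dependence on the size threshold would be to apply the same availability check to every read. The sketch below assumes a seekable stream; `remaining` and `checked_read_bytes` are hypothetical helpers written for this illustration and do not reflect how the runtime actually computes size() or pos().

```python
import io


def remaining(stream):
    """Bytes left between the current position and the end (seekable streams only)."""
    pos = stream.tell()
    end = stream.seek(0, io.SEEK_END)
    stream.seek(pos)  # restore the original position
    return end - pos


def checked_read_bytes(stream, n):
    avail = remaining(stream)
    if n > avail:
        # Fail before touching the stream, regardless of how large n is.
        raise EOFError(
            "requested %d bytes, but only %d bytes available" % (n, avail)
        )
    return stream.read(n)


stream = io.BytesIO(b"abc")
stream.seek(1)
try:
    checked_read_bytes(stream, 4)
except EOFError:
    pass
pos_after_failure = stream.tell()  # still 1
```

The trade-off is a couple of extra seeks on every call, so whether this is acceptable for small reads is an open question rather than something this sketch settles.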
    def read_bits_int_be(self, n):
        res = 0

        bits_needed = n - self.bits_left
        self.bits_left = -bits_needed % 8

        if bits_needed > 0:
            # ...
            buf = self.read_bytes(bytes_needed)
            # ...
        else:
            # ...

        mask = (1 << self.bits_left) - 1  # `bits_left` is in range 0..7
        self.bits &= mask

        return res
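A possible way to get the proposed behavior here is to snapshot the position and both bit fields before the call and restore them if an EOF error is raised. The sketch below is only an illustration: `SketchBitStream` is a hypothetical class, and the parts of `_read_bits_int_be_unsafe` that are elided in the excerpt above are filled in with my own reconstruction, which may differ from the real runtime.

```python
import io


class SketchBitStream:
    """Hypothetical stand-in for a KaitaiStream-like class."""

    def __init__(self, data):
        self._io = io.BytesIO(data)
        self.bits = 0
        self.bits_left = 0

    def read_bytes(self, n):
        r = self._io.read(n)
        if len(r) < n:
            raise EOFError("requested %d bytes, got only %d" % (n, len(r)))
        return r

    def _read_bits_int_be_unsafe(self, n):
        # Same ordering as the excerpt: bits_left is updated BEFORE read_bytes.
        res = 0
        bits_needed = n - self.bits_left
        self.bits_left = -bits_needed % 8
        if bits_needed > 0:
            bytes_needed = (bits_needed + 7) // 8  # assumed: ceil(bits_needed / 8)
            buf = self.read_bytes(bytes_needed)
            for byte in buf:
                res = res << 8 | byte
            new_bits = res
            res = res >> self.bits_left | self.bits << bits_needed
            self.bits = new_bits
        else:
            res = self.bits >> -bits_needed
        mask = (1 << self.bits_left) - 1
        self.bits &= mask
        return res

    def read_bits_int_be(self, n):
        saved = (self._io.tell(), self.bits, self.bits_left)
        try:
            return self._read_bits_int_be_unsafe(n)
        except EOFError:
            # Restore everything the failed call may have touched.
            pos, self.bits, self.bits_left = saved
            self._io.seek(pos)
            raise


s = SketchBitStream(b"\xa5")  # 0b10100101
first = s.read_bits_int_be(3)  # top 3 bits
state_before = (s.bits, s.bits_left)
try:
    s.read_bits_int_be(6)  # needs one more byte than the stream has
except EOFError:
    pass
state_after = (s.bits, s.bits_left)
```

With the wrapper, `state_after` equals `state_before`, so the remaining 5 bits can still be read after the failed call. Calling `_read_bits_int_be_unsafe` directly would instead leave `bits_left` at its already updated value.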
This applies to runtimes in other languages too, because their size() methods often use the same approach - it's just that the problem is more visible in Go because of the explicit error handling than in other languages that use exceptions.