- TCOBS is a variant of COBS combined with real-time RLE data compression, designed especially for short messages containing integers.
- The consistent overhead with TCOBS is 1 byte for each started block of 31 bytes in the worst case, when no compression is possible. (Example: A 1000-byte buffer can be encoded with at most 33 additional bytes.) This is more than with the original COBS, which adds 1 byte for each started block of 254 bytes. But if the data contain integer numbers, as communication packets often do, the encoded data will statistically be shorter with TCOBS than with legacy COBS.
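The worst-case figures above follow from rounding up per started block. A small sketch of that arithmetic, assuming exactly the per-block rules stated here (one extra byte per started 31 bytes for TCOBS, per started 254 bytes for COBS) and a non-empty input; the function name `maxOverhead` is hypothetical:

```go
package main

import "fmt"

// maxOverhead returns the worst-case number of extra bytes when one byte is
// added for each started block of blockSize input bytes (n > 0 assumed).
func maxOverhead(n, blockSize int) int {
	return (n + blockSize - 1) / blockSize // integer ceiling of n/blockSize
}

func main() {
	n := 1000
	fmt.Println("TCOBS worst case:", maxOverhead(n, 31))  // ceil(1000/31)
	fmt.Println("COBS  worst case:", maxOverhead(n, 254)) // ceil(1000/254)
}
```

For the 1000-byte example this yields 33 extra bytes for TCOBS and 4 for COBS.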
- Most messages, like Trices, consist of 16 bytes or fewer.
- Some messages or user data are longer.
- Several zeros in a row are a common pattern (example: `00 00 00 05`).
- Several 0xFF bytes in a row are a common pattern too (example: -1 as a 32-bit value, `FF FF FF FF`).
- Some other bytes may also appear several times in a row.
- TCOBS does not know the inner data structure and is therefore usable on any user data.
- TCOBS was originally developed as an optional part of Trice, which is what the T stands for. It aims to reduce the binary Trice data and to frame them in one step.
- The T also symbolizes the joining of the two orthogonal tasks, compression and framing.
- Additionally, the usage of ternary and quaternary numbers in TCOBSv2 is reflected in the letter T.
- TCOBSv2 is an improved approach over TCOBSv1 and is especially suited for data streams in which long sequences of equal bytes occur.
- The TCOBSv1 compression is expected to be not as good as that of TCOBSv2.
- The data are assumed to contain 00-bytes and FF-bytes a bit more often than other bytes.
- The compression aims at a reasonable data reduction with minimal computing effort rather than at reducing to an absolute minimum. The method shown here simply counts repeated bytes and transforms them into shorter sequences. It works well on very short messages, like 2 or 4 bytes, and also on very long buffers. The compressed buffer contains no 00-bytes anymore, which is the aim of COBS.
- TCOBS is usable stand-alone in any project for package framing with data minimization.
- The intended use cases are speed, limited bandwidth and long-term data recording in the field.
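The idea of counting repeated bytes can be sketched as a plain run detector. This only illustrates the counting step with a hypothetical `run` type; it does not produce the TCOBS sigil byte format described in the specifications:

```go
package main

import "fmt"

// run is a hypothetical (value, count) pair describing equal bytes in a row.
type run struct {
	value byte
	count int
}

// countRuns scans the buffer once and groups equal adjacent bytes.
func countRuns(in []byte) []run {
	var runs []run
	for _, b := range in {
		if n := len(runs); n > 0 && runs[n-1].value == b {
			runs[n-1].count++ // extend the current run
			continue
		}
		runs = append(runs, run{value: b, count: 1}) // start a new run
	}
	return runs
}

func main() {
	msg := []byte{0x00, 0x00, 0x00, 0x05, 0xFF, 0xFF, 0xFF, 0xFF}
	for _, r := range countRuns(msg) {
		fmt.Printf("0x%02X x %d\n", r.value, r.count)
	}
}
```

A run-length encoder would then replace sufficiently long runs with shorter codes; how TCOBS encodes them, and which run lengths pay off, is defined in the specification documents.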
- TCOBS is inspired by rlercobs. The idea of an ending sigil byte comes from rCOBS. It allows straightforward encoding without lookahead and thus makes the embedded device code simpler.
- TCOBS uses various chained sigil bytes to achieve an additional lossless compression if possible.
- Each encoded package ends with a sigil byte.
- `0` is usable as a delimiter byte between the packages, because they contain no `0` anymore. It is up to the user to insert the optional delimiters for framing after each package or after several packages.
- Usually it is better to divide the tasks and do compression and COBS encoding separately. This is fine if size and time do not really matter.
- The messages expected for TCOBS are typically in the range of 2 to 300 bytes, though not limited to that, and run-length encoding then makes sense for real-time compression.
- Separating compression and COBS costs more time (2 processing loops) and does not allow squeezing out the last byte.
- With the TCOBS algorithm, only one processing loop is needed, and a smaller transfer packet size combined with more speed is expected.
- In case of data disruption, the receiver waits for the next 0-delimiter byte. As a result, it gets a packet consisting of the start of one package A and the end of a different package Z.
- For the decoder it makes no difference whether the packages start or end with a sigil byte. In either case, it will run into inconsistencies with high probability and report a data disruption. But a false match cannot be excluded completely.
- If the decoded data are structured, one can estimate the false match probability and increase the safety with an additional package CRC before encoding, if needed.
- The receiver continuously calls a `Read()` function. The received buffer can contain 0-delimited packages, and the receiver assumes them all to be valid because there is no known significant time delay between package start and end.
- If a package start was received and the next package end arrives more than ~100 ms later, a data disruption is likely and the receiver should ignore these data.
- Specify a maximum inter-byte delay inside a single package, for example ~50 ms.
- To minimize the loss in case of data disruption, each message should be TCOBS encoded and 0-byte delimited separately.
- On the other hand, more frequent 0-byte delimiters increase the transmit overhead a bit.
- Of course, when the receiver starts, the first buffer can contain broken TCOBS data, but we have to live with that on a PC. In any case, there is a reasonable likelihood that such a data inconsistency is detected, as explained above.
- The TCOBSv1 & TCOBSv2 code is stable and ready to use without limitations.
Property | TCOBSv1 | TCOBSv2 |
---|---|---|
Code amount | 🟢 less | 🟡 more |
Speed assumption (not measured yet) | 🟢 faster | 🟢 fast |
Compression on short messages from 2 bytes length | 🟢 yes | 🟢 yes |
Compression on messages with many equal bytes in a row | 🟡 good | 🟢 better |
Encoding C language support | 🟢 yes | 🟢 yes |
Decoding C language support | 🟢 yes | 🟢 yes |
Encoding Go language support | 🟡 yes with CGO | 🟡 yes with CGO |
Decoding Go language support | 🟢 yes | 🟡 yes with CGO |
Other language support | 🆘 No | 🆘 No |
- Compression is a wide field and there is a lot of excellent code around.
- But for very short messages, up to about 100 bytes, these algorithms fail for one of two reasons:
  - They rely on a case-specific, runtime-generated dictionary, which must be packed into the compressed data as well.
  - They rely on a common dictionary on the encoder and decoder side, which then does not need to be part of the compressed data. An interesting example is SMAZ. But this method is not usable on arbitrary data.
- If your packages contain many integers, they have statistically more 0xFF and 0x00 bytes: ✅ that is what TCOBS is made for.
- If your packages contain many equal bytes in a row: ✅ that is what TCOBS is made for.
- If your packages contain statistically mixed byte sequences, like encrypted data: 🛑 that is what TCOBS is NOT made for. Such data are better framed simply with COBS, even though TCOBS can handle them. A compression may make sense before the encryption.
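Whether a buffer looks like TCOBS-friendly data can be estimated roughly by counting 00/FF bytes and bytes that repeat their predecessor. The metric and the helper name `redundancy` are hypothetical and do not come from the TCOBS sources; random-looking data such as ciphertext should score low on both indicators:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// redundancy returns the share of 0x00/0xFF bytes and the share of bytes
// equal to their predecessor, two rough indicators of TCOBS-friendly data.
func redundancy(buf []byte) (zeroFF, repeated float64) {
	if len(buf) == 0 {
		return 0, 0
	}
	var nZF, nRep int
	for i, b := range buf {
		if b == 0x00 || b == 0xFF {
			nZF++
		}
		if i > 0 && b == buf[i-1] {
			nRep++
		}
	}
	return float64(nZF) / float64(len(buf)), float64(nRep) / float64(len(buf))
}

func main() {
	ints := []byte{0, 0, 0, 5, 0xFF, 0xFF, 0xFF, 0xFF, 0, 0, 1, 0x2C}
	random := make([]byte, 4096) // stands in for encrypted data
	if _, err := rand.Read(random); err != nil {
		panic(err)
	}
	zf, rep := redundancy(ints)
	fmt.Printf("integers: 00/FF %.2f, repeats %.2f\n", zf, rep)
	zf, rep = redundancy(random)
	fmt.Printf("random:   00/FF %.2f, repeats %.2f\n", zf, rep)
}
```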
See `./docs/TCOBSv1Specification.md`.
See `./docs/TCOBSv2Specification.md`.
Name | Content |
---|---|
v1 | This is a pure Go TCOBSv1 package. Only decoding is supported. It is usable in Go apps that only need to decode. One example is Trice. The Go files inside this folder are copied from Cv1. |
Cv1 | This is the v1 folder extended with C sources and tests. It provides TCOBSv1 Go encoding and decoding using CGO. The C files are directly usable in an embedded project. |
Cv2 | This folder contains the TCOBSv2 C sources, usable in an embedded project. The Go files are the CGO adaptation and the Go tests for TCOBSv2. |
- Add `import "github.com/rokath/tcobs/v1"` to your Go source file.
  - Use function `tcobs.Decode`, OR
  - use function `tcobs.NewDecoder` and then method `Read`. See `read_test.go` for an example.
- Add `import tcobs "github.com/rokath/tcobs/Cv1"` or `import tcobs "github.com/rokath/tcobs/Cv2"` to your Go source file.
  - Use functions `tcobs.CDecode` and `tcobs.CEncode`, OR
  - use functions `tcobs.NewDecoder` and `tcobs.NewEncoder` and then methods `Read` and `Write`. See `read_test.go` and `write_test.go` for an example.
- Include the Cv1 or Cv2 C sources in your C project. Check `tcobsTest.c` for a usage example.
- Add Changelog
- Add back to top links
- Add Go Reader & Writer interface
- Add generic CCTN & CCQN conversions to remove TCOBSv2 limitations.
- Improve testing with groups of equal bytes.
- Add fuzz testing.
- Compare the efficiency of TCOBSv2 with TCOBSv1.
See the open issues for a full list of proposed features (and known issues).
❓ One could think that arbitrary byte buffer examples could be analyzed concerning the statistical usage of bytes, to find that 0xFC...0xFF and 0x00...0x20 are used more often than 0xBD, for example. This would allow coding some bytes with 5 bits and others with 11 bits, creating a universal table, like Huffman encoding. This table is then commonly used, and the need to pack it into the compressed buffer disappears. Maybe some 2-byte sequences could also go into this table, and the table code could be enhanced with run-length codes.
❓ Several such "universal" tables are conceivable, and during compression the encoder decides which "universal" table fits best for a specific short buffer. Then the table index must be included in the compressed data.
❗ Because these "universal" tables must then reside with both the encoder and the decoder, this will increase the needed code space significantly. Alternatively, these tables could reside in an accessible place outside the embedded device.
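The first step of the "universal table" idea, collecting byte statistics over sample buffers, could be sketched as below. The sample data and the function name `rankBytes` are hypothetical; a real table would be derived from a representative set of recorded packages and then turned into code lengths, for example with Huffman coding:

```go
package main

import (
	"fmt"
	"sort"
)

// rankBytes counts byte values over all sample buffers and returns the
// histogram plus all 256 values sorted by descending frequency (ties by value).
func rankBytes(samples [][]byte) ([256]int, []byte) {
	var hist [256]int
	for _, s := range samples {
		for _, b := range s {
			hist[b]++
		}
	}
	order := make([]byte, 256)
	for i := range order {
		order[i] = byte(i)
	}
	sort.Slice(order, func(i, j int) bool {
		a, b := order[i], order[j]
		if hist[a] != hist[b] {
			return hist[a] > hist[b]
		}
		return a < b
	})
	return hist, order
}

func main() {
	samples := [][]byte{
		{0x00, 0x00, 0x00, 0x05},
		{0xFF, 0xFF, 0xFF, 0xFF},
		{0x2C, 0x01, 0x00, 0x00},
	}
	hist, order := rankBytes(samples)
	for _, b := range order[:4] { // the most frequent byte values
		fmt.Printf("0x%02X: %d\n", b, hist[b])
	}
}
```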
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Working on your first Pull Request? You can learn how from this free series How to Contribute to an Open Source Project on GitHub
Distributed under the MIT License. See `LICENSE.txt` for more information.
Thomas Höhenleitner - th@seerose.net
Project Link: https://github.com/rokath/tcobs
Common Object Byte Stuffing with optimized Run-Length Encoding