C++ wrapper class for binary_fuse(8|16)_t, plus a "sharded" filter with mmap'ed file format
#60
Comments
It looks good!
Thank you. I have listed some specific API questions that I have doubts about. If you have a view on these, that would be helpful.
UPDATE: I have done some more work and thinking on this. I am comfortable finding solutions for items 1-5 from oschonrock/binfuse#2. @lemire, could you comment on item 6, since I know little about that, and possibly item 7?
For small instances with few elements, XOR filters might be slightly smaller. However, this advantage matters little, since these filters are primarily useful for large datasets. If you're dealing with just 32 elements, for example, a binary fuse filter would still be my choice. Nonetheless, there is a crossover point below which XOR filters become the more compact option. You could make this transparent to the user and switch filter type based on size, but we never implemented that. I am not sure it matters.
Node.js, Chrome and many other important systems have switched to C++20. I recommend C++20.
OK, I am fairly happy with the library as a first public cut now. I have written some content for the README. Are you OK with how it links back to here? https://github.com/oschonrock/binfuse I am currently working on some benchmarks.
That's fine. There are no requirements regarding links. It is good to refer users to the paper!
Thanks for the endorsement in the README. I have some benchmark results which you may find interesting:
Also see the notes on memory consumption when querying via mmap. Maybe that is obvious?
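For readers wondering why memory consumption stays low when querying through mmap, here is a minimal POSIX-only sketch. It is not binfuse's implementation; the `mapped_file` class and its members are hypothetical names chosen for illustration. It maps a whole file read-only, and the kernel pages data in lazily, so resident memory grows only with the pages a query actually touches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical illustration: a read-only memory mapping of a whole file.
// Nothing is read from disk until a byte is actually accessed.
class mapped_file {
public:
    explicit mapped_file(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed: " + path);

        struct stat st{};
        if (::fstat(fd, &st) != 0 || st.st_size == 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed or empty file: " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);

        void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd); // the mapping remains valid after the descriptor is closed
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);
        data_ = static_cast<const std::uint8_t*>(p);
    }

    ~mapped_file() {
        if (data_ != nullptr) ::munmap(const_cast<std::uint8_t*>(data_), size_);
    }

    mapped_file(const mapped_file&) = delete;
    mapped_file& operator=(const mapped_file&) = delete;

    std::size_t size() const { return size_; }

    // Touching index i faults in only the page that contains it.
    std::uint8_t operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

private:
    const std::uint8_t* data_ = nullptr;
    std::size_t size_ = 0;
};
```

A filter lookup reads only a handful of fingerprint slots, so with a mapping like this a single query touches a few pages rather than loading the whole file.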
I built this wrapper class for my hibp project. It turned out to be non-trivial and potentially useful to others, so I extracted it into a separate library project:
https://github.com/oschonrock/binfuse
example use case of the sharded filter from one of the tests:
By default, the `binfuse::sharded_filter` breaks the input stream (which MUST be sorted) into 256 shards (adjustable on construction) and saves the resulting filter into a custom, tagged binary format. From `sharded_filter.hpp`:

It uses the `binfuse::filter` class internally to wrap each 8- or 16-bit filter.

The singular `binfuse::filter`s cannot be saved to disk on their own right now, as I didn't need that, but it may be useful?

What do you think about this API, and what is missing for it to be useful?
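To make the sharding idea concrete, here is a minimal sketch of splitting a sorted key stream into 256 shards. It assumes the shard is selected by the top bits of each 64-bit key, which the text above does not state; `shard_of` and `split_sorted` are hypothetical names and not part of the binfuse API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: the shard index is the top `shard_bits` bits of the key.
inline std::uint32_t shard_of(std::uint64_t key, unsigned shard_bits) {
    assert(shard_bits <= 16); // keep the number of shards reasonable
    return shard_bits == 0
               ? 0u
               : static_cast<std::uint32_t>(key >> (64 - shard_bits));
}

// Splits a sorted key sequence into 2^shard_bits groups (256 by default).
// With sorted input the shard index never decreases, so each shard's keys
// arrive as one contiguous run and a shard is complete once the next begins.
inline std::vector<std::vector<std::uint64_t>>
split_sorted(const std::vector<std::uint64_t>& keys, unsigned shard_bits = 8) {
    std::vector<std::vector<std::uint64_t>> shards(std::size_t{1} << shard_bits);
    std::uint32_t current = 0;
    for (std::uint64_t key : keys) {
        const std::uint32_t s = shard_of(key, shard_bits);
        assert(s >= current && "input must be sorted");
        current = s;
        shards[s].push_back(key);
    }
    return shards;
}
```

Under this assumption, sorted input means only one shard's keys need to be held in memory at a time while building, and each shard can then be populated into its own 8- or 16-bit filter.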