bfGo
is a library written in Golang to provide end user/application to spin up bloom filters from scratch. It provides configurable parameters and multiple variants such as:
- Standard Bloom Filter
- SBF relies on underlying bitset as a datastore and provisions Insertion and Membership verification of inputs
- Counting Bloom Filter
- CountingBF relies on underlying frequencyHashmap as a datastore and provisions Insertion, Membership verification and deletion of inputs
- Cuckoo Bloom Filter
- CuckooBF relies on underlying hashtables and slots as a datastore and provisions Insertion, Membership verification, Deletion and other auxilary operations
In addition, bfGo has Hashing
, Indexing
and Compression
modules that aid the lifecyle of an input in bloom filters. bfGo uses consistent hashing algorithms are deterministic such as Fowler-Noll–Vo hash(Fnv1a) and Murmum3 algorithm, whereas Module Biasing is used for searching indexes and Cyclic Redundancy Checksum algorithm is used for compression, primarly for Cucoko filters.
Each of the filters have their own designed datastores which are compact and memory efficient, keeping footprint as minimal as possible. For more details on how these are designed, please redirect to /docs for a better understanding.
-
Use standard bloom filters when you need a space-efficient probabilistic data structure to test membership of an element in a set
- Target usecases: spell checkers, web caches, or distributed systems.
- Parameters:
- Insertion time and Membership verification is directly proportional to input size.
- Operation on the datastore runs on constant time
- If the expected inputs are huge in size, it would be recommended to use strong algorithms that are blazing fast to find hashes.
-
Use counting bloom filters when you need to support both insertion and deletion of elements while maintaining a compact data structure.
- Target usecases: network traffic monitoring, cache management, or stream processing.
- Parameters:
- Insertion time and Membership verification is directly proportional to input size.
- Operation on the datastore runs on constant time
- If the expected inputs are huge in size, it would be recommended to use strong algorithms that are blazing fast to find hashes.
-
Use cuckoo bloom filters when you need to handle high insertions rates and want to minimize false positives.
- Target usecases: such as network security, malware detection, or DNA sequence analysis.
- Parameters:
- Slot size has minimal impact in performance, but allocations and total insertion time is directly proportional to input size.
- Operation on the datastore runs on constant time
- Cuckoo filters generally work best when you have filter size is 10x or greater than the input size and count.
- If the expected inputs are huge in size, it would be recommended to use strong algorithms that are blazing fast to find hashes.
Each of the filters have been benchmarked and tested throughly to understand the performance and correlation of input size, filter size and hashing functions. The factors used to conduct the tests are :[INPUT SIZE/FILTER SIZE/SLOT SIZE] in the exact order.
Based on below benchmarks, it can be concluded that with ideal balance between filter, slot and input size, where filter size is always 10x or greater than the input size(which is not too large ie; <1000 bytes), would result in operations being executed in approximately:
6μs
(insertion) and1μs
(membership verification) in SBF7μs
(insertion) and1μs
(membership verification) in Counting BF6μs
(insertion) and0.5845μs
(membership verification) in Cuckoo BF
Benchmark | Operations | Time/op | Memory/op | Allocations/op |
---|---|---|---|---|
Standard BF Insert [1000/1000] | 168,452 | 6,729 ns | 1,144 B | 3 |
Standard BF Membership [1000/1000] | 926,133 | 1,255 ns | 120 B | 2 |
CBF Insert [1000/1000] | 155,170 | 7,119 ns | 1,168 B | 4 |
CBF Membership [1000/1000] | 900,470 | 1,339 ns | 144 B | 3 |
Cuckoo Filter Insert [1000/1000/10] | 156175 | 13091 ns/op | 1834 B/op | 51 |
Cuckoo Filter Insert [1000/100000/10] | 180880 | 6317 ns/op | 1040 B/op | 2 |
Cuckoo Filter Membership [1000/1000/10] | 2009852 | 584.1 ns/op | 16 B/op | 1 |
Cuckoo Filter Membership [1000/100000/10] | 2074994 | 584.5 ns/op | 16 B/op | 1 |
Cuckoo Filter Membership [1000/1000/1000] | 2082231 | 582.0 ns/op | 16 B/op | 1 |
Cuckoo Filter Membership [1000/1000/10000] | 2018246 | 570.1 ns/op | 16 B/op | 1 |
Contributions and collaborations are much welcomed! Please head over to the guidelines and raise a PR for the change.