KBloom is a simple Bloom Filter library for Android developed with Kotlin. It helps you to efficiently check if an element might be in a set without storing all elements. Perfect for memory-constrained mobile devices!
- Core Bloom Filter:
- Basic insert(
put
,putAll
) and membership check(mightCotain
,mightContainAll
) - Uses
hashFuction: HashFuction
and a lambdatoBytes: (T) -> ByteArray
to manage any data typeT
- Provides a private constructor plus methods
create(...)
andrestore(...)
- Basic insert(
- Modular Architecture:
hahsing
module: Different hash functions(MurmurHash3, XxHash32)loggin
module: Pluggable loggers (ConsoleLogger
,NoOpLogger
)serialization
module: Serialize/deserialize to formats (ByteArraySerializer
,JsonSerializer
,MessagePackSerializer
)configuation
module: Includes a builder pattern for flexiable Bloom Filter setup
- Builder Pattern:
BloomFilterBuilder
allows settingexpectedInsertions
,fpp
,seed
, customhashFunctio
, ortoBytes
- Support stategies: OPTIMAL (automatic
k
calculation) or CUSTOM (manualexpectedInsertions
)
- Error Handling:
- Custom exceptions (
InvalidConfigurationException
,InvalidConfigurationException
) for invalid settings or bad serialized data
- Custom exceptions (
- Performance Enhancements:
- Option to use a
LongArrayBitArray
instead ofIntArray
for higher efficiency - Potential concurrency measures if needed
- Option to use a
- Additional Features:
- Estimate Current Number of Elements : Uses bit-count to approximate how many items the Bloom Filter already contains
- Probability Estimation : Calculates a current false positive rate based on how many bit are set
Scalable Bloom Filter (SBF)
Key Features:
- Automatic Expansion: When the current filter reaches its limit (based on a load factor or heuristic), a new child Bloom Filter is created to continue accommodating the elements.
- False Positive Control: Each child Bloom Filter can be individually configured (
bitSetSize
,number of hash functions
,fpp
) to maintain a consistent false positive rate. - Short-Circuit Evaluation: When checking for the existence of an element, the filters are traversed from the newest to the oldest, and the process stops as soon as a filter indicates that the element is not present.
- Self-Adjustment: Utilizes a
GrowthStrategy
(e.g., Default,Geometric
,Tightening
) to adjust the size and configuration of the new filter, aligning with the actual insertion load. - Serialization/Deserialization: Supports storing and restoring the complete state of the SBF, including the parameters of each child Bloom Filter and the information(
name
) on the GrowthStrategy.
Counting Bloom Filter (CBF)
- Counters Array Instead of Bit Array: Each position stores a counter value instead of just a
0/1
. - Support for Removing Elements: It allows decreasing the counter values to "
remove
" an element, which is not typically permitted in a standard Bloom Filter. - Counting Occurrences: Provides a count(
element
) function to estimate the number of times an element has been added, based on the minimum value among the counters at the hashed positions. - Memory Management & Overflows: It is necessary to define a maximum limit for each counter (
maxCounterValue
) to prevent overflow, which can affect the counting accuracy.
- Kotlin Multiplatform:
- Use KBloom on iOS and JVM platforms.
-
Add JitPack repository to your project-level
build.gradle
:allprojects { repositories { google() mavenCentral() maven { url 'https://jitpack.io' } } }
-
Add KBloom dependency to your app-level
build.gradle
:dependencies { implementation 'com.github.moclam1905:KBloom:1.3' }
val mockLogger = MockLogger() // for testing
val bf = BloomFilterBuilder<String>()
.expectedInsertions(1000)
.falsePositiveProbability(0.01)
.seed(42)
.hashFunction(MurmurHash3)
.toBytes { it.toByteArray(Charsets.UTF_8) }
.logger(mockLogger)
.build()
Note: See more about exaple toBytes
in BloomFilterGenericTypeTest.kt
// Add single elements
bloomFilter.put("apple")
bloomFilter.put("banana")
// Add multiple elements at once
val fruits = listOf("cherry", "date", "elderberry")
bloomFilter.putAll(fruits)
// Check single elements
val containsApple = bloomFilter.mightContain("apple") // true
val containsGrape = bloomFilter.mightContain("grape") // false (or true with low probability)
// Check multiple elements at once
val checkFruits = listOf("apple", "banana", "cherry")
val allContain = bloomFilter.mightContainAll(checkFruits) // true if all are present
val serializedJson = bf.serialize(SerializationFormat.JSON)
val bfDeserialized = BloomFilter.deserialize<String>(
byteArray = serializedJson,
format = SerializationFormat.JSON,
hashFunction = MurmurHash3,
toBytes = { it.toByteArray(Charsets.UTF_8) }
)
enum class SerializationFormat {
BYTE_ARRAY,
JSON,
MESSAGEPACK
// maybe support more serialization format in future
}
-testEstimateCurrentNumberOfElements
val elementsToAdd = listOf("apple", "banana", "cherry", "date", "elderberry")
elementsToAdd.forEach { bloomFilter.put(it) }
val estimatedElements = bloomFilter.estimateCurrentNumberOfElements()
// n ≈ -(m/k) * ln(1 - x/m)
val actualElements = elementsToAdd.size
// allowed tolerance is 2 elements
val tolerance = 2.0
// Check the difference between estimated and actual elements
assertTrue(
"Estimated elements ($estimatedElements) should be within $tolerance of actual elements ($actualElements)",
kotlin.math.abs(estimatedElements - actualElements) <= tolerance
)
testEstimateFalsePositiveRate
val elementsToAdd = (1..500).map { "element_$it" }
elementsToAdd.forEach { bloomFilter.put(it) }
val estimatedFpp = bloomFilter.estimateFalsePositiveRate()
// fpp current should be close to the expected fpp
// use n = 500
val expectedFpp =
(1 - exp(-bloomFilter.getNumHashFunctions() * 500.0 / bloomFilter.getBitSetSize())).pow(
bloomFilter.getNumHashFunctions()
)
val tolerance = 0.001 // 0.1%
assertEquals(
"Estimated false positive rate ($estimatedFpp) should be approximately $expectedFpp within tolerance $tolerance",
expectedFpp,
estimatedFpp,
tolerance
)
- SBF
private fun stringToBytes(string: String): ByteArray {
return string.toByteArray()
}
@Test
fun testSerialization() {
val hashFunction = MurmurHash3
val logger = NoOpLogger
val sbf = ScalableBloomFilter.create(
initialExpectedInsertions = 100,
fpp = 0.01,
hashFunction = hashFunction,
toBytes = ::stringToBytes,
logger = logger,
)
val elementsToAdd = listOf("apple", "banana", "cherry", "date", "elderberry")
elementsToAdd.forEach { sbf.put(it) }
val serialized = sbf.serialize()
val deserialized = ScalableBloomFilter.deserialize(
data = serialized,
hashFunction = hashFunction,
toBytes = ::stringToBytes,
logger = logger,
)
// Check elements added
elementsToAdd.forEach { elements ->
assertTrue(
"Element '$elements' should be contained in the SBF",
deserialized.mightContain(elements),
)
}
// Check elements not added
val elementsNotAdded = listOf("fig", "grape", "honeydew", "kiwi", "lemon")
elementsNotAdded.forEach { elements ->
assertFalse(
"Element '$elements' should not be contained in the SBF",
deserialized.mightContain(elements),
)
}
}
-CBF
@Test
fun testBuildOptimalAndPutRemove() {
val cbf = CountingBloomFilterBuilder<String>()
.expectedInsertions(200)
.fpp(0.01)
.maxCounterValue(10)
.seed(123)
.hashFunction(MurmurHash3)
.toBytes(::toBytes)
.logger(NoOpLogger)
.buildOptimal()
cbf.put("apple")
assertTrue("apple should be in the filter", cbf.mightContain("apple"))
assertFalse("banana should not be in the filter", cbf.mightContain("banana"))
repeat(5) { cbf.put("banana") }
assertTrue(cbf.mightContain("banana"))
assertEquals("banana should have count=5", 5, cbf.count("banana"))
repeat(2) { cbf.remove("banana") }
assertEquals("banana count should be 3 after remove 2", 3, cbf.count("banana"))
}
Bloom filter MurmurHash Medium Scalable Bloom Filter Counting Bloom Filter