KBloom

KBloom is a simple Bloom Filter library for Android developed with Kotlin. It helps you to efficiently check if an element might be in a set without storing all elements. Perfect for memory-constrained mobile devices!

Main Features

Core Bloom Filter:
- Basic insert(put, putAll) and membership check(mightCotain, mightContainAll)
- Uses hashFuction: HashFuction and a lambda toBytes: (T) -> ByteArray to manage any data type T
- Provides a private constructor plus methods create(...) and restore(...)
Modular Architecture:
- hahsing module: Different hash functions(MurmurHash3, XxHash32)
- loggin module: Pluggable loggers (ConsoleLogger, NoOpLogger)
- serialization module: Serialize/deserialize to formats (ByteArraySerializer, JsonSerializer, MessagePackSerializer)
- configuation module: Includes a builder pattern for flexiable Bloom Filter setup
Builder Pattern:
- BloomFilterBuilder allows setting expectedInsertions, fpp, seed, custom hashFunctio, or toBytes
- Support stategies: OPTIMAL (automatic k calculation) or CUSTOM (manual expectedInsertions)
Error Handling:
- Custom exceptions (InvalidConfigurationException, InvalidConfigurationException) for invalid settings or bad serialized data
Performance Enhancements:
- Option to use a LongArrayBitArray instead of IntArray for higher efficiency
- Potential concurrency measures if needed
Additional Features:
- Estimate Current Number of Elements : Uses bit-count to approximate how many items the Bloom Filter already contains
- Probability Estimation : Calculates a current false positive rate based on how many bit are set

Scalable Bloom Filter (SBF)

Key Features:

Automatic Expansion: When the current filter reaches its limit (based on a load factor or heuristic), a new child Bloom Filter is created to continue accommodating the elements.
False Positive Control: Each child Bloom Filter can be individually configured (bitSetSize, number of hash functions, fpp) to maintain a consistent false positive rate.
Short-Circuit Evaluation: When checking for the existence of an element, the filters are traversed from the newest to the oldest, and the process stops as soon as a filter indicates that the element is not present.
Self-Adjustment: Utilizes a GrowthStrategy (e.g., Default, Geometric, Tightening) to adjust the size and configuration of the new filter, aligning with the actual insertion load.
Serialization/Deserialization: Supports storing and restoring the complete state of the SBF, including the parameters of each child Bloom Filter and the information(name) on the GrowthStrategy.

Counting Bloom Filter (CBF)

Counters Array Instead of Bit Array: Each position stores a counter value instead of just a 0/1.
Support for Removing Elements: It allows decreasing the counter values to "remove" an element, which is not typically permitted in a standard Bloom Filter.
Counting Occurrences: Provides a count(element) function to estimate the number of times an element has been added, based on the minimum value among the counters at the hashed positions.
Memory Management & Overflows: It is necessary to define a maximum limit for each counter (maxCounterValue) to prevent overflow, which can affect the counting accuracy.

Future Features

Kotlin Multiplatform:
- Use KBloom on iOS and JVM platforms.

Installation

Using JitPack

Add JitPack repository to your project-level build.gradle:

allprojects {
    repositories {
        google()
        mavenCentral()
        maven { url 'https://jitpack.io' }
    }
}

Add KBloom dependency to your app-level build.gradle:

dependencies {
   implementation 'com.github.moclam1905:KBloom:1.3'
}

Usage

Creating a Bloom Filter

val mockLogger = MockLogger() // for testing
val bf = BloomFilterBuilder<String>()
    .expectedInsertions(1000)
    .falsePositiveProbability(0.01)
    .seed(42)
    .hashFunction(MurmurHash3)
    .toBytes { it.toByteArray(Charsets.UTF_8) }
    .logger(mockLogger)
    .build()

Note: See more about exaple toBytes in BloomFilterGenericTypeTest.kt

Adding Elements

// Add single elements
bloomFilter.put("apple")
bloomFilter.put("banana")

// Add multiple elements at once
val fruits = listOf("cherry", "date", "elderberry")
bloomFilter.putAll(fruits)

Checking Elements

// Check single elements
val containsApple = bloomFilter.mightContain("apple") // true
val containsGrape = bloomFilter.mightContain("grape") // false (or true with low probability)

// Check multiple elements at once
val checkFruits = listOf("apple", "banana", "cherry")
val allContain = bloomFilter.mightContainAll(checkFruits) // true if all are present

Serialize and Deserialize

val serializedJson = bf.serialize(SerializationFormat.JSON)
val bfDeserialized = BloomFilter.deserialize<String>(
    byteArray = serializedJson,
    format = SerializationFormat.JSON,
    hashFunction = MurmurHash3,
    toBytes = { it.toByteArray(Charsets.UTF_8) }
)

SerializationFormat

enum class SerializationFormat {
    BYTE_ARRAY,
    JSON,
    MESSAGEPACK
    // maybe support more serialization format in future
}

Additional Features

-testEstimateCurrentNumberOfElements

val elementsToAdd = listOf("apple", "banana", "cherry", "date", "elderberry")
elementsToAdd.forEach { bloomFilter.put(it) }

val estimatedElements = bloomFilter.estimateCurrentNumberOfElements()

// n ≈ -(m/k) * ln(1 - x/m)
val actualElements = elementsToAdd.size
// allowed tolerance is 2 elements
val tolerance = 2.0
// Check the difference between estimated and actual elements
assertTrue(
    "Estimated elements ($estimatedElements) should be within $tolerance of actual elements ($actualElements)",
    kotlin.math.abs(estimatedElements - actualElements) <= tolerance
)

testEstimateFalsePositiveRate

val elementsToAdd = (1..500).map { "element_$it" }
elementsToAdd.forEach { bloomFilter.put(it) }

val estimatedFpp = bloomFilter.estimateFalsePositiveRate()
// fpp current should be close to the expected fpp
// use n = 500
val expectedFpp =
    (1 - exp(-bloomFilter.getNumHashFunctions() * 500.0 / bloomFilter.getBitSetSize())).pow(
        bloomFilter.getNumHashFunctions()
    )
val tolerance = 0.001 // 0.1%

assertEquals(
    "Estimated false positive rate ($estimatedFpp) should be approximately $expectedFpp within tolerance $tolerance",
    expectedFpp,
    estimatedFpp,
    tolerance
)

SBF

private fun stringToBytes(string: String): ByteArray {
        return string.toByteArray()
    }
@Test
fun testSerialization() {
    val hashFunction = MurmurHash3
    val logger = NoOpLogger
    val sbf = ScalableBloomFilter.create(
        initialExpectedInsertions = 100,
        fpp = 0.01,
        hashFunction = hashFunction,
        toBytes = ::stringToBytes,
        logger = logger,
    )

    val elementsToAdd = listOf("apple", "banana", "cherry", "date", "elderberry")
    elementsToAdd.forEach { sbf.put(it) }

    val serialized = sbf.serialize()
    val deserialized = ScalableBloomFilter.deserialize(
        data = serialized,
        hashFunction = hashFunction,
        toBytes = ::stringToBytes,
        logger = logger,
    )

    // Check elements added
    elementsToAdd.forEach { elements ->
        assertTrue(
            "Element '$elements' should be contained in the SBF",
            deserialized.mightContain(elements),
        )
    }

    // Check elements not added
    val elementsNotAdded = listOf("fig", "grape", "honeydew", "kiwi", "lemon")
    elementsNotAdded.forEach { elements ->
        assertFalse(
            "Element '$elements' should not be contained in the SBF",
            deserialized.mightContain(elements),
        )
    }
}

-CBF

@Test
fun testBuildOptimalAndPutRemove() {
    val cbf = CountingBloomFilterBuilder<String>()
        .expectedInsertions(200)
        .fpp(0.01)
        .maxCounterValue(10)
        .seed(123)
        .hashFunction(MurmurHash3)
        .toBytes(::toBytes)
        .logger(NoOpLogger)
        .buildOptimal()

    cbf.put("apple")
    assertTrue("apple should be in the filter", cbf.mightContain("apple"))
    assertFalse("banana should not be in the filter", cbf.mightContain("banana"))

    repeat(5) { cbf.put("banana") }
    assertTrue(cbf.mightContain("banana"))
    assertEquals("banana should have count=5", 5, cbf.count("banana"))

    repeat(2) { cbf.remove("banana") }
    assertEquals("banana count should be 3 after remove 2", 3, cbf.count("banana"))
}

Documentation

Bloom filter MurmurHash Medium Scalable Bloom Filter Counting Bloom Filter

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
.github/workflows		.github/workflows
.idea		.idea
app		app
gradle		gradle
kbloom		kbloom
.gitignore		.gitignore
README.md		README.md
build.gradle.kts		build.gradle.kts
gradle.properties		gradle.properties
gradlew		gradlew
gradlew.bat		gradlew.bat
jitpack.yml		jitpack.yml
kbloom_logo.jpg		kbloom_logo.jpg
settings.gradle.kts		settings.gradle.kts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

KBloom

Main Features

Future Features

Installation

Using JitPack

Usage

Creating a Bloom Filter

Adding Elements

Checking Elements

Serialize and Deserialize

SerializationFormat

Additional Features

Documentation

License

About

Releases 4

Packages

Languages

moclam1905/KBloom

Folders and files

Latest commit

History

Repository files navigation

KBloom

Main Features

Future Features

Installation

Using JitPack

Usage

Creating a Bloom Filter

Adding Elements

Checking Elements

Serialize and Deserialize

SerializationFormat

Additional Features

Documentation

License

About

Topics

Resources

Stars

Watchers

Forks

Releases 4

Packages 0

Languages

Packages