Skip to content
/ bikey Public

Low memory footprint Map and Set implementation on objects with composited keys

License

Notifications You must be signed in to change notification settings

jerolba/bikey

Repository files navigation

Maven Central Build Status Download Codecov License Javadocs


Bikey

Bikey implements Map and Set data structures with two keys minimizing memory consumption.

Why Bikey collections?

Current collections libraries (Guava, Commons Collection, Eclipse Collections) have poor or not support to Maps and Sets with two keys.

Implementing it manually with a Map<R, Map<C, V>>, Map<Tuple<R, C>, V> or a Set<Tuple<R, C>> consumes a lot of memory, and choosing an incorrect hashCode function for Tuple (or equivalent) class can penalize memory and CPU consumption.

Bikey Map collection can reduce to a 15%-30% of consumed memory by a traditional double map (depending on the map fill rate) and Bikey Set collection can reduce to a 1% of consumed memory by a Set<Tuple>, with none or low penalization in access time.

Some Quick Examples

BikeyMap API is defined like the Map interface but everywhere a key is needed, you must provide both key values.

To simplify the example Strings has been used as keys, but any object that implements equals and hashCode can be used as row or column key. You can also use any kind of object as value. I've used Integer to simplify the following code:

BikeyMap<String, String, Integer> stock = new TableBikeyMap<>();
stock.put("shirt-ref-123", "store-76", 10);
stock.put("pants-ref-456", "store-12", 24);
...
stock.put("tie-ref-789", "store-23", 2);

Integer available = stock.get("shirt-ref-1234", "store-45");

//Total stock in store-123
stock.entrySet().stream()
     .filter(entry -> entry.getColumn().equals("store-123"))
     .mapToInt(entry -> entry.getValue())
     .sum();

//Total stock in pants-ref-457
stock.entrySet().stream()
     .filter(entry -> entry.getRow().equals("pants-ref-457"))
     .mapToInt(entry -> entry.getValue())
     .sum();

//All products included
Set<String> products = stock.rowKeySet();

//All stores included
Set<String> stores = stock.columnKeySet();

//Contains a product and store?
if (stock.containsKey("tie-ref-789", "store-23")) {
    ....
}

//Get all product/stores presents in the map
BikeySet<String, String> productStores = map.bikeySet();

//BikeySet<R, C> also implements Set<Bikey<R, C>>
Set<Bikey<String, String>> productStoresSet = map.bikeySet();

//Get products and stores with stock
BikeySet<String, String> withStock = stock.entrySet().stream()
    .filter(entry -> entry.getValue() > 0)
    .map(BikeyEntry::getKey)
    .collect(BikeyCollectors.toSet());

//Do something with each element in the map
stock.forEach((product, store, units) -> {
    System.out.println("Product " + product + " has " + units + " in store " + store);
});

BikeySet API is defined like the Set interface but everywhere an element is used, changes to two values:

BikeySet<String, String> avengerFilms = new TableBikeySet<>();
avengerFilms.add("Hulk", "The Avengers");
avengerFilms.add("Iron Man", "The Avengers");
avengerFilms.add("Thor", "Avengers: Age of Ultron");
avengerFilms.add("Thor", "Thor: Ragnarok");
avengerFilms.add("Captain America", "Avengers: Infinity War");
....

if (avengerFilms.contains("Iron Man", "Black Panther")) {
    ....
}

//Films in the Set
Set<String> filmsInSet = avengerFilms.columnKeySet();

//Avengers in the Set
Set<String> avengersInSet = avengerFilms.rowKeySet();

//Films with Iron Man
List<String> ironManFilms = avengerFilms.stream()
    .filter(entry -> entry.getRow().equals("Iron Man"))
    .map(Bikey::getColumn)
    .collect(toList());

//Call to a BiFunction for each element in the Set
bikeySet.forEach(this::doSomething);

public void doSomething(String avenger, String film) {
  ....
}

Implementations

BikeyMap<R, C ,V> has two implementations:

  • TableBikeyMap<R, C ,V>: optimized for memory consumption, and with performance similar to a double map or tuple map version.
  • MatrixBikeyMap<R, C V: optimizes performance, but with the disadvantage of consuming a little more memory with low fill rates.

depending on your business logic, you can use one or the other.

MatrixBikeyMap behaves like a matrix and grows quickly in memory consumption, but then it remains stable. It's recommended only if the fill rate is greater than 60% or access time to their elements is important. By default we recommend to use TableBikey implementation.

Dependency

Bikey is uploaded to Maven Central Repository and to use it, you need to add the following Maven dependency:

<dependency>
  <groupId>com.jerolba</groupId>
  <artifactId>bikey</artifactId>
  <version>0.9.0</version>
</dependency>

in Gradle:

implementation 'com.jerolba:bikey:0.9.0'

or download the single jar from Maven Central Repository.

Benchmarks

Execute your own benchmarks before deciding to use this library, but as a reference you can start with these numbers:

Memory

Compared to Map<R, Map<C, V>> and Map<Tuple<R, C>, V>, the memory consumed filling a map with 10.000 x 1.000 elements is:

Compared to HashMap<R, HashSet<C>> and HashSet<Tuple<R, C>> implementations, the memory consumed filling a Set with 10.000 x 1.000 elements is:

Performance

To create and fill randomly different maps in each implementation, the time spent is:

To find randomly different maps in each implementation, the time spent is:

To create and fill randomly a Set with 10.000 x 1.000 elements, the time spent is:

To check randomly the existence of each element in a Set with 10.000 x 1.000 elements, the time spent is:

Contribute

Feel free to dive in! Open an issue or submit PRs.

Any contributor and maintainer of this project follows the Contributor Covenant Code of Conduct.

License

Apache 2 © Jerónimo López

About

Low memory footprint Map and Set implementation on objects with composited keys

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages