Gustavo Fernandes edited this page Apr 28, 2015 · 11 revisions

Currently the HotRod protocol implements a few bulk operations (BulkGetRequest, BulkGetKeysRequest, QueryRequest) which return arrays of data. The problem with these operations is that they are inefficient: they build the entire payload on the server, potentially requiring a lot of transient memory, and then send it to the client in a single response, which again might be too large to handle. It is also not possible to pre-filter the data on the server according to some criteria to avoid sending unneeded entries to the client.

For symmetry with the embedded distributed iterator, HotRod should implement an equivalent remote distributed iterator which would:

  • allow a client to specify a filter/scope (local, global, segment, custom, etc)

  • allow the client and the server to iterate through the retrieved entries using an iterator paradigm appropriate to the environment (java.lang.Iterator in Java, <iterator> in C++, IEnumerator in C#) in a memory efficient fashion

  • allow a server to efficiently batch entries across multiple responses

  • leverage the already existing KeyValueFilter and Converters, including deployment of custom ones into the server

API changes

A new method on org.infinispan.client.hotrod.RemoteCache:

/**
  * Retrieve entries from the server
  *
  * @param filterConverterFactory Factory name for {@link KeyValueFilterConverter} or null
  * @param segments The segments to iterate on or null if all segments should be iterated
  * @param batchSize The number of entries transferred from the server at a time
  */
CloseableIterator retrieveEntries(String filterConverterFactory, Set<Integer> segments, int batchSize);
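To make the intended client-side usage concrete, here is a self-contained sketch of the iterator contract and its consumption pattern. The `CloseableIterator` stand-in, the `over` factory, and the `drain` helper are illustrative assumptions, not the actual Infinispan API; the batching stands in for the server sending entries a few at a time via ITERATION_NEXT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class RetrieveEntriesSketch {

    // Minimal stand-in for the proposed return type: an Iterator that must
    // be closed so the server can release the iteration's resources.
    interface CloseableIterator<E> extends Iterator<E>, AutoCloseable {
        @Override
        void close(); // no checked exception, so try-with-resources stays clean
    }

    // Toy iterator over pre-fetched batches, standing in for the batches a
    // client would receive from the server via ITERATION_NEXT.
    static <E> CloseableIterator<E> over(List<List<E>> batches) {
        Iterator<List<E>> batchIt = batches.iterator();
        return new CloseableIterator<E>() {
            private Iterator<E> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                while (!current.hasNext() && batchIt.hasNext()) {
                    current = batchIt.next().iterator(); // "fetch" the next batch
                }
                return current.hasNext();
            }

            @Override
            public E next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current.next();
            }

            @Override
            public void close() { /* a real client would issue ITERATION_CLOSE here */ }
        };
    }

    // Drain everything, closing the iterator when done.
    static List<String> drain() {
        List<String> seen = new ArrayList<>();
        // Two batches, as if batchSize = 2 had been requested.
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("k1", "k2"), Arrays.asList("k3"));
        try (CloseableIterator<String> it = over(batches)) {
            while (it.hasNext()) seen.add(it.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(drain());
    }
}
```

The try-with-resources idiom matters here: closing the iterator is what lets the server dispose of its state for the iteration.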

HotRod Protocol

Operation: ITERATION_START

Request:

| Field name | Type | Value |
|---|---|---|
| SEGMENTS | byte[] | segments requested |
| FILTER_CONVERT | String | Factory name for FilterConverter |
| BATCH_SIZE | vInt | batch size to transfer from the server |

Response:

| Field name | Type | Value |
|---|---|---|
| ITERATION_ID | String | UUID of the iteration |
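The BATCH_SIZE and ENTRIES_SIZE fields use the protocol's vInt type: an unsigned variable-length integer written 7 bits per byte, with the high bit of each byte flagging that more bytes follow and the least-significant group first. A minimal sketch of that encoding (class name `VInt` is ours, not part of the protocol code):

```java
import java.io.ByteArrayOutputStream;

public class VInt {
    // Write an unsigned int, 7 bits at a time, least-significant group first;
    // the high bit of each byte signals that more bytes follow.
    static byte[] write(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Read back a vInt from the start of the given byte array.
    static int read(byte[] bytes) {
        int result = 0, shift = 0, i = 0;
        byte b;
        do {
            b = bytes[i++];
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(read(write(1000)) + " in " + write(1000).length + " bytes");
    }
}
```

Values up to 127 fit in a single byte, which keeps small batch sizes cheap on the wire.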

Operation: ITERATION_NEXT

Request:

| Field name | Type | Value |
|---|---|---|
| ITERATION_ID | String | UUID of the iteration |

Response:

| Field name | Type | Value |
|---|---|---|
| ITERATION_ID | String | id of the iteration |
| FINISHED_SEGMENTS | byte[] | segments that finished iteration |
| ENTRIES_SIZE | vInt | number of entries transferred |
| KEY 1 | byte[] | entry 1 key |
| VALUE 1 | byte[] | entry 1 value |
| … | … | … |
| KEY N | byte[] | entry N key |
| VALUE N | byte[] | entry N value |

Operation: ITERATION_CLOSE

Request:

| Field name | Type | Value |
|---|---|---|
| ITERATION_ID | String | UUID of the iteration |

Response:

| Field name | Type | Value |
|---|---|---|
| STATUS | byte | ACK of the operation |

Client side

Upon calling retrieveEntries, an ITERATION_START op will be issued and the client will store the tuple [Address, IterationId] so that all subsequent operations for that iteration go to the same server.

CloseableIterator.next() will receive a batch of entries and will keep the keys internally on a per-segment basis. Whenever a segment finishes iterating (received in the FINISHED_SEGMENTS field of the ITERATION_NEXT response), its keys will be discarded.
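The per-segment bookkeeping described above might look like the following sketch. The class name `SegmentKeyTracker` is ours; the behaviour (retain keys per segment, discard them when FINISHED_SEGMENTS reports the segment done) follows the text:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Client-side tracker: keys received so far are retained per segment and
// dropped as soon as the server reports the segment finished, keeping the
// memory footprint bounded to the still-unfinished segments.
public class SegmentKeyTracker {
    private final Map<Integer, Set<String>> keysBySegment = new HashMap<>();

    // Record a key received in an ITERATION_NEXT batch.
    public void track(int segment, String key) {
        keysBySegment.computeIfAbsent(segment, s -> new HashSet<>()).add(key);
    }

    // Called with the FINISHED_SEGMENTS field of a response: finished
    // segments can never be re-sent, so their bookkeeping is discarded.
    public void segmentsFinished(Set<Integer> finished) {
        finished.forEach(keysBySegment::remove);
    }

    public int trackedSegments() {
        return keysBySegment.size();
    }

    public static void main(String[] args) {
        SegmentKeyTracker t = new SegmentKeyTracker();
        t.track(0, "k1");
        t.track(1, "k2");
        t.segmentsFinished(java.util.Collections.singleton(0)); // segment 0 done
        System.out.println(t.trackedSegments()); // only segment 1 still tracked
    }
}
```

The retained keys are exactly what the failover section below relies on to filter duplicates.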

Server Side

An ITERATION_START request will obtain a CloseableIterator from an EntryRetriever, with the optional KeyValueFilterConverter, batch size, and set of segments required. The CloseableIterator will be associated with a UUID and disposed upon receiving an ITERATION_CLOSE request. The CloseableIterator will also have a SegmentListener associated, so that it is notified of finished segments and can send them back to the client.
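The server-side UUID-to-iterator association could be sketched as a simple registry. The class name `IterationManager` is an assumption, and the sketch omits the EntryRetriever and SegmentListener wiring; it only shows the lifecycle tied to the three operations:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Server-side registry sketch: ITERATION_START associates a fresh UUID with
// an iterator; ITERATION_NEXT looks it up; ITERATION_CLOSE disposes of it.
public class IterationManager {
    private final Map<String, Iterator<?>> iterators = new ConcurrentHashMap<>();

    // ITERATION_START: register the iterator and hand its id to the client.
    public String start(Iterator<?> it) {
        String id = UUID.randomUUID().toString();
        iterators.put(id, it);
        return id;
    }

    // ITERATION_NEXT: resolve the iterator for the given ITERATION_ID.
    public Iterator<?> get(String iterationId) {
        return iterators.get(iterationId);
    }

    // ITERATION_CLOSE: remove the iterator (a real implementation would also
    // close it and deregister its SegmentListener). Returns false if unknown.
    public boolean close(String iterationId) {
        return iterators.remove(iterationId) != null;
    }

    public static void main(String[] args) {
        IterationManager m = new IterationManager();
        String id = m.start(java.util.Arrays.asList("a", "b").iterator());
        System.out.println(m.get(id).next());
        System.out.println(m.close(id) + " " + m.close(id)); // second close is a no-op
    }
}
```

A ConcurrentHashMap is a natural fit since ITERATION_NEXT requests for different iterations arrive concurrently on the server.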

An ITERATION_CLOSE will dispose the CloseableIterator.

Failover

If the server backing a CloseableIterator dies, the client will restart the iteration on another node, filtering out already finished segments.

One drawback of the per-segment iteration control is that if a particular segment was not entirely iterated before the server went down, it will have to be iterated again, leading to duplicate values being returned to the caller of CloseableIterator.next().

One way to mitigate this is to store on the client the keys of the unfinished segments and use them to filter out duplicates in failover scenarios.
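Combining the two failover rules above, a filter applied after restarting the iteration on another node might look like this sketch (the class name `FailoverFilter` is ours):

```java
import java.util.Set;

// Failover sketch: after restarting the iteration on another node, entries
// from segments that already finished are skipped wholesale, and keys
// already seen in the unfinished segments are filtered out individually.
public class FailoverFilter {
    private final Set<Integer> finishedSegments;
    private final Set<String> seenKeysInUnfinishedSegments;

    public FailoverFilter(Set<Integer> finished, Set<String> seenKeys) {
        this.finishedSegments = finished;
        this.seenKeysInUnfinishedSegments = seenKeys;
    }

    // Decide whether an entry received after failover should reach the caller.
    public boolean accept(int segment, String key) {
        if (finishedSegments.contains(segment)) {
            return false; // segment was fully iterated before the failure
        }
        return !seenKeysInUnfinishedSegments.contains(key); // drop partial-segment duplicates
    }

    public static void main(String[] args) {
        FailoverFilter f = new FailoverFilter(
                java.util.Collections.singleton(0),   // segment 0 finished
                java.util.Collections.singleton("k5")); // k5 seen in a partial segment
        System.out.println(f.accept(0, "k1") + " " + f.accept(1, "k5") + " " + f.accept(1, "k9"));
    }
}
```

The trade-off is client memory: retaining the seen keys of unfinished segments is exactly the bookkeeping that the per-segment discard in the client section keeps bounded.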
