Design: Cross-entity invocation service

Problem

We need a way for two entities to communicate. For example, an administration entity might ask another entity to drop a resource.
We need a way for an entity to enqueue a delayed invocation on itself. This is useful for periodic timers or asynchronous flush operations.

Why can’t we solve these problems directly?

1. Inter-entity communication

The obvious way to communicate between entities would be for Entity A to register some kind of callback in a service which could later be called by Entity B.

There are a few reasons why this doesn’t work:

No replication. The message needs to be replicated to any passive servers and this wouldn’t be visible at that level.
No control over concurrency key. An entity defines how it can concurrently execute invocations via its ConcurrencyStrategy. A call coming in from another entity would originate on a thread which is not defined by this strategy.
No defined global order. All messages have a global order which is the same on all servers within a stripe. This is important to correctly handle re-sent transactions and to ensure that a passive receives state change events in the same order observed by the active. A call into an entity from another entity does not fit into this global order.

Demonstration of how these problems would lead to inconsistent state between an active and a passive server

EntityA is a simple map-based in-memory data store. It has operations such as put/get/clear.
EntityB is an administrative entity, with knowledge of EntityA, and can be asked to tell EntityA to drop its in-memory state and go back to an empty state.
Message1 is sent to EntityA, as a put, in order to store a piece of data into the map.
Message2 is sent to EntityB, asking it to clear the in-memory state of EntityA.
The global order is Message1 and then Message2.

Different entities run in different concurrency key spaces so, even though both entities want to run single-threaded, the server schedules these invocations to run concurrently (relative to each other).

queues 1

This creates three possible outcomes:

Message1 completes before Message2: EntityA now has 0 elements in its map.
Message2 completes before Message1: EntityA now has 1 element in its map.
Message1 and Message2 both attempt to access the underlying map at the same time, thus corrupting it or causing unexpected exceptions to be thrown.

With only one server in a stripe, this non-deterministic outcome is unexpected and unacceptable, given a consistent ordering and single-threaded entities. The problem is worse if a passive server is receiving replicated messages. Not only would the passive entities need to invoke the same operation, but the order of events may differ between the active and the passive, which means that the data model of the stripe becomes inconsistent.
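The two clean outcomes can be reproduced deterministically by applying the same two operations in each order. This is a minimal sketch, not platform code: the class OrderingDemo, the method finalSize and the string operation names are hypothetical, and the third outcome (concurrent access corrupting the map) is not modelled.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OrderingDemo {
    /** Applies the operations, in the given order, to a fresh map and returns its final size. */
    static int finalSize(List<String> operations) {
        // Stand-in for EntityA's in-memory state (hypothetical).
        Map<String, String> entityAState = new HashMap<>();
        for (String operation : operations) {
            if (operation.equals("put")) {
                entityAState.put("key", "value");   // Message1: put sent to EntityA
            } else if (operation.equals("clear")) {
                entityAState.clear();               // Message2: clear requested via EntityB
            } else {
                throw new IllegalArgumentException("unknown operation: " + operation);
            }
        }
        return entityAState.size();
    }

    public static void main(String[] args) {
        // Global order respected: Message1, then Message2.
        System.out.println("put then clear: " + finalSize(List.of("put", "clear")));
        // Order observed if the clear happens to run first.
        System.out.println("clear then put: " + finalSize(List.of("clear", "put")));
    }
}
```

If the active applies one order and the passive the other, the two servers end up with maps of different sizes even though both received the same messages.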

2. Delayed invocation

The obvious way to delay an operation within one entity would be to create a thread to run it, waiting for a specific time.

This creates a problem similar to the one above: the delayed invocation is not replicated, so even if the passive does the same thing, the relative order of events could differ. It also creates a second problem: the entity is creating its own thread resources, which are not directly visible to the platform. This could cause problems when the entity is destroyed, or when a passive entity is discarded during promotion to active, because the old thread may still try to access the now-dead entity instance.

If the platform provided a service to do this, it would address some of these issues, but the entity would then have no control over how the thread ran, leaving it vulnerable to the same unexpected multi-threading issue described above.

Proposal: Inter-entity messenger service

The core problem here is that these calls circumvent the normal message flow through the platform: invocations being fed from a client. We can’t use the normal flow for this as there is no client but we can do something similar, with a few caveats. Specifically, the messages won’t be re-sent if there is a fail-over or a restart during the call. This means that messages sent this way cannot be treated as reliable but will otherwise act as any other message.

This design proposes a service which will offer this capability.

How to identify the target entity of a message

The service will provide a way for an entity to ask for its own TargetIdentifier. Note that an entity can only ask for its own identifier, which means that the service does not need to act as a name server. Any other service which wishes to provide a way for entities to communicate with each other will need to act as a registry of these TargetIdentifiers, resolved via its own high-level mechanisms.

The service will allow a message to be sent to a TargetIdentifier.
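One way to picture the "own identifier only" rule is a service instance that is bound to a single entity, so the lookup takes no argument. The sketch below assumes this shape; SelfIdentifierSketch, getSelfIdentifier and serviceForNewEntity are hypothetical names, and only TargetIdentifier comes from this design.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SelfIdentifierSketch {
    /** Opaque handle naming one entity; entities cannot construct or inspect it. */
    static final class TargetIdentifier {
        private final long id;

        private TargetIdentifier(long id) {
            this.id = id;
        }

        @Override
        public String toString() {
            return "TargetIdentifier#" + id;
        }
    }

    /** The view of the messenger service handed to one entity (hypothetical shape). */
    interface MessengerService {
        /** No argument: an entity can only ever learn its own identifier. */
        TargetIdentifier getSelfIdentifier();
    }

    private static final AtomicLong NEXT_ID = new AtomicLong(1);

    /** Platform-side factory (assumed): each entity gets a service bound to itself. */
    static MessengerService serviceForNewEntity() {
        TargetIdentifier self = new TargetIdentifier(NEXT_ID.getAndIncrement());
        return () -> self;
    }

    public static void main(String[] args) {
        MessengerService forEntityA = serviceForNewEntity();
        MessengerService forEntityB = serviceForNewEntity();
        System.out.println(forEntityA.getSelfIdentifier());
        System.out.println(forEntityB.getSelfIdentifier());
    }
}
```

Because there is no lookup-by-name method, sharing an identifier with another entity has to go through some other registry, as described above.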

How to create the message to invoke

Since there is no common message dialect (each entity defines its own), any entity wishing to receive messages for a specific purpose will need to provide some kind of callback mechanism to construct a message instance. While this does mean that the source entity will be executing code provided by the target, this callback can be easily constrained to have no side-effects and depend on no global scope.

This means that the service which logically connects these entities must hold not only the TargetIdentifier but also a kind of MessageBuilder which can be used to construct the message.

The inter-entity messenger service can therefore accept a message which can be sent to the TargetIdentifier in the dialect it wants to use. This means that concerns such as how to interpret the message and how it should be scheduled in a concurrency key can be controlled by the target.
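A minimal sketch of such a builder, assuming the target is the map entity from the earlier demonstration and the purpose is "drop all state". Everything except the name MessageBuilder is hypothetical here: the EntityMessage marker, the Operation dialect, the MapMessage class and the concurrencyKey field are illustrative only.

```java
final class MessageBuilderSketch {
    /** Marker for a message the platform can route; opaque to everyone but its target. */
    interface EntityMessage {}

    /** Callback supplied by the target; it must have no side effects. */
    interface MessageBuilder {
        EntityMessage build();
    }

    /** The target's private dialect (hypothetical). */
    enum Operation { PUT, GET, CLEAR }

    static final class MapMessage implements EntityMessage {
        final Operation operation;
        final int concurrencyKey;   // chosen by the target, not by the sender

        MapMessage(Operation operation, int concurrencyKey) {
            this.operation = operation;
            this.concurrencyKey = concurrencyKey;
        }
    }

    /** Assumed key meaning "run on the entity's single-threaded queue". */
    static final int SINGLE_THREADED_KEY = 0;

    /** Target side: a builder for one purpose that reads no global state. */
    static MessageBuilder clearStateBuilder() {
        return () -> new MapMessage(Operation.CLEAR, SINGLE_THREADED_KEY);
    }

    public static void main(String[] args) {
        // Sender side: it only sees an EntityMessage and never inspects the dialect.
        EntityMessage message = clearStateBuilder().build();
        System.out.println("built an opaque message: " + (message != null));
    }
}
```

The sender only ever holds an EntityMessage, so both the meaning of the message and its concurrency key stay under the target's control.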

Bringing it together

Now that there is a TargetIdentifier and a MessageBuilder, registered in some third-party service, an entity wishing to send a message can use the MessageBuilder to construct the message and then ask the inter-entity messenger service to send the message to the TargetIdentifier.

Internally, this means that it will be added to the normal message queues, honoring consistent global order and concurrency key, while additionally being replicated to any passives in a consistent order.

Illustrated example

We start out with 2 entities: EntityA needs to send a message to EntityB and it will look up this entity in the Admin Service (just an example service). EntityB starts this registration by asking the Messenger Service for its own TargetIdentifier:
flow 1

EntityB then stores this TargetIdentifier as well as a MessageBuilder (which can construct the message it expects to receive from EntityA) in the Admin Service:
flow 2

EntityA now looks up what EntityB has stored and uses its MessageBuilder to construct an opaque message:
flow 3

EntityA then looks up EntityB's TargetIdentifier (having also been stored in the Admin Service) and tells the Messenger Service to deliver the Message it previously created to the entity identified by this TargetIdentifier:
flow 4

Messenger Service now adds the message to the message queue for EntityB and it is invoked as any other message:
flow 5
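The five steps above can be sketched in a single-process model. This is an illustration under stated assumptions, not the platform's implementation: the queue here is a plain in-memory ArrayDeque with no replication, and every class and method name other than TargetIdentifier is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Supplier;

final class MessengerFlowDemo {
    interface Entity { void invoke(Object message); }

    /** Opaque handle; identity is all that matters in this sketch. */
    static final class TargetIdentifier {}

    static final class MessengerService {
        private final Map<Entity, TargetIdentifier> identifiers = new HashMap<>();
        private final Map<TargetIdentifier, Entity> targets = new HashMap<>();
        private final Queue<Runnable> messageQueue = new ArrayDeque<>();

        TargetIdentifier selfIdentifier(Entity self) {
            return identifiers.computeIfAbsent(self, entity -> {
                TargetIdentifier id = new TargetIdentifier();
                targets.put(id, entity);
                return id;
            });
        }

        /** Enqueues only; the sender never waits for delivery. */
        void send(TargetIdentifier target, Object message) {
            Entity entity = targets.get(target);
            messageQueue.add(() -> entity.invoke(message));
        }

        /** Stand-in for the platform running queued invocations in order. */
        void drain() {
            Runnable next;
            while ((next = messageQueue.poll()) != null) {
                next.run();
            }
        }
    }

    /** Example third-party registry holding what EntityB published. */
    static final class AdminService {
        TargetIdentifier target;
        Supplier<Object> messageBuilder;
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        MessengerService messenger = new MessengerService();
        AdminService admin = new AdminService();
        Entity entityB = message -> log.add("EntityB invoked with " + message);

        admin.target = messenger.selfIdentifier(entityB);   // flow 1 and flow 2
        admin.messageBuilder = () -> "B-dialect:hello";      // flow 2
        Object message = admin.messageBuilder.get();         // flow 3 (done by EntityA)
        messenger.send(admin.target, message);               // flow 4
        log.add("EntityA finished sending");
        messenger.drain();                                   // flow 5
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Note that EntityA finishes its own work before EntityB is invoked, since the message only enters the queue during the send.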

Additional caveats

The sender cannot receive a response from this message, nor can it block on the acknowledgement that it was received or completed. This is because we do not want to provide a mechanism for an entity to block execution within the invoke path.

Potential extensions to handle common patterns

If the need for a response is a common pattern within expected uses, it may be possible for the sender to also provide a response message which is scheduled on the sender once the initial message completes on all servers in the stripe.

The problem with doing this is that it is only safely useful as a completed acknowledgement, as opposed to an actual return path for data. While it would be possible (and easily implemented) for the EntityResponse from the message to be returned to the sender in this response message, it isn’t clear how this response would be interpreted as it is also defined within the dialect of the target which, in general, is opaque to the sender.

The only data which could reliably be conveyed this way may be when the message completed and, potentially, whether or not it threw an exception.
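A rough sketch of this completion-only acknowledgement, assuming a single server (so "completes on all servers in the stripe" is not modelled) and an in-memory queue. All names are hypothetical; in particular, the sender supplies a function that builds the acknowledgement in its own dialect from a single boolean.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Function;

final class CompletionAckDemo {
    interface Entity { void invoke(String message) throws Exception; }

    static final class Messenger {
        private final Queue<Runnable> messageQueue = new ArrayDeque<>();

        /** ackBuilder maps "did the target throw?" to a message in the sender's dialect. */
        void send(Entity target, String message, Entity sender, Function<Boolean, String> ackBuilder) {
            messageQueue.add(() -> {
                boolean threw = false;
                try {
                    target.invoke(message);
                } catch (Exception e) {
                    threw = true;   // only the fact of failure is passed back
                }
                String ack = ackBuilder.apply(threw);
                // The acknowledgement is queued like any other message on the sender.
                messageQueue.add(() -> {
                    try {
                        sender.invoke(ack);
                    } catch (Exception ignored) {
                        // An acknowledgement is not itself acknowledged in this sketch.
                    }
                });
            });
        }

        void drain() {
            Runnable next;
            while ((next = messageQueue.poll()) != null) {
                next.run();
            }
        }
    }

    static List<String> run(boolean targetFails) {
        List<String> log = new ArrayList<>();
        Entity target = message -> {
            if (targetFails) {
                throw new IllegalStateException("target failed");
            }
            log.add("target handled " + message);
        };
        Entity sender = message -> log.add("sender received " + message);

        Messenger messenger = new Messenger();
        messenger.send(target, "drop-state", sender, threw -> threw ? "ack:failed" : "ack:ok");
        messenger.drain();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

The target's EntityResponse is deliberately not forwarded, since the sender has no general way to interpret it.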
