RFC-0075: initial draft
avsej committed Jun 21, 2023
1 parent 064ad6a commit 6914a45
Showing 2 changed files with 316 additions and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -43,6 +43,7 @@ Coding happens all the time and is encouraged. We just recognize there is a poin
| 61 | [SDK3 Diagnostics](rfc/0061-sdk3-diagnostics.md) | Michael N. | ACCEPTED |
| 64 | [SDK3 Field-Level Encryption](rfc/0064-sdk3-field-level-encryption.md) | David N. | ACCEPTED |
| 69 | [KV Error Map V2](rfc/0069-kv-error-map-v2.md) | Brett L. | ACCEPTED |
| 75 | [Faster Failover and Configuration Push](rfc/0075-faster-failover-and-configuration-push.md) | Sergey | ACCEPTED |

### Draft & Review RFCs

@@ -65,7 +66,6 @@ Coding happens all the time and is encouraged. We just recognize there is a poin
| 72 | Queues And Topics [\[doc\]](https://docs.google.com/document/d/1x-wn--F1Qg6y342pBerLLfpWnAGud2HQRA0YpEVMkqU) | Michael N. | DRAFT
| 73 | KV Range Scan [\[doc\]](https://docs.google.com/document/d/1ir4E9XRvVOncReuR_QgohyompgoIvnZ0De1ik0WkrYs) | David N. & Michael N. | DRAFT
| 74 | Configuration Profiles [\[doc\]](https://docs.google.com/document/d/1LNCYgV2Eqymp3pGmA8WKPQOLSpcRyv0P7NpMYHVcUM0/) | Mike R. | DRAFT
| 75 | [Faster Failover and Configuration Push](https://github.com/couchbaselabs/sdk-rfcs/pull/123) | Sergey | DRAFT |

### Identified RFCs

315 changes: 315 additions & 0 deletions rfc/0075-faster-failover-and-configuration-push.md
@@ -0,0 +1,315 @@
# Meta

| Field | Value |
|----------------|----------------------------------------|
| RFC Name | Faster Failover and Configuration Push |
| RFC ID | 75 |
| Start Date | 2023-06-14 |
| Owner | Sergey Avseyev |
| Current Status | DRAFT |
| Revision | #1 |

# Summary

TBD

# Motivation

TBD

# Relation to Other RFCs

This RFC relates to the following documents:

* [RFC-0005][rfc-0005]: VBucket Retry Logic.

* [RFC-0024][rfc-0024]: Fast-Failover SDK.


# High-Level Design

TBD

# User-Facing API

TBD

# Implementation Details

## Protocol Changes

[https://issues.couchbase.com/browse/MB-57311]: #

### Get Cluster Config with Known Version

[https://review.couchbase.org/c/kv_engine/+/192301]: #

The KV engine introduces a new HELLO flag called `GetClusterConfigWithKnownVersion` with a value of `0x1d`. This flag
does not change the behavior of the server but allows determining if the node supports epoch-revision fields for the
`GetClusterConfig` (`0xb5`) operation. If the node acknowledges `GetClusterConfigWithKnownVersion`, then the SDK can use
the new version of the command.
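
As a rough illustration of how an SDK might track negotiated features, the sketch below encodes and decodes a HELLO body. It assumes the HELLO body is a sequence of 16-bit feature codes in network byte order, as described in the kv_engine binary protocol documentation; the helper names and the `FEATURES` table are hypothetical, and the codes are the ones quoted in this RFC.

```python
import struct
from typing import Iterable, Set

# Feature codes quoted in this RFC (the dictionary itself is illustrative).
FEATURES = {
    "Duplex": 0x0C,
    "ClustermapChangeNotification": 0x0D,
    "UnorderedExecution": 0x0E,
    "SnappyEverywhere": 0x13,
    "GetClusterConfigWithKnownVersion": 0x1D,
    "DedupeNotMyVbucketClustermap": 0x1E,
    "ClustermapChangeNotificationBrief": 0x1F,
}


def encode_hello_body(features: Iterable[int]) -> bytes:
    """Pack the requested feature codes as big-endian 16-bit integers."""
    codes = list(features)
    return struct.pack(f">{len(codes)}H", *codes)


def decode_hello_body(body: bytes) -> Set[int]:
    """Return the set of feature codes acknowledged by the node."""
    return {code for (code,) in struct.iter_unpack(">H", body)}


def supports_known_version(acknowledged: Set[int]) -> bool:
    """True if GetClusterConfig may carry epoch and revision on this socket."""
    return FEATURES["GetClusterConfigWithKnownVersion"] in acknowledged
```

The SDK would keep the decoded set per connection, because each socket negotiates its features independently.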

Epoch and revision are signed 64-bit integers encoded in network (big-endian) order.


      Byte/     0       |       1       |       2       |       3       |
         /              |               |               |               |
        |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
        +---------------+---------------+---------------+---------------+
       0| 0x80          | 0xb5          | 0x00          | 0x00          |
        +---------------+---------------+---------------+---------------+
       4| 0x00          | 0x00          | 0x00          | 0x00          |
        +---------------+---------------+---------------+---------------+
       8| 0x00          | 0x00          | 0x00          | 0x10          |
        +---------------+---------------+---------------+---------------+
      12| 0xde          | 0xad          | 0xbe          | 0xef          |
        +---------------+---------------+---------------+---------------+
      16| 0x00          | 0x00          | 0x00          | 0x00          |
        +---------------+---------------+---------------+---------------+
      20| 0x00          | 0x00          | 0x00          | 0x00          |
        +---------------+---------------+---------------+---------------+
      24| 0x00          | 0x00          | 0x00          | 0x00          |
        +---------------+---------------+---------------+---------------+
      28| 0x00          | 0x00          | 0x00          | 0x42          |
        +---------------+---------------+---------------+---------------+
      32| 0x01          | 0x02          | 0x03          | 0x04          |
        +---------------+---------------+---------------+---------------+
      36| 0x05          | 0x06          | 0x07          | 0x08          |
        +---------------+---------------+---------------+---------------+

    GET_CLUSTER_CONFIG command
    Field        (offset) (value)
    Magic        (0)    : 0x80 (client request, SDK -> kv_engine)
    Opcode       (1)    : 0xb5
    Key length   (2,3)  : 0x0000
    Extra length (4)    : 0x00
    Data type    (5)    : 0x00 (RAW)
    Vbucket      (6,7)  : 0x0000
    Total body   (8-11) : 0x00000010 (16 bytes)
    Opaque       (12-15): 0xdeadbeef
    CAS          (16-23): 0x0000000000000000
    Epoch        (24-31): 0x0000000000000042 (66 in base-10)
    Revision     (32-39): 0x0102030405060708 (72623859790382856 in base-10)

If the node has a cluster configuration newer than the version specified in the request, the response will include the
new configuration in the body with the data type set to `JSON` (`0x01`). Otherwise, the response will have an empty body
with the data type `RAW` (`0x00`).
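
The request layout from the field table can be reproduced with Python's `struct` module. This is a minimal sketch under the assumptions stated in this section (zero-length extras, epoch and revision carried in a 16-byte body); the function names are hypothetical and do not belong to any SDK.

```python
import struct

MAGIC_CLIENT_REQUEST = 0x80
OPCODE_GET_CLUSTER_CONFIG = 0xB5
DATATYPE_RAW = 0x00
DATATYPE_JSON = 0x01

# Header: magic, opcode, key length, extras length, data type, vbucket,
# total body length, opaque, CAS (24 bytes, network byte order).
HEADER = struct.Struct(">BBHBBHIIQ")
# Epoch and revision: two signed 64-bit integers in network byte order.
VERSION = struct.Struct(">qq")


def build_get_cluster_config(opaque: int, epoch: int, revision: int) -> bytes:
    """Encode GetClusterConfig carrying the version the SDK already knows."""
    body = VERSION.pack(epoch, revision)
    header = HEADER.pack(
        MAGIC_CLIENT_REQUEST,
        OPCODE_GET_CLUSTER_CONFIG,
        0,  # key length
        0,  # extras length, as listed in the field table
        DATATYPE_RAW,
        0,  # vbucket
        len(body),
        opaque,
        0,  # CAS
    )
    return header + body


def config_from_response(datatype: int, body: bytes):
    """Return the JSON payload, or None if the known version is still current."""
    if datatype & DATATYPE_JSON and body:
        return body
    return None
```

Calling `build_get_cluster_config(0xdeadbeef, 0x42, 0x0102030405060708)` yields the 40-byte packet described by the field table.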

### Deduplicate Cluster Configuration for `NotMyVbucket` Responses

[https://review.couchbase.org/c/kv_engine/+/190899]: #

The KV engine introduces a new HELLO flag called `DedupeNotMyVbucketClustermap` with a value of `0x1e`. Once this flag
is negotiated, the node might send an empty body with `NotMyVbucket` (`0x07`) status codes. The KV engine tracks the
revision that has been sent to the SDK over the socket connection, so a response with a `NotMyVbucket` status will only
have a body if the pushed version is older than the active configuration.

The KV engine updates the pushed configuration version in the following cases:

* Configuration sent to the SDK in response to a `GetClusterConfig` (`0xb5`) request.
* Configuration pushed to the SDK that enabled the HELLO flag `ClustermapChangeNotification` (`0x0d`).

Note that `DedupeNotMyVbucketClustermap` affects the `ClustermapChangeNotification` and
`ClustermapChangeNotificationBrief` features, which are described below. In other words, if deduplication is enabled,
each cluster configuration will be announced on the socket connection only once.
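
The deduplication rule can be modelled in a few lines. The sketch below only illustrates the behaviour described above and is not kv_engine code: the `Connection` class and its methods are hypothetical, and ordering versions by epoch first and revision second is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Version = Tuple[int, int]  # (epoch, revision); tuple ordering is an assumption


@dataclass
class Connection:
    """Hypothetical model of the per-socket state kept by the server."""

    pushed: Optional[Version] = None  # last version sent over this socket

    def record_sent(self, version: Version) -> None:
        # Called after a GetClusterConfig response or a pushed notification.
        self.pushed = version

    def not_my_vbucket_body(self, active: Version, config: bytes) -> bytes:
        """Body to attach to a NotMyVbucket response on this connection."""
        if self.pushed is not None and self.pushed >= active:
            return b""  # the SDK has already seen this configuration
        # Sending the body also counts as announcing this version.
        self.pushed = active
        return config
```

With this model, the first `NotMyVbucket` response for a given version carries the configuration, and later responses on the same socket stay empty until the active version advances.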

### Enforcing Snappy Compression for Cluster Configuration Payloads

[https://review.couchbase.org/c/kv_engine/+/192152]: #
[https://review.couchbase.org/c/kv_engine/+/192316]: #

The KV engine introduces a new HELLO flag called `SnappyEverywhere` with a value of `0x13`. Once this flag is
negotiated, the node will always use the compressed version of the cluster configuration and data type flags will be set
to `JSON | SNAPPY` (`0x03`).
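
A possible way for an SDK to decode such payloads is to inspect the data type bits before parsing. The sketch below is an assumption-laden illustration: `decode_config` is a hypothetical helper, and the Snappy decompressor is passed in as a callable because the Python standard library does not ship one.

```python
import json
from typing import Callable

DATATYPE_JSON = 0x01
DATATYPE_SNAPPY = 0x02


def decode_config(datatype: int, body: bytes,
                  decompress: Callable[[bytes], bytes]) -> dict:
    """Decode a cluster configuration payload according to its data type."""
    if datatype & DATATYPE_SNAPPY:
        # `decompress` stands in for a Snappy implementation supplied by the SDK.
        body = decompress(body)
    if not datatype & DATATYPE_JSON:
        raise ValueError(f"unexpected data type 0x{datatype:02x}")
    return json.loads(body)
```

Checking the `SNAPPY` bit rather than comparing against `0x03` keeps the helper usable for both compressed and uncompressed responses.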

### `GetClusterConfig` and Out-of-Order Execution

[https://issues.couchbase.com/browse/MB-56885]: #

The HELLO flag `UnorderedExecution` (`0x0e`) enables Out-of-Order (OoO) execution, which allows the KV engine to
reorder operations. [kv\_engine/docs/UnorderedExecution.md][kv-unordered-execution] provides more details on this
feature.

The `GetClusterConfig` (`0xb5`) command is explicitly marked as compatible with OoO execution, allowing it to be served
without waiting for the completion of in-flight operations. Specifically, `GetClusterConfig` will not wait for long
operations such as mutations with SyncDurability requirements. All current SDKs are expected to be compatible with the
OoO execution mode, so no changes are expected.

### Cluster Configuration Notification Changes

Prior to server version 7.6, the KV engine had an opt-in feature to push configuration updates to SDKs. This feature
could be enabled using the HELLO flag `ClustermapChangeNotification` (`0x0d`), which depends on `Duplex` (`0x0c`). More
details about `Duplex` can be found in [kv\_engine/docs/Duplex.md][kv-duplex]. When both flags are negotiated, the
server sends unsolicited configuration updates to the SDK and does not expect any acknowledgement. While this approach
is more responsive than [RFC-0024: Fast Failover][rfc-0024], it has its own drawbacks:

1. The SDK subscribes all connections using HELLO, and during rebalance, all connections will receive all notifications.
2. In a Lambda scenario, if failover occurs while the SDK process is paused, upon resuming, the SDK must process all
updates on all sockets. This process takes unnecessary time, unlike when the SDK polls every 2.5 seconds.

Starting with version 7.6, the KV engine supports the HELLO flag `ClustermapChangeNotificationBrief` (`0x1f`). This flag
instructs the KV engine to exclude the cluster configuration content from the notification. In this case, the data type
will be `RAW` (`0x00`). Below is the typical structure of the notification when the brief mode is enabled:


      Byte/     0       |       1       |       2       |       3       |
         /              |               |               |               |
        |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
        +---------------+---------------+---------------+---------------+
       0| 0x82          | 0x01          | 0x00          | 0x00          |
        +---------------+---------------+---------------+---------------+
       4| 0x10          | 0x00          | 0x00          | 0x00          |
        +---------------+---------------+---------------+---------------+
       8| 0x00          | 0x00          | 0x00          | 0x10          |
        +---------------+---------------+---------------+---------------+
      12| 0x00          | 0x00          | 0x00          | 0x00          |
        +---------------+---------------+---------------+---------------+
      16| 0x00          | 0x00          | 0x00          | 0x00          |
        +---------------+---------------+---------------+---------------+
      20| 0x00          | 0x00          | 0x00          | 0x00          |
        +---------------+---------------+---------------+---------------+
      24| 0x00          | 0x00          | 0x00          | 0x00          |
        +---------------+---------------+---------------+---------------+
      28| 0x00          | 0x00          | 0x00          | 0x42          |
        +---------------+---------------+---------------+---------------+
      32| 0x01          | 0x02          | 0x03          | 0x04          |
        +---------------+---------------+---------------+---------------+
      36| 0x05          | 0x06          | 0x07          | 0x08          |
        +---------------+---------------+---------------+---------------+

    CLUSTERMAP_CHANGE_NOTIFICATION command
    Field        (offset) (value)
    Magic        (0)    : 0x82 (server request, kv_engine -> SDK)
    Opcode       (1)    : 0x01
    Key length   (2,3)  : 0x0000
    Extra length (4)    : 0x10 (two int64_t fields in extras)
    Data type    (5)    : 0x00 (RAW)
    Vbucket      (6,7)  : 0x0000
    Total body   (8-11) : 0x00000010 (16 bytes)
    Opaque       (12-15): 0x00000000
    CAS          (16-23): 0x0000000000000000
    Epoch        (24-31): 0x0000000000000042 (66 in base-10)
    Revision     (32-39): 0x0102030405060708 (72623859790382856 in base-10)

Note that the magic value for this notification is `ServerRequest` (`0x82`), which is enabled by the `Duplex` (`0x0c`)
HELLO flag. Additionally, similar to the regular cluster configuration notification, the epoch and revision fields are
sent as extras.

Once the brief cluster configuration notification is received, it is up to the SDK to decide whether to send a
`GetClusterConfig` (`0xb5`) request to retrieve the actual configuration body.

In essence, the `ClustermapChangeNotificationBrief` feature only saves network traffic. If
`DedupeNotMyVbucketClustermap` is not enabled, the number of notifications will be the same as before. However, this
feature can still be used as a building block to implement a debouncing mechanism. When properly configured, it can help
reduce the number of requests. Further details on this topic will be covered in the "Library Changes" section.
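
To make the brief notification concrete, the following sketch parses the packet described by the field table above and decides whether a `GetClusterConfig` request is worthwhile. The names `ConfigVersion`, `parse_brief_notification`, and `should_fetch` are hypothetical, and comparing versions by epoch first and revision second is an assumption.

```python
import struct
from typing import NamedTuple

MAGIC_SERVER_REQUEST = 0x82
OPCODE_CLUSTERMAP_CHANGE_NOTIFICATION = 0x01

# Header: magic, opcode, key length, extras length, data type, vbucket,
# total body length, opaque, CAS (24 bytes, network byte order).
HEADER = struct.Struct(">BBHBBHIIQ")
# Extras: epoch and revision as signed 64-bit integers.
VERSION = struct.Struct(">qq")


class ConfigVersion(NamedTuple):
    epoch: int
    revision: int


def parse_brief_notification(packet: bytes) -> ConfigVersion:
    """Extract epoch and revision from a brief clustermap notification."""
    (magic, opcode, _key_len, extras_len, _datatype, _vbucket,
     body_len, _opaque, _cas) = HEADER.unpack_from(packet)
    if magic != MAGIC_SERVER_REQUEST or opcode != OPCODE_CLUSTERMAP_CHANGE_NOTIFICATION:
        raise ValueError("not a clustermap change notification")
    if extras_len != VERSION.size or body_len < extras_len:
        raise ValueError("missing epoch/revision extras")
    return ConfigVersion(*VERSION.unpack_from(packet, HEADER.size))


def should_fetch(known: ConfigVersion, announced: ConfigVersion) -> bool:
    """Fetch only when the announced version is newer than the known one."""
    return announced > known
```

Because the notification carries no configuration body, the SDK can discard stale announcements without issuing any request.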

## Library Changes

### Configuration Push

The previously mentioned `ClustermapChangeNotificationBrief` feature enables the SDK to subscribe all connections for
configuration updates. These notifications are lightweight and can be deduplicated by the server when the
`DedupeNotMyVbucketClustermap` option is negotiated.

#### Mixed Clusters

In clusters where there is a mix of nodes with older server versions, meaning that some nodes do not acknowledge
`ClustermapChangeNotificationBrief`, the respective connection should notify the configuration monitor about its lack of
support for configuration pushes from the server. As a result, the monitor should utilize the old polling mechanism for
this particular node instead.
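
One way to express this per-node decision is sketched below. It is an illustration only: `UpdateSource` and `plan_update_sources` are hypothetical names, and the input is assumed to be the set of feature codes each node acknowledged in HELLO.

```python
from enum import Enum
from typing import Dict, Set

FEATURE_CLUSTERMAP_CHANGE_NOTIFICATION_BRIEF = 0x1F


class UpdateSource(Enum):
    PUSH = "push"  # rely on brief notifications from this node
    POLL = "poll"  # fall back to periodic GetClusterConfig requests


def plan_update_sources(acknowledged: Dict[str, Set[int]]) -> Dict[str, UpdateSource]:
    """Map each node to the mechanism the configuration monitor should use."""
    return {
        node: (UpdateSource.PUSH
               if FEATURE_CLUSTERMAP_CHANGE_NOTIFICATION_BRIEF in features
               else UpdateSource.POLL)
        for node, features in acknowledged.items()
    }
```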

### Enhancements in Handling the `NotMyVbucket` Status

The combination of `DedupeNotMyVbucketClustermap` and `ClustermapChangeNotificationBrief` saves traffic in two ways: the
server does not resend a configuration if the SDK has already seen the same revision, and notifications carry only the
`Epoch`/`Revision` pair. It is therefore up to the SDK to initiate a configuration update once a non-empty payload is
returned along with the `NotMyVbucket` status code.

Several modifications are required in the SDK:

1. The retry orchestrator should be able to retry an operation based on configuration updates rather than the timer signal.
2. The configuration monitor should have the ability to throttle configuration requests for the following reasons:
    1. During rebalance, multiple operations may return a `NotMyVbucket` status, triggering a configuration refresh.
    2. Since `ClustermapChangeNotificationBrief` will cause all connections to subscribe to updates and receive them, it is
       necessary to account for potential high volumes of updates.

Below is a diagram that illustrates an example of the SDK workflow, where the GET request is waiting for the arrival of
a new configuration.

```mermaid
sequenceDiagram
autonumber
conn_1->>+kv_node_1: get("foo", vb=115)
kv_node_1->>-conn_1: NotMyVbucket(epoch=1, rev=11)
conn_1-->>+retry_orchestrator: pending(get, "foo", epoch=1, rev=11)
retry_orchestrator-->retry_orchestrator: put operation into the waiting queue
conn_1-->>+config_monitor: refresh configuration
config_monitor-->config_monitor: wait to throttle config requests
config_monitor->>+conn_2: get_config()
conn_2->>+kv_node_2: get_config()
kv_node_2->>-conn_2: configuration(epoch=1, rev=11)
conn_2->>-config_monitor: apply new configuration
config_monitor->>retry_orchestrator: purge waiting queue(epoch=1, rev=11)
retry_orchestrator->>conn_1: retry get("foo")
conn_1->>+kv_node_2: get("foo", vb=115)
kv_node_2->>-conn_1: Success()
```
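
The waiting queue from the diagram can be sketched as follows. This is a simplified, single-threaded illustration, not the design of any particular SDK: the `RetryOrchestrator` class and its methods are hypothetical, and versions are assumed to be ordered by epoch first and revision second.

```python
from typing import Callable, List, Tuple

Version = Tuple[int, int]  # (epoch, revision); tuple ordering is an assumption


class RetryOrchestrator:
    """Parks operations until a sufficiently new configuration is applied."""

    def __init__(self) -> None:
        self._waiting: List[Tuple[Version, Callable[[], None]]] = []

    def pending(self, seen: Version, retry: Callable[[], None]) -> None:
        """Park an operation that failed with NotMyVbucket at version `seen`."""
        self._waiting.append((seen, retry))

    def on_config_applied(self, applied: Version) -> int:
        """Retry operations whose reported version is covered by `applied`."""
        ready = [retry for seen, retry in self._waiting if seen <= applied]
        self._waiting = [(seen, retry) for seen, retry in self._waiting
                         if seen > applied]
        for retry in ready:
            retry()
        return len(ready)
```

Operations that reported a version newer than the one just applied remain parked until a later configuration arrives, or until their own timeout fires.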

# Language Specifics

## Feature Checklist

1. `GetClusterConfigWithKnownVersion` (`0x1d`). The SDK should always supply the current configuration version if the
   connection has acknowledged the feature flag.

2. `DedupeNotMyVbucketClustermap` (`0x1e`). The SDK should expect that the KV engine will not repeat the configuration
   payload if it has already been sent to the socket by any means (`NotMyVbucket` status, `ClustermapChangeNotification`,
   `GetClusterConfig`).

3. Out-of-Order Execution. The `Duplex` (`0x0c`) feature should always be negotiated in HELLO.

4. `ClustermapChangeNotificationBrief` (`0x1f`). The SDK should always subscribe to configuration notifications if the
   server supports them, and fall back to polling if it does not.

5. The SDK should not emit a configuration refresh request if one is already in flight. This should hold regardless of
   the source of the signal, which might come from all the nodes during rebalance when configuration push is
   enabled, or from `NotMyVbucket` responses.

6. [OPTIONAL] `SnappyEverywhere` (`0x13`). The SDK should expect that the KV engine might send a Snappy-compressed
   payload with any of the response types (including push notifications), and should check for the `SNAPPY` (`0x02`)
   data type.
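
Item 5 can be illustrated with a small guard around the refresh request. The sketch below is hypothetical: the `ConfigMonitor` class, its `min_interval` parameter, and the injected `clock` are assumptions made for the example. It drops signals that arrive while a request is in flight or inside the throttle window, whereas a real monitor would more likely defer them.

```python
import time
from typing import Callable


class ConfigMonitor:
    """Coalesces refresh signals so that at most one request is in flight."""

    def __init__(self, send_request: Callable[[], None],
                 min_interval: float = 0.1,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send_request = send_request
        self._min_interval = min_interval  # hypothetical throttle window, seconds
        self._clock = clock
        self._in_flight = False
        self._last_sent = float("-inf")

    def request_refresh(self) -> bool:
        """Handle a refresh signal; return True only if a request was sent."""
        now = self._clock()
        if self._in_flight or now - self._last_sent < self._min_interval:
            return False
        self._in_flight = True
        self._last_sent = now
        self._send_request()
        return True

    def on_response(self) -> None:
        """Mark the outstanding request as completed."""
        self._in_flight = False
```

The same `request_refresh` entry point can serve both push notifications and `NotMyVbucket` responses, which keeps the guard independent of the signal source.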

# Open Questions

1. Behaviour in mixed clusters. Upgrade, when new nodes can push config, while old nodes cannot. Downgrade, when new
nodes cannot push configuration (should we even consider downgrade?).

2. TBD

3. TBD

# Revisions

* Revision #1 (2023-XX-YY; Sergey Avseyev)
  * Completed initial draft.

# Signoff

| Language | Team Member | Signoff Date | Revision |
|-------------|----------------|--------------|----------|
| .NET | Jeffry Morris | | |
| C/C++ | Sergey Avseyev | | |
| Go | Charles Dixon | | |
| Java/Kotlin | David Nault | | |
| Node.js | Jared Casey | | |
| PHP | Sergey Avseyev | | |
| Python | Jared Casey | | |
| Ruby | Sergey Avseyev | | |
| Scala | Graham Pople | | |

[kv-unordered-execution]: https://github.com/couchbase/kv_engine/blob/master/docs/UnorderedExecution.md
[kv-duplex]: https://github.com/couchbase/kv_engine/blob/master/docs/Duplex.md
[rfc-0005]: rfc/0005-vbucket-retries.md
[rfc-0024]: rfc/0024-fast-failover.md
