From 6914a45670b801cfa9899dd4e887678db15257ab Mon Sep 17 00:00:00 2001 From: Sergey Avseyev Date: Wed, 14 Jun 2023 14:58:17 +0300 Subject: [PATCH 01/15] RFC-0075: initial draft --- README.md | 2 +- ...-faster-failover-and-configuration-push.md | 315 ++++++++++++++++++ 2 files changed, 316 insertions(+), 1 deletion(-) create mode 100644 rfc/0075-faster-failover-and-configuration-push.md diff --git a/README.md b/README.md index 35bfce0..6bab41f 100644 --- a/README.md +++ b/README.md @@ -43,6 +43,7 @@ Coding happens all the time and is encouraged. We just recognize there is a poin | 61 | [SDK3 Diagnostics](rfc/0061-sdk3-diagnostics.md) | Michael N. | ACCEPTED | | 64 | [SDK3 Field-Level Encryption](rfc/0064-sdk3-field-level-encryption.md) | David N. | ACCEPTED | | 69 | [KV Error Map V2](rfc/0069-kv-error-map-v2.md) | Brett L. | ACCEPTED | +| 75 | [Faster Failover and Configuration Push](rfc/0075-faster-failover-and-configuration-push.md) | Sergey | ACCEPTED | ### Draft & Review RFCs @@ -65,7 +66,6 @@ Coding happens all the time and is encouraged. We just recognize there is a poin | 72 | Queues And Topics [\[doc\]](https://docs.google.com/document/d/1x-wn--F1Qg6y342pBerLLfpWnAGud2HQRA0YpEVMkqU) | Michael N. | DRAFT | 73 | KV Range Scan [\[doc\]](https://docs.google.com/document/d/1ir4E9XRvVOncReuR_QgohyompgoIvnZ0De1ik0WkrYs) | David N. & Michael N. | DRAFT | 74 | Configuration Profiles [\[doc\]](https://docs.google.com/document/d/1LNCYgV2Eqymp3pGmA8WKPQOLSpcRyv0P7NpMYHVcUM0/) | Mike R. 
| DRAFT -| 75 | [Faster Failover and Configuration Push](https://github.com/couchbaselabs/sdk-rfcs/pull/123) | Sergey | DRAFT | ### Identified RFCs diff --git a/rfc/0075-faster-failover-and-configuration-push.md b/rfc/0075-faster-failover-and-configuration-push.md new file mode 100644 index 0000000..9b78893 --- /dev/null +++ b/rfc/0075-faster-failover-and-configuration-push.md @@ -0,0 +1,315 @@ +# Meta + +| Field | Value | +|----------------|----------------------------------------| +| RFC Name | Faster Failover and Configuration Push | +| RFC ID | 75 | +| Start Date | 2023-06-14 | +| Owner | Sergey Avseyev | +| Current Status | DRAFT | +| Revision | #1 | + +# Summary + +TBD + +# Motivation + +TBD + +# Relation to Other RFCs + +This RFC relates to the following documents: + +* [RFC-0005][rfc-0005]: VBucket Retry Logic. + +* [RFC-0024][rfc-0024]: Fast-Failover SDK. + + +# High-Level Design + +TBD + +# User-Facing API + +TBD + +# Implementation Details + +## Protocol Changes + +[https://issues.couchbase.com/browse/MB-57311]: # + +### Get Cluster Config with Known Version + +[https://review.couchbase.org/c/kv_engine/+/192301]: # + +The KV engine introduces a new HELLO flag called `GetClusterConfigWithKnownVersion` with a value of `0x1d`. This flag +does not change the behavior of the server but allows determining if the node supports epoch-revision fields for the +`GetClusterConfig` (`0xb5`) operation. If the node acknowledges `GetClusterConfigWithKnownVersion`, then the SDK can use +the new version of the command. + +Epoch and revision are signed 64-bit integers encoded in network (big-endian) order. 
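
The encoding described above can be sketched in Python. This is a minimal illustration, not SDK code: the helper names `encode_known_version` and `decode_known_version` are hypothetical, and the sketch only produces the 16-byte epoch/revision pair without deciding where in the packet it is carried.

```python
import struct

# ">qq" means: big-endian (network order), two signed 64-bit integers.
_VERSION_FORMAT = ">qq"


def encode_known_version(epoch: int, revision: int) -> bytes:
    """Pack epoch and revision into 16 bytes (hypothetical helper)."""
    return struct.pack(_VERSION_FORMAT, epoch, revision)


def decode_known_version(payload: bytes) -> tuple[int, int]:
    """Unpack a 16-byte epoch/revision pair (hypothetical helper)."""
    epoch, revision = struct.unpack(_VERSION_FORMAT, payload)
    return epoch, revision
```

Because both fields are signed, negative values round-trip through the same format string.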
+ + + Byte/ 0 | 1 | 2 | 3 | + / | | | | + |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| + +---------------+---------------+---------------+---------------+ + 0| 0x80 | 0xb5 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 4| 0x00 | 0x00 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 8| 0x00 | 0x00 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 12| 0xde | 0xad | 0xbe | 0xef | + +---------------+---------------+---------------+---------------+ + 16| 0x00 | 0x00 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 20| 0x00 | 0x00 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 24| 0x42 | 0x00 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 28| 0x00 | 0x00 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 32| 0x00 | 0x00 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 36| 0x08 | 0x07 | 0x06 | 0x05 | + +---------------+---------------+---------------+---------------+ + 40| 0x04 | 0x03 | 0x02 | 0x01 | + +---------------+---------------+---------------+---------------+ + GET_CLUSTER_CONFIG command + Field (offset) (value) + Magic (0) : 0x80 (client request, SDK -> kv_engine) + Opcode (1) : 0xb5 + Key length (2,3) : 0x0000 + Extra length (4) : 0x00 + Data type (5) : 0x00 (RAW) + Vbucket (6,7) : 0x0000 + Total body (8-11) : 0x00000010 (16 bytes) + Opaque (12-15): 0xdeadbeef + CAS (16-23): 0x0000000000000000 + Epoch (24-31): 0x0000000000000042 (66 in base-10) + Revision (32-39): 0x0102030405060708 (72623859790382856 in base-10) + +If the node has a cluster configuration newer than what is specified in the example, the response will include the new +configuration in the body with the data type set to `JSON` (`0x01`). 
Otherwise, the response will have an empty body +with the data type `RAW` (`0x00`). + +### Deduplicate Cluster Configuration for `NotMyVbucket` Responses + +[https://review.couchbase.org/c/kv_engine/+/190899]: # + +The KV engine introduces a new HELLO flag called `DedupeNotMyVbucketClustermap` with a value of `0x1e`. Once this flag +is negotiated, the node might send an empty body with `NotMyVbucket` (`0x07`) status codes. The KV engine tracks the +revision that has been sent to the SDK over the socket connection, so a response with a `NotMyVbucket` status will only +have a body if the pushed version is older than the active configuration. + +The KV engine updates the pushed configuration version in the following cases: +* Configuration sent to the SDK in response to a `GetClusterConfig` (`0xb5`) request. +* Configuration pushed to the SDK that enabled the HELLO flag `ClustermapChangeNotification` (`0x0d`). + +Note, that `DedupeNotMyVbucketClustermap` affects `ClustermapChangeNotification` and `ClustermapChangeNotificationBrief` +features, that described below. In other words, if deduplication enabled, the cluster configuration will be announce for +the socket connection only once. + +### Enforcing Snappy Compression for Cluster Configuration Payloads + +[https://review.couchbase.org/c/kv_engine/+/192152]: # +[https://review.couchbase.org/c/kv_engine/+/192316]: # + +The KV engine introduces a new HELLO flag called `SnappyEverywhere` with a value of `0x13`. Once this flag is +negotiated, the node will always use the compressed version of the cluster configuration and data type flags will be set +to `JSON | SNAPPY` (`0x03`). + +### `GetClusterConfig` and Out-of-Order Execution + +[https://issues.couchbase.com/browse/MB-56885]: # + +HELLO flag `UnorderedExecution` (`0x0e`) enables Out-of-Order (OoO) execution, so that the KV engine is being allowed to +reorder operations. 
[kv\_engine/docs/UnorderedExecution.md][kv-unordered-execution] provides more details on this +feature. + +The `GetClusterConfig` (`0xb5`) command is explicitly marked as compatible with OoO execution, allowing it to be served +without waiting for the completion of in-flight operations. Specifically, `GetClusterConfig` will not wait for long +operations such as mutations with SyncDurability requirements. All current SDKs are expected to be compatible with the +OoO execution mode, so no changes are expected. + +### Cluster Configuration Notification Changes + +Prior to server version 7.6, the KV engine had an opt-in feature to push configuration updates to SDKs. This feature +could be enabled using the HELLO flag `ClustermapChangeNotification` (`0x0d`), which depends on `Duplex` (`0x0c`). More +details about `Duplex` can be found in [kv\_engine/docs/Duplex.md][kv-duplex]. When both flags are negotiated, the +server will send unsolicited configuration updates to the SDK without expecting any acknowledgement mechanism. While +this approach proves to have better responsiveness compared to [RFC-0024: Fast Failover][rfc-0024], it also has its own +drawbacks, such as: + +1. The SDK subscribes all connections using HELLO, and during rebalance, all connections will receive all notifications. +2. In a Lambda scenario, if failover occurs while the SDK process is paused, upon resuming, the SDK must process all + updates on all sockets. This process takes unnecessary time, unlike when the SDK polls every 2.5 seconds. + +Since version 7.6, the KV engine introduces the HELLO flag `ClustermapChangeNotificationBrief` (`0x1f`). This flag +instructs the KV engine to exclude the cluster configuration content from the notification. In this case, the data type +will be `RAW` (`0x00`). 
Below is the typical structure of the notification when the brief mode is enabled: + + + Byte/ 0 | 1 | 2 | 3 | + / | | | | + |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| + +---------------+---------------+---------------+---------------+ + 0| 0x82 | 0x01 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 4| 0x00 | 0x00 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 8| 0x00 | 0x00 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 12| 0x00 | 0x00 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 16| 0x00 | 0x00 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 20| 0x00 | 0x00 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 24| 0x42 | 0x00 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 28| 0x00 | 0x00 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 32| 0x00 | 0x00 | 0x00 | 0x00 | + +---------------+---------------+---------------+---------------+ + 36| 0x08 | 0x07 | 0x06 | 0x05 | + +---------------+---------------+---------------+---------------+ + 40| 0x04 | 0x03 | 0x02 | 0x01 | + +---------------+---------------+---------------+---------------+ + CLUSTERMAP_CHANGE_NOTIFICATION command + Field (offset) (value) + Magic (0) : 0x82 (server request, kv_engine -> SDK) + Opcode (1) : 0x01 + Key length (2,3) : 0x0000 + Extra length (4) : 0x10 (two int64_t fields in extras) + Data type (5) : 0x00 (RAW) + Vbucket (6,7) : 0x0000 + Total body (8-11) : 0x00000010 (16 bytes) + Opaque (12-15): 0x00000000 + CAS (16-23): 0x0000000000000000 + Epoch (24-31): 0x0000000000000042 (66 in base-10) + Revision (32-39): 0x0102030405060708 (72623859790382856 in base-10) + +So note that magic is `ServerRequest` (`0x82`), that is enabled by `Duplex` (`0x0c`) HELLO flag. 
Also note that just +like in regular cluster configuration notification, epoch and revision fields are sent as extras. + +Note that the magic value for this notification is `ServerRequest` (`0x82`), which is enabled by the `Duplex` (`0x0c`) +HELLO flag. Additionally, similar to the regular cluster configuration notification, the epoch and revision fields are +sent as extras. + +Once the brief cluster configuration notification is received, it is up to the SDK to decide whether to send a +`GetClusterConfig` (`0xb5`) request to retrieve the actual configuration body. + +In essence, the `ClustermapChangeNotificationBrief` feature only saves network traffic. If +`DedupeNotMyVbucketClustermap` is not enabled, the number of notifications will be the same as before. However, this +feature can still be used as a building block to implement a debouncing mechanism. When properly configured, it can help +reduce the number of requests. Further details on this topic will be covered in the "Library Changes" section. + +## Library Changes + +### Configuration Push + +The previously mentioned `ClustermapChangeNotificationBrief` feature enables the SDK to subscribe all connections for +configuration updates. These notifications are lightweight and can be deduplicated by the server when the +`DedupeNotMyVbucketClustermap` option is negotiated. + +#### Mixed Clusters + +In clusters where there is a mix of nodes with older server versions, meaning that some nodes do not acknowledge +`ClustermapChangeNotificationBrief`, the respective connection should notify the configuration monitor about its lack of +support for configuration pushes from the server. As a result, the monitor should utilize the old polling mechanism for +this particular node instead. 
+ +### Enhancements in Handling the `NotMyVbucket` Status + +Combination of `DedupeNotMyVbucketClustermap` and `ClustermapChangeNotificationBrief` allows to save traffic by not +sending configuration, if SDK already seen the same revision, and also sends only pair of `Epoch`/`Revision`. So it is +up to SDK to initiate configuration update once the non-empty payload returned along with `NotMyVbucket` status code. + +Several modifications are required in the SDK: +1. The retry orchestrator should be able to retry an operation based on configuration updates rather than the timer signal. +2. The configuration monitor should have the ability to throttle configuration requests due to the following reasons: + 1. During rebalance, multiple operations may return a `NotMyVbucket` status, triggering a configuration refresh. + 2. Since `ClustermapChangeNotificationBrief` will cause all connections to subscribe to updates and receive them, it is + necessary to account for potential high volumes of updates. + +Below is a diagram that illustrates an example of the SDK workflow, where the GET request is waiting for the arrival of +a new configuration. + +```mermaid +sequenceDiagram + autonumber + conn_1->>+kv_node_1: get("foo", vb=115) + kv_node_1->>-conn_1: NotMyVbucket(epoch=1, rev=11) + conn_1-->>+retry_orchestrator: pending(get, "foo", epoch=1, rev=11) + retry_orchestrator-->retry_orchestrator: put operation to wating queue + conn_1-->>+config_monitor: refresh configuration + config_monitor-->config_monitor: wait to throttle config requests + config_monitor->>+conn_2: get_config() + conn_2->>+kv_node_2: get_config() + kv_node_2->>-conn_2: configuration(epoch=1, rev=11) + conn_2->>-config_monitor: apply new configuration + config_monitor->>retry_orchestrator: purge waiting queue(epoch=1, rev=11) + retry_orchestrator->>conn_1: retry get("foo") + conn_1->>+kv_node_2: get("foo", vb=115) + kv_node_2->>-conn_1: Success() +``` + +# Language Specifics + +## Feature Checklist + +1. 
`GetClusterConfigWithKnownVersion` (`0x1d`). The SDK should always supply current configuration version if the + connection has acknowledged feature flag. + +2. `DedupeNotMyVbucketClustermap` (`0x1e`). The SDK should be ready that the KV engine will not repeat configuration + payload if it already been sent to the socket by any means (`NotMyVbucket` status, `ClustermapChangeNotification`, + `GetClusterConfig`). + +3. Out-of-Order Execution. `Duplex` (`0x0c`) feature should be always negotiated in HELLO. + +4. `ClustermapChangeNotificationBrief` (`0x1f`). The SDK should always subscribe for configuration notifications, if the + server supports it, and fallback to polling if it does not. + +5. SDK should not emit configuration refresh request if there is one already in-flight. This should be independent of + the source of the signal, as it might come from all the nodes during rebalance when the configuration push is + enabled, or from `NotMyVbucket` responses. + +6. [OPTIONAL] `SnappyEverywhere` (`0x13`). The SDK should be ready that KV engine might send Snappy-compressed payload with any + of the response types (including push notifications). Check datatype `SNAPPY` (`0x02`). + +# Open Questions + +1. Behaviour in mixed clusters. Upgrade, when new nodes can push config, while old nodes cannot. Downgrade, when new + nodes cannot push configuration (should we even consider downgrade?). + +2. TBD + +3. TBD + +# Revisions + +* Revision #1 (2023-XX-YY; Sergey Avseyev) + * Completed initial draft. 
+ +# Signoff + +| Language | Team Member | Signoff Date | Revision | +|-------------|----------------|--------------|----------| +| .NET | Jeffry Morris | | | +| C/C++ | Sergey Avseyev | | | +| Go | Charles Dixon | | | +| Java/Kotlin | David Nault | | | +| Node.js | Jared Casey | | | +| PHP | Sergey Avseyev | | | +| Python | Jared Casey | | | +| Ruby | Sergey Avseyev | | | +| Scala | Graham Pople | | | + +[kv-unordered-execution]: https://github.com/couchbase/kv_engine/blob/master/docs/UnorderedExecution.md +[kv-duplex]: https://github.com/couchbase/kv_engine/blob/master/docs/Duplex.md +[rfc-0005]: rfc/0005-vbucket-retries.md +[rfc-0024]: rfc/0024-fast-failover.md From 41b63eca7fa1ac80aae02b3b38fe45865ef7deaf Mon Sep 17 00:00:00 2001 From: Sergey Avseyev Date: Tue, 27 Jun 2023 15:22:45 +0300 Subject: [PATCH 02/15] Update * clarify bootstrap changes * explain SnappyEverywhere name --- rfc/0048-sdk3-bootstrapping.md | 7 +++++++ ...-faster-failover-and-configuration-push.md | 19 ++++++++++++++++--- 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/rfc/0048-sdk3-bootstrapping.md b/rfc/0048-sdk3-bootstrapping.md index 5b14e41..4c6620a 100644 --- a/rfc/0048-sdk3-bootstrapping.md +++ b/rfc/0048-sdk3-bootstrapping.md @@ -113,6 +113,12 @@ Once a cluster-level configuration has been successfully retrieved from the clus In addition to establishing further connections to the cluster, the client should begin refreshing the configuration periodically (typically 2.5s) from the cluster in round-robin fashion using any connected memcached connections as per [SDK-RFC#24 Fast Failover][sdk-rfc-0024] until the cluster is closed or a bucket is opened (cluster configuration switches to being derived from bucket configurations at that point, as described in the section named 'Bucket Connection Sequence' in this document). 
+[SDK-RFC#75 Faster Failover and Configuration Push][sdk-rfc-0075] describes new method of configuration delivery, when +the KV engine announces updates using `CLUSTERMAP_CHANGE_NOTIFICATION` (`0x01`) operation. This method should be +preferred if the HELLO feature `ClustermapChangeNotificationBrief` (`0x1f`) is acknowledged by the server. Such nodes +should not be used for configuration polling. Although the SDK still might use polling mechanism for older nodes, that +do not support push mechanism. + ### HTTP Fallback During the CCCP phase of connecting, it is possible for the client to fall back to HTTP bootstrapping if the server or bucket type does not support CCCP.  In this case, the client should open a streaming bucket configuration connection to the server at `/pools/default/bs/$BUCKET_NAME`.  This will provide the client with a normal bucket configuration that can be used to perform bucket operations as well as infer the cluster configuration, as is done with a CCCP configuration.  In the case of a memcached bucket, the CCCP operation will fail and an HTTP fallback is expected to occur.  The client must destroy and refresh the HTTP streaming connection periodically in accordance with the configuration option, this is to ensure that dead connections are identified as soon as reasonable. 
@@ -243,3 +249,4 @@ class PasswordAuthenticator { [sdk-rfc-0024]: /rfc/0024-fast-failover.md [sdk-rfc-0035]: /rfc/0035-rto.md [sdk-rfc-0047]: https://docs.google.com/document/d/1B4QM9UO6kz2yjLrBqLjSgArUeM1DvzKnakC_e8KfrmY/edit +[sdk-rfc-0075]: /rfc/0075-faster-failover-and-configuration-push.md diff --git a/rfc/0075-faster-failover-and-configuration-push.md b/rfc/0075-faster-failover-and-configuration-push.md index 9b78893..02aa6e4 100644 --- a/rfc/0075-faster-failover-and-configuration-push.md +++ b/rfc/0075-faster-failover-and-configuration-push.md @@ -122,6 +122,9 @@ The KV engine introduces a new HELLO flag called `SnappyEverywhere` with a value negotiated, the node will always use the compressed version of the cluster configuration and data type flags will be set to `JSON | SNAPPY` (`0x03`). +Note, that the meaning of the flag `SnappyEverywhere` is that SDK expects and properly handles compression for **ANY** +operation during communication with the KV engine, this is why the flag called "SnappyEverywhere", and "SnappyConfig". + ### `GetClusterConfig` and Out-of-Order Execution [https://issues.couchbase.com/browse/MB-56885]: # @@ -214,7 +217,9 @@ reduce the number of requests. Further details on this topic will be covered in The previously mentioned `ClustermapChangeNotificationBrief` feature enables the SDK to subscribe all connections for configuration updates. These notifications are lightweight and can be deduplicated by the server when the -`DedupeNotMyVbucketClustermap` option is negotiated. +`DedupeNotMyVbucketClustermap` option is negotiated. When the SDK connection receives `CLUSTERMAP_CHANGE_NOTIFICATION` +(`0x01`) packet, the SDK must send `GET_CLUSTER_CONFIG` (`0xb5`) to the same socket to retrieve the actual +configuration. #### Mixed Clusters @@ -223,6 +228,13 @@ In clusters where there is a mix of nodes with older server versions, meaning th support for configuration pushes from the server. 
As a result, the monitor should utilize the old polling mechanism for this particular node instead. +### Bootstrap Changes + +[RFC-0048][rfc-0048] describes bootstrap process for the SDK and the KV connections in particular. With Faster Failover +mechanism implemented, the SDK should not start polling for the nodes, where the KV engine has acknowledged +`ClustermapChangeNotificationBrief` feature. Such connections are expected to be notified by the KV engine when the +configuration will be received.t + ### Enhancements in Handling the `NotMyVbucket` Status Combination of `DedupeNotMyVbucketClustermap` and `ClustermapChangeNotificationBrief` allows to save traffic by not @@ -311,5 +323,6 @@ sequenceDiagram [kv-unordered-execution]: https://github.com/couchbase/kv_engine/blob/master/docs/UnorderedExecution.md [kv-duplex]: https://github.com/couchbase/kv_engine/blob/master/docs/Duplex.md -[rfc-0005]: rfc/0005-vbucket-retries.md -[rfc-0024]: rfc/0024-fast-failover.md +[rfc-0005]: /rfc/0005-vbucket-retries.md +[rfc-0024]: /rfc/0024-fast-failover.md +[rfc-0048]: /rfc/0048-sdk3-bootstrapping.md From 0a4590b8568152366df1cd3fcd8d496dfb2169ad Mon Sep 17 00:00:00 2001 From: Sergey Avseyev Date: Tue, 27 Jun 2023 15:28:48 +0300 Subject: [PATCH 03/15] note about polling when brief push is not available --- rfc/0075-faster-failover-and-configuration-push.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/rfc/0075-faster-failover-and-configuration-push.md b/rfc/0075-faster-failover-and-configuration-push.md index 02aa6e4..a88227c 100644 --- a/rfc/0075-faster-failover-and-configuration-push.md +++ b/rfc/0075-faster-failover-and-configuration-push.md @@ -151,6 +151,9 @@ drawbacks, such as: 2. In a Lambda scenario, if failover occurs while the SDK process is paused, upon resuming, the SDK must process all updates on all sockets. This process takes unnecessary time, unlike when the SDK polls every 2.5 seconds. 
+The SDK is not supposed to negotiate `ClustermapChangeNotification` (`0x0d`), and must use polling mechanism if brief +version is not available. + Since version 7.6, the KV engine introduces the HELLO flag `ClustermapChangeNotificationBrief` (`0x1f`). This flag instructs the KV engine to exclude the cluster configuration content from the notification. In this case, the data type will be `RAW` (`0x00`). Below is the typical structure of the notification when the brief mode is enabled: From 1a3cfe3c8e684ee3112821c6d3d403fcd8759f9d Mon Sep 17 00:00:00 2001 From: Sergey Avseyev Date: Tue, 27 Jun 2023 15:35:25 +0300 Subject: [PATCH 04/15] clarify client-side deduplication in check list --- rfc/0075-faster-failover-and-configuration-push.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/rfc/0075-faster-failover-and-configuration-push.md b/rfc/0075-faster-failover-and-configuration-push.md index a88227c..5a3a44d 100644 --- a/rfc/0075-faster-failover-and-configuration-push.md +++ b/rfc/0075-faster-failover-and-configuration-push.md @@ -289,9 +289,9 @@ sequenceDiagram 4. `ClustermapChangeNotificationBrief` (`0x1f`). The SDK should always subscribe for configuration notifications, if the server supports it, and fallback to polling if it does not. -5. SDK should not emit configuration refresh request if there is one already in-flight. This should be independent of - the source of the signal, as it might come from all the nodes during rebalance when the configuration push is - enabled, or from `NotMyVbucket` responses. +5. SDK should track which revision was used when last `GET_CLUSTER_CONFIG` was sent. So that if new request comes with + the same revision or older, it should be ignored. This should be independent of the source of the signal, as it might + come from all the nodes during rebalance when the configuration push is enabled, or from `NotMyVbucket` responses. 6. [OPTIONAL] `SnappyEverywhere` (`0x13`). 
The SDK should be ready that KV engine might send Snappy-compressed payload with any of the response types (including push notifications). Check datatype `SNAPPY` (`0x02`). From 92753ecf25a958d5b55a6e99dc0c99a0b81e6ccf Mon Sep 17 00:00:00 2001 From: Sergey Avseyev Date: Tue, 27 Jun 2023 19:53:51 +0300 Subject: [PATCH 05/15] highlight that requests for configs with old revisions should be ignored --- ...-faster-failover-and-configuration-push.md | 33 ++++++++++--------- 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/rfc/0075-faster-failover-and-configuration-push.md b/rfc/0075-faster-failover-and-configuration-push.md index 5a3a44d..1eb5b64 100644 --- a/rfc/0075-faster-failover-and-configuration-push.md +++ b/rfc/0075-faster-failover-and-configuration-push.md @@ -257,20 +257,20 @@ a new configuration. ```mermaid sequenceDiagram autonumber - conn_1->>+kv_node_1: get("foo", vb=115) - kv_node_1->>-conn_1: NotMyVbucket(epoch=1, rev=11) - conn_1-->>+retry_orchestrator: pending(get, "foo", epoch=1, rev=11) - retry_orchestrator-->retry_orchestrator: put operation to wating queue - conn_1-->>+config_monitor: refresh configuration - config_monitor-->config_monitor: wait to throttle config requests - config_monitor->>+conn_2: get_config() - conn_2->>+kv_node_2: get_config() - kv_node_2->>-conn_2: configuration(epoch=1, rev=11) - conn_2->>-config_monitor: apply new configuration - config_monitor->>retry_orchestrator: purge waiting queue(epoch=1, rev=11) - retry_orchestrator->>conn_1: retry get("foo") - conn_1->>+kv_node_2: get("foo", vb=115) - kv_node_2->>-conn_1: Success() + conn_1 ->>+ kv_node_1: get("foo", vb=115) + kv_node_1 ->>- conn_1: NotMyVbucket(epoch=1, rev=11) + conn_1 -->>+ retry_orchestrator: pending(get, "foo", epoch=1, rev=11) + retry_orchestrator --> retry_orchestrator: put operation to wating queue + conn_1 -->>+ config_monitor: refresh configuration + config_monitor --> config_monitor: wait to throttle config requests + config_monitor ->>+ 
conn_2: get_config() + conn_2 ->>+ kv_node_2: get_config() + kv_node_2 ->>- conn_2: configuration(epoch=1, rev=11) + conn_2 ->>- config_monitor: apply new configuration + config_monitor ->> retry_orchestrator: purge waiting queue(epoch=1, rev=11) + retry_orchestrator ->> conn_1: retry get("foo") + conn_1 ->>+ kv_node_2: get("foo", vb=115) + kv_node_2 ->>- conn_1: Success() ``` # Language Specifics @@ -289,8 +289,9 @@ sequenceDiagram 4. `ClustermapChangeNotificationBrief` (`0x1f`). The SDK should always subscribe for configuration notifications, if the server supports it, and fallback to polling if it does not. -5. SDK should track which revision was used when last `GET_CLUSTER_CONFIG` was sent. So that if new request comes with - the same revision or older, it should be ignored. This should be independent of the source of the signal, as it might +5. SDK-side deduplication. + SDK should track which revision was used when last `GET_CLUSTER_CONFIG` was sent. So that if new request comes with + the same revision or older, it should be **ignored**. This should be independent of the source of the signal, as it might come from all the nodes during rebalance when the configuration push is enabled, or from `NotMyVbucket` responses. 6. [OPTIONAL] `SnappyEverywhere` (`0x13`). 
The SDK should be ready that KV engine might send Snappy-compressed payload with any From 41538e6ff76add2877cc972ec462b4b7ffc07707 Mon Sep 17 00:00:00 2001 From: Sergey Avseyev Date: Tue, 27 Jun 2023 20:30:14 +0300 Subject: [PATCH 06/15] add clarification on how KV engine deduplicates config payloads --- rfc/0075-faster-failover-and-configuration-push.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/rfc/0075-faster-failover-and-configuration-push.md b/rfc/0075-faster-failover-and-configuration-push.md index 1eb5b64..3818769 100644 --- a/rfc/0075-faster-failover-and-configuration-push.md +++ b/rfc/0075-faster-failover-and-configuration-push.md @@ -109,10 +109,15 @@ The KV engine updates the pushed configuration version in the following cases: * Configuration sent to the SDK in response to a `GetClusterConfig` (`0xb5`) request. * Configuration pushed to the SDK that enabled the HELLO flag `ClustermapChangeNotification` (`0x0d`). -Note, that `DedupeNotMyVbucketClustermap` affects `ClustermapChangeNotification` and `ClustermapChangeNotificationBrief` +NOTE: `DedupeNotMyVbucketClustermap` affects `ClustermapChangeNotification` and `ClustermapChangeNotificationBrief` features, that described below. In other words, if deduplication enabled, the cluster configuration will be announce for the socket connection only once. +NOTE: KV engine does not inspect the body of the configuration, it only looks at `epoch`/`revision` pair to detect +duplicates (decide if the SDK seen the update). It means, that SDK still might receive updates that do not change +VBucket map or topology (number, order or names of the nodes). It is up to the SDK to decide if the configuration +affects the sockets or pipeline configuration, which must be already implemented for older generation of the servers. 
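
The SDK-side decision described in the note above can be sketched as two separate checks. This is only an illustration: the `ClusterConfig` type and its `nodes` and `vbucket_map` fields are hypothetical stand-ins for a parsed configuration, and the sketch assumes versions are ordered by epoch first and revision second.

```python
from typing import NamedTuple, Optional, Tuple


class ClusterConfig(NamedTuple):
    """Hypothetical parsed configuration; field names are illustrative."""
    epoch: int
    revision: int
    nodes: Tuple[str, ...]                    # node names, in map order
    vbucket_map: Tuple[Tuple[int, ...], ...]  # node indexes per vbucket


def is_newer(candidate: ClusterConfig, current: Optional[ClusterConfig]) -> bool:
    """Version check only: compares the epoch/revision pair (assumed ordering)."""
    if current is None:
        return True
    return (candidate.epoch, candidate.revision) > (current.epoch, current.revision)


def changes_routing(candidate: ClusterConfig, current: Optional[ClusterConfig]) -> bool:
    """Content check: a newer version may still leave nodes and vbuckets untouched."""
    if current is None:
        return True
    return (candidate.nodes != current.nodes
            or candidate.vbucket_map != current.vbucket_map)
```

Keeping the two checks apart lets the SDK record the newer version while skipping socket or pipeline reconfiguration when nothing relevant has changed.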
+ ### Enforcing Snappy Compression for Cluster Configuration Payloads [https://review.couchbase.org/c/kv_engine/+/192152]: # From af58f5a14d286a787ad6e9168afa1fcbd246d05e Mon Sep 17 00:00:00 2001 From: Sergey Avseyev Date: Tue, 27 Jun 2023 22:38:35 +0300 Subject: [PATCH 07/15] update NotMyVbucket workflow, fixed typos --- ...-faster-failover-and-configuration-push.md | 83 ++++++++++++------- 1 file changed, 52 insertions(+), 31 deletions(-) diff --git a/rfc/0075-faster-failover-and-configuration-push.md b/rfc/0075-faster-failover-and-configuration-push.md index 3818769..f32285f 100644 --- a/rfc/0075-faster-failover-and-configuration-push.md +++ b/rfc/0075-faster-failover-and-configuration-push.md @@ -128,7 +128,7 @@ negotiated, the node will always use the compressed version of the cluster confi to `JSON | SNAPPY` (`0x03`). Note, that the meaning of the flag `SnappyEverywhere` is that SDK expects and properly handles compression for **ANY** -operation during communication with the KV engine, this is why the flag called "SnappyEverywhere", and "SnappyConfig". +operation during communication with the KV engine, this is why the flag called "SnappyEverywhere", and not "SnappyConfig". ### `GetClusterConfig` and Out-of-Order Execution @@ -157,7 +157,7 @@ drawbacks, such as: updates on all sockets. This process takes unnecessary time, unlike when the SDK polls every 2.5 seconds. The SDK is not supposed to negotiate `ClustermapChangeNotification` (`0x0d`), and must use polling mechanism if brief -version is not available. +version is not available. `ClustermapChangeNotification` still available in post-7.6 versions. Since version 7.6, the KV engine introduces the HELLO flag `ClustermapChangeNotificationBrief` (`0x1f`). This flag instructs the KV engine to exclude the cluster configuration content from the notification. 
In this case, the data type @@ -245,37 +245,59 @@ configuration will be received.t ### Enhancements in Handling the `NotMyVbucket` Status -Combination of `DedupeNotMyVbucketClustermap` and `ClustermapChangeNotificationBrief` allows to save traffic by not -sending configuration, if SDK already seen the same revision, and also sends only pair of `Epoch`/`Revision`. So it is -up to SDK to initiate configuration update once the non-empty payload returned along with `NotMyVbucket` status code. +`DedupeNotMyVbucketClustermap` feature allows to save traffic by not sending configuration, if SDK already seen the same +revision. Several modifications are required in the SDK: -1. The retry orchestrator should be able to retry an operation based on configuration updates rather than the timer signal. -2. The configuration monitor should have the ability to throttle configuration requests due to the following reasons: - 1. During rebalance, multiple operations may return a `NotMyVbucket` status, triggering a configuration refresh. - 2. Since `ClustermapChangeNotificationBrief` will cause all connections to subscribe to updates and receive them, it is - necessary to account for potential high volumes of updates. - -Below is a diagram that illustrates an example of the SDK workflow, where the GET request is waiting for the arrival of -a new configuration. +1. Response handler should tolerate empty response with `NotMyVbucket` status, as the KV engine assumes that the SDK + already seen configuration, and there is no newer configuration available. In this case SDK should just retry the + operation. + +2. If the reponse payload contains body, it contains current configuration, which should be sent to configuration + monitor (manager). The SDK should either synchronously apply configuration, create waiting queue for given + `epoch`/`revision` pair. 
+ Once configuration applied, the SDK must check if new configuration routes the operation to new endpoint or new vbucket on the old endpoint, and *immediately* dispatch operation to new endpoint (or same endpoint in case vbucketID has changed). In any other case, the SDK should send operation to retry orchestrator. + ```mermaid + flowchart + A(NotMyVbucket) --> B{Empty Body?} + B -->|No|C(Apply Configuration) + C --> D{Route Operation} + D -->|Endpoint Changed| E[Dispatch To<br>New Endpoint] + D -->|VBucketID Changed| F[Update VBucketID] + F --> G[Dispatch To<br>Same Endpoint] + D -->|Everything Else| H[Send To Retry<br>Orchestrator] + B -->|Yes|H[Send To Retry<br>Orchestrator] + ``` + + +Below is a diagram that illustrates an example of the SDK workflow ```mermaid sequenceDiagram autonumber - conn_1 ->>+ kv_node_1: get("foo", vb=115) - kv_node_1 ->>- conn_1: NotMyVbucket(epoch=1, rev=11) - conn_1 -->>+ retry_orchestrator: pending(get, "foo", epoch=1, rev=11) - retry_orchestrator --> retry_orchestrator: put operation to wating queue - conn_1 -->>+ config_monitor: refresh configuration - config_monitor --> config_monitor: wait to throttle config requests - config_monitor ->>+ conn_2: get_config() - conn_2 ->>+ kv_node_2: get_config() - kv_node_2 ->>- conn_2: configuration(epoch=1, rev=11) - conn_2 ->>- config_monitor: apply new configuration - config_monitor ->> retry_orchestrator: purge waiting queue(epoch=1, rev=11) - retry_orchestrator ->> conn_1: retry get("foo") - conn_1 ->>+ kv_node_2: get("foo", vb=115) - kv_node_2 ->>- conn_1: Success() + + participant conn_1 + participant kv_node_1 + participant retry_orchestrator + participant config_monitor + + conn_1 ->>+ kv_node_1: get("foo", vb=115) + kv_node_1 ->>- conn_1: NotMyVbucket(config={epoch=1, rev=11, ...}) + + conn_1 -->>+ config_monitor: propose confg={epoch=1, rev=11} + + + critical Check if config route operation to different node of vbucket + + option "foo" still maps to kv_node_1 + conn_1 -->>+ retry_orchestrator: retry(get, "foo", reason=NotMyVbucket, epoch=1, rev=11) + + option "foo" does not map to kv_node_1, or vbucket has changed + conn_1 -->+ kv_node_1: get("foo", vb=new_vbucket) + kv_node_1 ->>- conn_1: Success() + end ``` # Language Specifics @@ -294,10 +316,9 @@ sequenceDiagram 4. `ClustermapChangeNotificationBrief` (`0x1f`). The SDK should always subscribe for configuration notifications, if the server supports it, and fallback to polling if it does not. -5. SDK-side deduplication. - SDK should track which revision was used when last `GET_CLUSTER_CONFIG` was sent. So that if new request comes with - the same revision or older, it should be **ignored**. 
This should be independent of the source of the signal, as it might - come from all the nodes during rebalance when the configuration push is enabled, or from `NotMyVbucket` responses. +5. SDK-side deduplication of push notifications. SDK should track which revision was used when last `GET_CLUSTER_CONFIG` + was sent. So that if new notification comes with the same revision or older, it should be **ignored**. It should also + ignore notifications which `epoch`/`revision` are not newer than effective configuration that is used by the SDK. 6. [OPTIONAL] `SnappyEverywhere` (`0x13`). The SDK should be ready that KV engine might send Snappy-compressed payload with any of the response types (including push notifications). Check datatype `SNAPPY` (`0x02`). From 7883de4aa48ffe2624f10f37f44fb28250ae33a1 Mon Sep 17 00:00:00 2001 From: Sergey Avseyev Date: Tue, 27 Jun 2023 23:04:27 +0300 Subject: [PATCH 08/15] Cover mixed mode case and push workflow --- ...-faster-failover-and-configuration-push.md | 24 ++++++++++++------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/rfc/0075-faster-failover-and-configuration-push.md b/rfc/0075-faster-failover-and-configuration-push.md index f32285f..d351f92 100644 --- a/rfc/0075-faster-failover-and-configuration-push.md +++ b/rfc/0075-faster-failover-and-configuration-push.md @@ -225,9 +225,16 @@ reduce the number of requests. Further details on this topic will be covered in The previously mentioned `ClustermapChangeNotificationBrief` feature enables the SDK to subscribe all connections for configuration updates. These notifications are lightweight and can be deduplicated by the server when the -`DedupeNotMyVbucketClustermap` option is negotiated. When the SDK connection receives `CLUSTERMAP_CHANGE_NOTIFICATION` -(`0x01`) packet, the SDK must send `GET_CLUSTER_CONFIG` (`0xb5`) to the same socket to retrieve the actual -configuration. +`DedupeNotMyVbucketClustermap` option is negotiated. 
+ +It is not guaranteed that the configuration will be immediately available on the all nodes of the cluster once one of +the node sends `CLUSTERMAP_CHANGE_NOTIFICATION` request to SDK. To workaround this issue, the SDK should remeber +`epoch`/`revision` pair from notification payload as "recently announced" version, and ensure that the configuration +monitor will continue attempts to fetch configuration until the received body will have version that is not older than +"recently announced". + +As an optional optimization, the SDK might perform attempts to retrieve configuration starting from the node that +pushed configuration notification. Although it might not be possible to do easily in all SDKs. #### Mixed Clusters @@ -236,6 +243,10 @@ In clusters where there is a mix of nodes with older server versions, meaning th support for configuration pushes from the server. As a result, the monitor should utilize the old polling mechanism for this particular node instead. +In other words, the configuration monitor should continue polling only if there are nodes, that do not support +`ClustermapChangeNotificationBrief`, otherwise configuration monitor issue `GET_CLUSTER_CONFIG` request only when +notification arrives. + ### Bootstrap Changes [RFC-0048][rfc-0048] describes bootstrap process for the SDK and the KV connections in particular. With Faster Failover @@ -325,12 +336,7 @@ sequenceDiagram # Open Questions -1. Behaviour in mixed clusters. Upgrade, when new nodes can push config, while old nodes cannot. Downgrade, when new - nodes cannot push configuration (should we even consider downgrade?). - -2. TBD - -3. TBD +1. 
TBD # Revisions From 019b2434a33ae7a5a022d6fa060786111e30146c Mon Sep 17 00:00:00 2001 From: Sergey Avseyev Date: Mon, 10 Jul 2023 13:44:44 +0300 Subject: [PATCH 09/15] note about errors from HELLO operation --- rfc/0075-faster-failover-and-configuration-push.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/rfc/0075-faster-failover-and-configuration-push.md b/rfc/0075-faster-failover-and-configuration-push.md index d351f92..c0fdebd 100644 --- a/rfc/0075-faster-failover-and-configuration-push.md +++ b/rfc/0075-faster-failover-and-configuration-push.md @@ -334,6 +334,11 @@ sequenceDiagram 6. [OPTIONAL] `SnappyEverywhere` (`0x13`). The SDK should be ready that KV engine might send Snappy-compressed payload with any of the response types (including push notifications). Check datatype `SNAPPY` (`0x02`). +7. SDK should be able to handle errors from HELLO. It is not critical right now, but in theory it might be happening in + future. For example, Sending `ClustermapChangeNotificationBrief` (`0x1f`) without `Duplex` (`0x0c`) will trigger + response with the status code `Einval` (`0x04`) and body `{"error":{"context":"ClustermapChangeNotificationBrief needs Duplex"}}`. + See [hello\_packet\_executor.cc][kv-engine-hello-error] for more details. + # Open Questions 1. 
TBD @@ -362,3 +367,4 @@ sequenceDiagram [rfc-0005]: /rfc/0005-vbucket-retries.md [rfc-0024]: /rfc/0024-fast-failover.md [rfc-0048]: /rfc/0048-sdk3-bootstrapping.md +[kv-engine-hello-error]: https://github.com/couchbase/kv_engine/blob/fc4e8f7a71609687302f3f54d2f5052f86105400/daemon/protocol/mcbp/hello_packet_executor.cc#L147-L152 From 040406a3cc65775848f2afae4726296a9801ce61 Mon Sep 17 00:00:00 2001 From: Sergey Avseyev Date: Mon, 10 Jul 2023 15:23:35 +0300 Subject: [PATCH 10/15] clarify HELLO error handling --- rfc/0075-faster-failover-and-configuration-push.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/rfc/0075-faster-failover-and-configuration-push.md b/rfc/0075-faster-failover-and-configuration-push.md index c0fdebd..3e95ce0 100644 --- a/rfc/0075-faster-failover-and-configuration-push.md +++ b/rfc/0075-faster-failover-and-configuration-push.md @@ -334,10 +334,11 @@ sequenceDiagram 6. [OPTIONAL] `SnappyEverywhere` (`0x13`). The SDK should be ready that KV engine might send Snappy-compressed payload with any of the response types (including push notifications). Check datatype `SNAPPY` (`0x02`). -7. SDK should be able to handle errors from HELLO. It is not critical right now, but in theory it might be happening in - future. For example, Sending `ClustermapChangeNotificationBrief` (`0x1f`) without `Duplex` (`0x0c`) will trigger - response with the status code `Einval` (`0x04`) and body `{"error":{"context":"ClustermapChangeNotificationBrief needs Duplex"}}`. - See [hello\_packet\_executor.cc][kv-engine-hello-error] for more details. +7. SDK should be able to handle errors from HELLO. The error should be logged and the exception should include the + details if it is possible. For example, Sending `ClustermapChangeNotificationBrief` (`0x1f`) without `Duplex` + (`0x0c`) will trigger response with the status code `Einval` (`0x04`) and body + `{"error":{"context":"ClustermapChangeNotificationBrief needs Duplex"}}`. 
+ See [hello\_packet\_executor.cc][kv-engine-hello-error] for more details. # Open Questions From 123bb43afdb5f9205ac00b52ab7ad4fca4555186 Mon Sep 17 00:00:00 2001 From: Sergey Avseyev Date: Mon, 10 Jul 2023 15:26:56 +0300 Subject: [PATCH 11/15] note about OoO and NMV deduplication --- rfc/0075-faster-failover-and-configuration-push.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rfc/0075-faster-failover-and-configuration-push.md b/rfc/0075-faster-failover-and-configuration-push.md index 3e95ce0..262c7b2 100644 --- a/rfc/0075-faster-failover-and-configuration-push.md +++ b/rfc/0075-faster-failover-and-configuration-push.md @@ -257,7 +257,8 @@ configuration will be received.t ### Enhancements in Handling the `NotMyVbucket` Status `DedupeNotMyVbucketClustermap` feature allows to save traffic by not sending configuration, if SDK already seen the same -revision. +revision. The KV engine guarantees that Out-of-Order does not affect deduplication process, so the SDK always receives +configuration attached if necessary, regardless the order of the operations. Several modifications are required in the SDK: 1. Response handler should tolerate empty response with `NotMyVbucket` status, as the KV engine assumes that the SDK From c08136a0275e02131498be9d2affb1e127c2a2c0 Mon Sep 17 00:00:00 2001 From: Sergey Avseyev Date: Thu, 13 Jul 2023 14:03:43 +0300 Subject: [PATCH 12/15] fix point 3 in checklist --- rfc/0075-faster-failover-and-configuration-push.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rfc/0075-faster-failover-and-configuration-push.md b/rfc/0075-faster-failover-and-configuration-push.md index 262c7b2..b7ccd8b 100644 --- a/rfc/0075-faster-failover-and-configuration-push.md +++ b/rfc/0075-faster-failover-and-configuration-push.md @@ -323,7 +323,8 @@ sequenceDiagram payload if it already been sent to the socket by any means (`NotMyVbucket` status, `ClustermapChangeNotification`, `GetClusterConfig`). -3. 
Out-of-Order Execution. `Duplex` (`0x0c`) feature should be always negotiated in HELLO. +3. Bi-directional communication. `Duplex` (`0x0c`) feature should be always negotiated in HELLO. It is required for + `ClustermapChangeNotificationBrief`. 4. `ClustermapChangeNotificationBrief` (`0x1f`). The SDK should always subscribe for configuration notifications, if the server supports it, and fallback to polling if it does not. From 9dc0ad3008fb6bf18e58f5a955055a2fd35c143e Mon Sep 17 00:00:00 2001 From: Sergey Avseyev Date: Thu, 13 Jul 2023 14:06:21 +0300 Subject: [PATCH 13/15] update encoding diagrams --- rfc/0075-faster-failover-and-configuration-push.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/rfc/0075-faster-failover-and-configuration-push.md b/rfc/0075-faster-failover-and-configuration-push.md index b7ccd8b..0a93485 100644 --- a/rfc/0075-faster-failover-and-configuration-push.md +++ b/rfc/0075-faster-failover-and-configuration-push.md @@ -49,7 +49,7 @@ does not change the behavior of the server but allows determining if the node su `GetClusterConfig` (`0xb5`) operation. If the node acknowledges `GetClusterConfigWithKnownVersion`, then the SDK can use the new version of the command. -Epoch and revision are signed 64-bit integers encoded in network (big-endian) order. +Epoch and revision are signed 64-bit integers encoded in network (big-endian) order and should be encoded as extras. Byte/ 0 | 1 | 2 | 3 | @@ -83,7 +83,7 @@ Epoch and revision are signed 64-bit integers encoded in network (big-endian) or Magic (0) : 0x80 (client request, SDK -> kv_engine) Opcode (1) : 0xb5 Key length (2,3) : 0x0000 - Extra length (4) : 0x00 + Extra length (4) : 0x10 (16 bytes, two int64_t fields in extras) Data type (5) : 0x00 (RAW) Vbucket (6,7) : 0x0000 Total body (8-11) : 0x00000010 (16 bytes) @@ -195,7 +195,7 @@ will be `RAW` (`0x00`). 
Below is the typical structure of the notification when Magic (0) : 0x82 (server request, kv_engine -> SDK) Opcode (1) : 0x01 Key length (2,3) : 0x0000 - Extra length (4) : 0x10 (two int64_t fields in extras) + Extra length (4) : 0x10 (16 bytes, two int64_t fields in extras) Data type (5) : 0x00 (RAW) Vbucket (6,7) : 0x0000 Total body (8-11) : 0x00000010 (16 bytes) From 1270f30cc89193862f8d050353fe5db26393ada9 Mon Sep 17 00:00:00 2001 From: Sergey Avseyev Date: Fri, 4 Aug 2023 19:26:23 +0300 Subject: [PATCH 14/15] typos in packet layout --- ...-faster-failover-and-configuration-push.md | 32 ++++++++----------- 1 file changed, 14 insertions(+), 18 deletions(-) diff --git a/rfc/0075-faster-failover-and-configuration-push.md b/rfc/0075-faster-failover-and-configuration-push.md index 0a93485..7c7c054 100644 --- a/rfc/0075-faster-failover-and-configuration-push.md +++ b/rfc/0075-faster-failover-and-configuration-push.md @@ -58,9 +58,9 @@ Epoch and revision are signed 64-bit integers encoded in network (big-endian) or +---------------+---------------+---------------+---------------+ 0| 0x80 | 0xb5 | 0x00 | 0x00 | +---------------+---------------+---------------+---------------+ - 4| 0x00 | 0x00 | 0x00 | 0x00 | + 4| 0x00 | 0x10 | 0x00 | 0x00 | +---------------+---------------+---------------+---------------+ - 8| 0x00 | 0x00 | 0x00 | 0x00 | + 8| 0x00 | 0x00 | 0x00 | 0x10 | +---------------+---------------+---------------+---------------+ 12| 0xde | 0xad | 0xbe | 0xef | +---------------+---------------+---------------+---------------+ @@ -72,14 +72,12 @@ Epoch and revision are signed 64-bit integers encoded in network (big-endian) or +---------------+---------------+---------------+---------------+ 28| 0x00 | 0x00 | 0x00 | 0x00 | +---------------+---------------+---------------+---------------+ - 32| 0x00 | 0x00 | 0x00 | 0x00 | + 32| 0x08 | 0x07 | 0x06 | 0x05 | +---------------+---------------+---------------+---------------+ - 36| 0x08 | 0x07 | 0x06 | 0x05 | - 
+---------------+---------------+---------------+---------------+ - 40| 0x04 | 0x03 | 0x02 | 0x01 | + 36| 0x04 | 0x03 | 0x02 | 0x01 | +---------------+---------------+---------------+---------------+ GET_CLUSTER_CONFIG command - Field (offset) (value) + Field (offset) (value, bytes swapped to host order) Magic (0) : 0x80 (client request, SDK -> kv_engine) Opcode (1) : 0xb5 Key length (2,3) : 0x0000 @@ -170,9 +168,9 @@ will be `RAW` (`0x00`). Below is the typical structure of the notification when +---------------+---------------+---------------+---------------+ 0| 0x82 | 0x01 | 0x00 | 0x00 | +---------------+---------------+---------------+---------------+ - 4| 0x00 | 0x00 | 0x00 | 0x00 | + 4| 0x00 | 0x10 | 0x00 | 0x00 | +---------------+---------------+---------------+---------------+ - 8| 0x00 | 0x00 | 0x00 | 0x00 | + 8| 0x00 | 0x00 | 0x00 | 0x10 | +---------------+---------------+---------------+---------------+ 12| 0x00 | 0x00 | 0x00 | 0x00 | +---------------+---------------+---------------+---------------+ @@ -184,14 +182,12 @@ will be `RAW` (`0x00`). 
Below is the typical structure of the notification when +---------------+---------------+---------------+---------------+ 28| 0x00 | 0x00 | 0x00 | 0x00 | +---------------+---------------+---------------+---------------+ - 32| 0x00 | 0x00 | 0x00 | 0x00 | - +---------------+---------------+---------------+---------------+ - 36| 0x08 | 0x07 | 0x06 | 0x05 | + 32| 0x08 | 0x07 | 0x06 | 0x05 | +---------------+---------------+---------------+---------------+ - 40| 0x04 | 0x03 | 0x02 | 0x01 | + 36| 0x04 | 0x03 | 0x02 | 0x01 | +---------------+---------------+---------------+---------------+ CLUSTERMAP_CHANGE_NOTIFICATION command - Field (offset) (value) + Field (offset) (value, bytes swapped to host order) Magic (0) : 0x82 (server request, kv_engine -> SDK) Opcode (1) : 0x01 Key length (2,3) : 0x0000 @@ -265,14 +261,14 @@ Several modifications are required in the SDK: already seen configuration, and there is no newer configuration available. In this case SDK should just retry the operation. -2. If the reponse payload contains body, it contains current configuration, which should be sent to configuration +2. If the reponse payload has body, it contains current configuration, which should be sent to configuration monitor (manager). The SDK should either synchronously apply configuration, create waiting queue for given `epoch`/`revision` pair. Once configuration applied, the SDK must check if new configuration routes the operation to new endpoint or new vbucket on the old endpoint, and *immediately* dispatch operation to new endpoint (or same endpoint in case vbucketID has changed). In any other case, the SDK should send operation to retry orchestrator. 
```mermaid - flowchart + flowchart A(NotMyVbucket) --> B{Empty Body?} B -->|No|C(Apply Configuration) C --> D{Route Operation} @@ -306,7 +302,7 @@ sequenceDiagram option "foo" still maps to kv_node_1 conn_1 -->>+ retry_orchestrator: retry(get, "foo", reason=NotMyVbucket, epoch=1, rev=11) - option "foo" does not map to kv_node_1, or vbucket has changed + option "foo" does not map to kv_node_1, or vbucket has changed conn_1 -->+ kv_node_1: get("foo", vb=new_vbucket) kv_node_1 ->>- conn_1: Success() end @@ -340,7 +336,7 @@ sequenceDiagram details if it is possible. For example, Sending `ClustermapChangeNotificationBrief` (`0x1f`) without `Duplex` (`0x0c`) will trigger response with the status code `Einval` (`0x04`) and body `{"error":{"context":"ClustermapChangeNotificationBrief needs Duplex"}}`. - See [hello\_packet\_executor.cc][kv-engine-hello-error] for more details. + See [hello\_packet\_executor.cc][kv-engine-hello-error] for more details. # Open Questions From 05dcb0e02a01231e3e595f43519b6d89039d5a6a Mon Sep 17 00:00:00 2001 From: Sergey Avseyev Date: Fri, 4 Aug 2023 16:27:19 +0000 Subject: [PATCH 15/15] Update rfc/0075-faster-failover-and-configuration-push.md Co-authored-by: David Nault --- rfc/0075-faster-failover-and-configuration-push.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfc/0075-faster-failover-and-configuration-push.md b/rfc/0075-faster-failover-and-configuration-push.md index 7c7c054..d4cdd7d 100644 --- a/rfc/0075-faster-failover-and-configuration-push.md +++ b/rfc/0075-faster-failover-and-configuration-push.md @@ -91,7 +91,7 @@ Epoch and revision are signed 64-bit integers encoded in network (big-endian) or Revision (32-39): 0x0102030405060708 (72623859790382856 in base-10) If the node has a cluster configuration newer than what is specified in the example, the response will include the new -configuration in the body with the data type set to `JSON` (`0x01`). 
Otherwise, the response will have an empty body +configuration in the body with the `JSON` (`0x01`) bit set in the data type. Otherwise, the response will have an empty body with the data type `RAW` (`0x00`). ### Deduplicate Cluster Configuration for `NotMyVbucket` Responses