[Bifrost] FindTail task in replicated loglet #2019
Conversation
Force-pushed from 8a4ec8c to 009af2a
Test Results: 5 files, 5 suites, 2m 41s ⏱️ Results for commit 59e75ad. ♻️ This comment has been updated with latest results.
Thank you @AhmedSoliman for this PR. It looks good to me; I only have really minor comments and a couple of questions.
// todos:
// 1- Do we have some nodes that are unsealed? let's run a seal task in the
// background, but we can safely return the result.
// iff local-tail is consistent max-local-tail.
Suggested change:
-// iff local-tail is consistent max-local-tail.
+// if local-tail is consistent max-local-tail.
@muhamadazmy iff means "if and only if" https://en.wikipedia.org/wiki/If_and_only_if
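As an aside, here is a minimal sketch of the background-seal idea the todo above describes, assuming tokio and a hypothetical seal_node helper (none of these names come from the PR):

```rust
use tokio::task;

type NodeId = u32; // hypothetical; the real code has richer node identifiers

async fn seal_node(_node: NodeId) -> Result<(), String> {
    // stand-in for sending a Seal request to a log-server
    Ok(())
}

/// Once an f-majority is sealed and the tail is safe to return, seal the
/// remaining unsealed nodes in the background instead of blocking on them.
fn seal_stragglers(unsealed: Vec<NodeId>) {
    task::spawn(async move {
        for node in unsealed {
            // errors are non-fatal here: the f-majority seal is already in place
            let _ = seal_node(node).await;
        }
    });
}
```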
    max_local_tail, current_known_global
);
} else if max_local_tail == current_known_global ||
// max_local_tail > known_global_tail
Suggested change:
-// max_local_tail > known_global_tail
+// max_local_tail == known_global_tail
The comment is correct. It says that if the max-local-tail is higher, then we require a write-quorum check.
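To make the two cases concrete, a self-contained sketch of the branching discussed here (variable names mirror the snippet, but the function itself is hypothetical):

```rust
/// Illustrative only; the PR's actual control flow is richer.
fn tail_state(max_local_tail: u64, current_known_global: u64) -> &'static str {
    debug_assert!(max_local_tail >= current_known_global);
    if max_local_tail == current_known_global {
        // nothing beyond the known global tail was observed
        "settled: safe to return the tail as-is"
    } else {
        // max_local_tail > current_known_global: the extra records may or may
        // not be committed, so a write-quorum check is required before the
        // global tail can advance to max_local_tail
        "needs write-quorum check"
    }
}
```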
        global_tail: max_local_tail,
    };
} else {
    // F-majority sealed, but tail needs repair in range
I don't get this part. This is basically when max_local_tail > current_known_global. Why don't we take this value as the new current_known_global if we have a write quorum?
We do, on line 264. What the else branch does is check the availability of enough nodes such that we can establish a write-quorum for (any) offset, but that doesn't mean that max-local-tail is replicated on a write-quorum.
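The distinction between those two write-quorum questions can be modeled in a few lines. A toy, self-contained model (the PR's NodeTailStatus and nodeset checker are richer than this):

```rust
enum TailStatus {
    Known { local_tail: u64, sealed: bool },
    Unknown,
}

/// Toy quorum check: does `pred` hold on at least `quorum` nodes?
fn check_write_quorum<F: Fn(&TailStatus) -> bool>(nodes: &[TailStatus], quorum: usize, pred: F) -> bool {
    nodes.iter().filter(|n| pred(n)).count() >= quorum
}

fn main() {
    let max_local_tail = 10;
    let nodes = [
        TailStatus::Known { local_tail: 10, sealed: false },
        TailStatus::Known { local_tail: 7, sealed: false },
        TailStatus::Unknown, // unresponsive node
    ];
    // 1) Is max_local_tail already replicated on a write-quorum? (commit check)
    let committed = check_write_quorum(&nodes, 2, |n| {
        matches!(n, TailStatus::Known { local_tail, sealed } if !*sealed && *local_tail >= max_local_tail)
    });
    // 2) Are merely enough nodes responsive to form a write-quorum? (repairability)
    let repairable = check_write_quorum(&nodes, 2, |n| matches!(n, TailStatus::Known { .. }));
    assert!(!committed && repairable); // repair is possible, but the tail is not yet committed
}
```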
if self.known_global_tail.latest_offset() >= max_local_tail {
    return FindTailResult::Open {
        global_tail: self.known_global_tail.latest_offset(),
    };
}
I assume this can happen because by the time the nodes replied with their info, the global tail had already advanced, correct?
Yes, or from any other external source.
if nodeset_checker.check_write_quorum(|attribute| match attribute {
    // We have all copies of the max-local-tail replicated. It's safe to
    // consider this offset as committed.
    NodeTailStatus::Known { local_tail, sealed } => {
        // not sealed, and its local tail reached the max we are looking for
        !sealed && (*local_tail >= max_local_tail)
    }
    _ => false,
}) {
    return FindTailResult::Open {
        global_tail: max_local_tail,
    };
}
I am wondering if there is a value in [known_global_tail..max_local_tail] that can satisfy the write quorum and hence is the real known_global_tail. It could be found by binary-searching this range until we hit the max value that can form a write quorum.
There can be, and that'll be one of the things the RepairTask will do (not binary-search, but walking backwards from max-tail)
if nodeset_checker.check_write_quorum(NodeTailStatus::is_known) {
    // We can repair.
    todo!("Tail repair is not implemented yet")
} else {
    // wait for more nodes
    break 'check_nodeset;
}
Question here about the potential repair process. Will this try to copy the missing records from max_local_tail until it forms a write quorum? Or will it find the max value in the range [known_global..max_local] that satisfies the write quorum?
Both. It'll first walk backwards from max-tail until it hits a record that's not fully replicated. Then it'll attempt to replicate those records (in the forward direction) until it reaches max-local-tail. It's critical that the repair proceeds in this order to ensure correctness and repeatability if the repair fails or gets interrupted.
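A rough, self-contained sketch of that two-phase direction; every helper below is a hypothetical stand-in, not the PR's API:

```rust
type Offset = u64; // stand-in for the real loglet offset type

fn is_replicated_on_write_quorum(_offset: Offset) -> bool {
    // stand-in: ask the nodeset whether this record sits on a write-quorum
    false
}

fn replicate_record(_offset: Offset) {
    // stand-in: re-replicate a single record until it reaches a write-quorum
}

fn repair_tail(known_global_tail: Offset, max_local_tail: Offset) {
    // Phase 1: walk backwards from max-tail to find the lowest offset that
    // still needs repair, i.e. where full replication stops holding.
    let mut repair_from = max_local_tail;
    while repair_from > known_global_tail && !is_replicated_on_write_quorum(repair_from - 1) {
        repair_from -= 1;
    }
    // Phase 2: re-replicate in the FORWARD direction up to max-local-tail,
    // so an interrupted repair can be re-run safely from the same point.
    for offset in repair_from..max_local_tail {
        replicate_record(offset);
    }
}
```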
        self.my_params.loglet_id, e
    ));
}
inflight_info_requests.abort_all();
Should this be moved before the seal_task.run() call?
It'd be confusing to have two places where tasks are aborted. In general, as I mentioned in the PR description, this code is very verbose (intentionally) and it'll be refactored at a later stage once the algorithm itself is tested and its correctness is verified. So I wouldn't worry so much about the cosmetics.
if !updated {
    // Nothing left to wait on. The tail didn't reach the expected
    // target and no more nodes are expected to send us responses.
    return FindTailResult::Error(format!(
        "Could not determine a safe tail offset for loglet_id={}, perhaps too many nodes down?",
        self.my_params.loglet_id));
}
I am wondering if this check could happen immediately after the tokio::select! block, for clarity?
I think this changes the intention. The check is intentionally done after we exhaust the other options above.
    }
}

struct WaitForTailOnNode {
The FindTailOnNode and WaitForTailOnNode look very similar; maybe they could become one generic task, since they mainly rely on the header in the incoming message anyway.
I'd consider this as part of the cosmetic changes that I'll do later.
Really impressive work @AhmedSoliman 🚀 The changes look good to me. +1 for merging :-)
@@ -139,6 +140,13 @@ impl<T: TransportConnect> SequencerAppender<T> {
    // since backoff can be None or run out of iterations, but the appender
    // should never give up, we fall back to a fixed backoff
    let delay = retry.next().unwrap_or(DEFAULT_BACKOFF_TIME);
    info!(
Could info be too verbose if there are waves of retries?
Yes, it will be, and I'll review all logging once we put things together; we have a lot of work left on logging/observability.
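For illustration, one common way to tame this with the tracing crate is to demote retry logs to DEBUG after the first few attempts (the attempt counter and threshold here are assumptions, not this PR's code):

```rust
use std::time::Duration;
use tracing::{debug, info};

/// Illustrative only: keep the first few retries visible at INFO, then
/// demote to DEBUG so waves of retries don't flood the log.
fn log_retry(attempt: u32, delay: Duration) {
    if attempt <= 3 {
        info!(attempt, ?delay, "sequencer appender retrying after backoff");
    } else {
        debug!(attempt, ?delay, "sequencer appender still retrying after backoff");
    }
}
```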
/// the previous seal process crashed). Additionally, we will start a TailRepair task to ensure consistent
/// state of the records between known_global_tail and the max(local_tail) observed from f-majority
/// of sealed log-servers.
For my own understanding: the TailRepair is needed for correctness because, if we find a log entry beyond the known global tail while some nodes from the nodeset are unavailable, it could be that this entry was committed, since together with the unavailable nodes there could be a write-quorum. However, if it wasn't committed and we used the offset of this entry as the global tail, we'd risk that after the unavailable nodes come back, this entry couldn't be read by an f-majority read. That's why we establish write-quorum for these records, to be on the safe side, right?
This, however, requires that for sealing a log we not only need to establish f-majority but also write-quorum, right?
That's correct.
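A small worked example of the quorum arithmetic behind this exchange (all numbers are assumed for illustration, not this log's configuration):

```rust
fn main() {
    let nodeset_size: usize = 5;
    let write_quorum: usize = 3; // replication factor

    // An f-majority is the smallest set of nodes guaranteed to intersect
    // every possible write-quorum: N - W + 1.
    let f_majority = nodeset_size - write_quorum + 1;
    assert_eq!(f_majority, 3);

    // With 2 of 5 nodes down, the 3 reachable nodes form an f-majority, yet
    // a record seen on only 2 of them is ambiguous: its third copy could be
    // on a down node (committed) or nowhere (never committed).
    // Re-replicating it to a full write-quorum resolves the ambiguity.
}
```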
Implementation of the find-tail process in case we don't have a leader sequencer.
Stack created with Sapling. Best reviewed with ReviewStack.