
[Bifrost] FindTail task in replicated loglet #2019

Merged
merged 1 commit into main on Oct 4, 2024

Conversation

@AhmedSoliman (Contributor) commented Oct 2, 2024

Implementation of the find-tail process for the case where we don't have a leader sequencer.

Stack created with Sapling. Best reviewed with ReviewStack.
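For orientation, the find-tail decision flow discussed throughout this review (collect tail info from the nodeset, require an f-majority of sealed responses, then compare max-local-tail against the known global tail and apply a write-quorum check) can be sketched roughly as follows. This is a deliberately simplified, hypothetical model: `NodeTail`, `Decision`, and `find_tail_decision` are invented names, not the actual Bifrost types, and the real check also takes seal status into account.

```rust
/// Hypothetical, simplified view of one node's response.
#[derive(Clone, Copy)]
struct NodeTail {
    local_tail: u64,
    sealed: bool,
}

/// Hypothetical outcome of one evaluation round.
#[derive(Debug, PartialEq)]
enum Decision {
    /// The tail is safely known.
    TailFound { global_tail: u64 },
    /// Records in (known_global..max_local] need repair first.
    NeedsRepair { from: u64, to: u64 },
    /// Not enough responses yet; keep waiting for more nodes.
    Wait,
}

fn find_tail_decision(
    responses: &[NodeTail],
    known_global_tail: u64,
    f_majority: usize,
    write_quorum: usize,
) -> Decision {
    // Step 1: we need an f-majority of sealed responses before trusting
    // anything we derive from local tails.
    let sealed: Vec<_> = responses.iter().filter(|n| n.sealed).collect();
    if sealed.len() < f_majority {
        return Decision::Wait;
    }
    let max_local_tail = sealed.iter().map(|n| n.local_tail).max().unwrap();
    // Step 2: if the known global tail already covers max-local-tail, done.
    if max_local_tail <= known_global_tail {
        return Decision::TailFound { global_tail: known_global_tail };
    }
    // Step 3: max_local_tail > known_global_tail is only safe to report if
    // it is replicated on a write quorum; otherwise the range needs repair.
    let replicated = responses
        .iter()
        .filter(|n| n.local_tail >= max_local_tail)
        .count();
    if replicated >= write_quorum {
        Decision::TailFound { global_tail: max_local_tail }
    } else {
        Decision::NeedsRepair { from: known_global_tail, to: max_local_tail }
    }
}
```

The review threads below walk through each of these branches in the actual implementation.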


github-actions bot commented Oct 2, 2024

Test Results

  5 files    5 suites   2m 41s ⏱️
 45 tests  45 ✅ 0 💤 0 ❌
114 runs  114 ✅ 0 💤 0 ❌

Results for commit 59e75ad.

♻️ This comment has been updated with latest results.

@muhamadazmy (Contributor) left a comment

Thank you @AhmedSoliman for this PR. It looks good to me; I only have some really minor comments and a couple of questions.

// todos:
// 1- Do we have some nodes that are unsealed? let's run a seal task in the
// background, but we can safely return the result.
// iff local-tail is consistent max-local-tail.
Contributor:

Suggested change
// iff local-tail is consistent max-local-tail.
// if local-tail is consistent max-local-tail.


max_local_tail, current_known_global
);
} else if max_local_tail == current_known_global ||
// max_local_tail > known_global_tail
Contributor:

Suggested change
// max_local_tail > known_global_tail
// max_local_tail == known_global_tail

Author:

The comment is correct: it says that if the max-local-tail is higher, then we require a write-quorum check.

global_tail: max_local_tail,
};
} else {
// F-majority sealed, but tail needs repair in range
Contributor:

I don't get this part. This is basically when max_local_tail > current_known_global. Why don't we take this value as the new current_known_global if we have a write quorum?

Author:

We do, on line 264. What the else branch does is check the availability of enough nodes such that we can establish a write quorum for (any) offset, but that doesn't mean that max-local-tail is replicated on a write quorum.
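The distinction drawn here, between knowing the state of enough nodes to potentially form a write quorum versus max-local-tail actually being replicated on a write quorum, can be illustrated with a small sketch. The `NodeTailStatus` shape mirrors the snippet quoted later in this review, but both helper functions are hypothetical, not the actual NodeSetChecker API:

```rust
#[derive(Clone, Copy)]
enum NodeTailStatus {
    Unknown,
    Known { local_tail: u64, sealed: bool },
}

/// Do we *know* the state of enough nodes to form a write quorum for some
/// offset? (Roughly what the `else` repair-eligibility branch checks.)
fn can_establish_write_quorum(nodes: &[NodeTailStatus], write_quorum: usize) -> bool {
    nodes
        .iter()
        .filter(|n| matches!(n, NodeTailStatus::Known { .. }))
        .count()
        >= write_quorum
}

/// Is `offset` actually replicated on a write quorum of unsealed nodes?
/// (Roughly what the check_write_quorum closure on max-local-tail checks.)
fn is_replicated_on_write_quorum(
    nodes: &[NodeTailStatus],
    offset: u64,
    write_quorum: usize,
) -> bool {
    nodes
        .iter()
        .filter(|n| match n {
            NodeTailStatus::Known { local_tail, sealed } => !*sealed && *local_tail >= offset,
            _ => false,
        })
        .count()
        >= write_quorum
}
```

The first predicate passing while the second fails is exactly the situation where repair, rather than simply adopting max-local-tail, is required.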

Comment on lines +326 to +407
if self.known_global_tail.latest_offset() >= max_local_tail {
return FindTailResult::Open {
global_tail: self.known_global_tail.latest_offset(),
};
}
Contributor:

I assume this can happen because, by the time the nodes replied with their info, the global tail had already advanced, correct?

Author:

Yes, or from any other external source.

Comment on lines +333 to +422
if nodeset_checker.check_write_quorum(|attribute| match attribute {
// We have all copies of the max-local-tail replicated. It's safe to
// consider this offset as committed.
NodeTailStatus::Known { local_tail, sealed } => {
// not sealed, and reached its local tail reached the max we
// are looking for.
!sealed && (*local_tail >= max_local_tail)
}
_ => false,
}) {
return FindTailResult::Open {
global_tail: max_local_tail,
};
}
Contributor:

I am wondering if there is a value in [known_global_tail..max_local_tail] that can satisfy the write quorum and hence is the real known_global_tail. It could be found by binary-searching this range until we hit the max value that can form a write quorum.

Author:

There can be, and finding it will be one of the things the RepairTask does (not by binary search, but by walking backwards from max-tail).

Comment on lines 278 to 284
if nodeset_checker.check_write_quorum(NodeTailStatus::is_known) {
// We can repair.
todo!("Tail repair is not implemented yet")
} else {
// wait for more nodes
break 'check_nodeset;
}
Contributor:

Question here about the potential repair process. Will this try to copy the missing records from max_local_tail until it forms a write quorum?

Or will it find the max value in the range [known_global..max_local] that satisfies the write quorum?

Author:

Both. It'll first walk backwards from max-tail until it hits a record that's not fully replicated. Then it'll attempt to replicate those records (in the forward direction) until it reaches max-local-tail. It's critical that repair proceeds in this order to ensure correctness and repeatability if the repair fails or gets interrupted.
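The two-phase repair order described above can be sketched on an in-memory model. This is purely illustrative: the real RepairTask operates against log-servers, and `repair_range`, `base`, and `copies` are invented names. Here `copies[offset - base]` models how many nodes store the record at `offset`:

```rust
/// Sketch of the repair order: phase 1 walks backwards from max-tail past
/// under-replicated records until it reaches a fully replicated one (or
/// `base`, modelling the known global tail); phase 2 re-replicates forward
/// from there up to max-local-tail. Replicating in the forward direction
/// means an interrupted repair leaves a fully replicated prefix, so the
/// repair can be safely re-run.
fn repair_range(base: u64, copies: &mut [usize], write_quorum: usize) -> std::ops::Range<u64> {
    let max_local_tail = base + copies.len() as u64;

    // Phase 1: find the start of the under-replicated suffix.
    let mut start = max_local_tail;
    while start > base {
        let idx = (start - 1 - base) as usize;
        if copies[idx] >= write_quorum {
            break;
        }
        start -= 1;
    }

    // Phase 2: replicate forward from `start` to max-local-tail until each
    // record has at least write-quorum copies.
    for idx in ((start - base) as usize)..copies.len() {
        copies[idx] = copies[idx].max(write_quorum);
    }
    start..max_local_tail
}
```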

self.my_params.loglet_id, e
));
}
inflight_info_requests.abort_all();
Contributor:

Should this be moved before the seal_task.run() ?

Author:

It'd be confusing to have two places where tasks are aborted. In general, as I mentioned in the PR description, this code is intentionally very verbose, and it'll be refactored at a later stage once the algorithm itself is tested and its correctness is verified. So I wouldn't worry too much about the cosmetics.

Comment on lines 421 to 427
if !updated {
// Nothing left to wait on. The tail didn't reach the expected
// target and no more nodes are expected to send us responses.
return FindTailResult::Error(format!(
"Could not determine a safe tail offset for loglet_id={}, perhaps too many nodes down?",
self.my_params.loglet_id));
}
Contributor:

I am wondering if this check could happen immediately after the tokio::select! block, for clarity?

Author:

I think this changes the intention. The check is intentionally done after we exhaust the other options above.

}
}

struct WaitForTailOnNode {
Contributor:

FindTailOnNode and WaitForTailOnNode look very similar; maybe they can become one generic task, since they mainly rely on the header in the incoming message anyway.

Author:

I'd consider this part of the cosmetic changes that I'll do later.

@tillrohrmann (Contributor) left a comment

Really impressive work @AhmedSoliman 🚀 The changes look good to me. +1 for merging :-)

@@ -139,6 +140,13 @@ impl<T: TransportConnect> SequencerAppender<T> {
// since backoff can be None, or run out of iterations,
// but appender should never give up we fall back to fixed backoff
let delay = retry.next().unwrap_or(DEFAULT_BACKOFF_TIME);
info!(
Contributor:

Could info-level logging be too verbose if there are waves of retries?

Author:

Yes, it will be, and I'll review all logging once we put things together; we have a lot of work left on logging/observability.

Comment on lines +38 to +40
/// the previous seal process crashed). Additionally, we will start a TailRepair task to ensure consistent
/// state of the records between known_global_tail and the max(local_tail) observed from f-majority
/// of sealed log-servers.
Contributor:

For my own understanding: the TailRepair is needed for correctness because, if we find a log entry beyond the known global tail while some nodes from the nodeset are unavailable, it could be that this entry was committed, because together with the unavailable nodes there could be a write quorum. However, if it weren't committed and we used the offset of this entry as the global tail, then we'd risk that after the unavailable nodes come back, this entry couldn't be read by an f-majority read. That's why we establish write quorum for these records, to be on the safe side, right?

This, however, requires that for sealing a log, we not only need to establish f-majority but also write-quorum, right?

Author:

That's correct.
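A toy version of the intersection argument above: in a simple counting model (illustrative only; real nodeset configurations can be more elaborate than plain counts), an f-majority set is guaranteed to intersect every write quorum exactly when the two sizes sum to more than the nodeset size.

```rust
/// In a counting model over a nodeset of `n` nodes: any set of
/// `f_majority` nodes intersects any set of `write_quorum` nodes iff
/// their sizes sum to more than `n` (pigeonhole). This is why a committed
/// record (on a write quorum) is visible to every f-majority read.
fn quorums_intersect(n: usize, write_quorum: usize, f_majority: usize) -> bool {
    write_quorum + f_majority > n
}
```

For example, with a nodeset of 5, a write quorum of 3 always intersects an f-majority of 3, but a write quorum of 2 might not.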

Implementation of the find-tail process for the case where we don't have a leader sequencer.
@AhmedSoliman AhmedSoliman merged commit 59e75ad into main Oct 4, 2024
17 checks passed
@AhmedSoliman AhmedSoliman deleted the pr2019 branch October 4, 2024 17:00
3 participants