Improve replicated-loglet restatectl commands #2681

Merged
merged 5 commits into from
Feb 10, 2025

Conversation

AhmedSoliman
Contributor

@AhmedSoliman AhmedSoliman commented Feb 9, 2025

  • restatectl replicated-loglet info now prints a table with info from every node in the nodeset
  • restatectl replicated-loglet digest no longer requires --from/--to, and no longer uses excessive memory when the supplied range is unnecessarily large
  • For both commands, lots of UI improvements. Some screenshots will be attached in comments.
// intentionally empty

Stack created with Sapling. Best reviewed with ReviewStack.

github-actions bot commented Feb 9, 2025

Test Results

  7 files  ±0    7 suites  ±0   4m 15s ⏱️ -2s
 47 tests ±0   46 ✅ ±0  1 💤 ±0  0 ❌ ±0 
182 runs  ±0  179 ✅ ±0  3 💤 ±0  0 ❌ ±0 

Results for commit 7ad4835. ± Comparison against base commit bb09c3d.

♻️ This comment has been updated with latest results.

@AhmedSoliman AhmedSoliman force-pushed the pr2681 branch 2 times, most recently from d774db8 to 22468ce Compare February 10, 2025 09:10
@AhmedSoliman
Contributor Author

[Three screenshots of the new restatectl output]

@AhmedSoliman
Contributor Author

Note to reviewers: the code is far from pretty, and I'd rather not tackle nits at the moment. The main goal of this review is to inform and to spot any clear bug in the logic.

This also adds a little more stress to the tests to help them fail more often if there is an issue.

```
// intentionally empty
```
This fixes the mishandling of deleted and unknown nodes in f-majority checks. The integration tests were misconfigured: the nodeset was [N2..N4], but N4 didn't actually exist in the config. In this case we should not accept an f-majority seal if only one node is sealed (replication=2).
Although this bug wouldn't impact us immediately, it's best to fix this condition, and I took it as an opportunity to update the semantics of the provisioning state to match the latest design direction. Documentation has also been updated to reflect the correct semantics.

Summary:
- Nodes observed in the nodeset but not in nodes-config are treated as "provisioning" rather than "disabled"
- Nodes that are "deleted" in config (tombstone exists) are treated as "disabled"
- Nodes in provisioning are fully authoritative, but are automatically excluded from new nodesets (already filtered by candidacy filter in nodeset selector)
- If provisioning nodes were added to the nodeset, they are treated as fully authoritative and are required to participate in f-majority.
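
The summary above can be illustrated with a minimal sketch. It is not the code from this PR: `ConfigEntry`, `EffectiveState`, `effective_state`, and `is_f_majority_sealed` are hypothetical names, node IDs are plain integers, and f-majority is simplified to the flat (single-scope) rule where `n` authoritative nodes with replication `r` need at least `n - r + 1` sealed nodes.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a node's entry in the nodes configuration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ConfigEntry {
    /// The node exists in the configuration.
    Live,
    /// The node was deleted and only a tombstone remains.
    Tombstone,
}

/// How a nodeset member is treated by the (simplified) f-majority check.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EffectiveState {
    /// Known node: fully authoritative.
    Authoritative,
    /// Not in the configuration: treated as provisioning, still authoritative.
    Provisioning,
    /// Tombstoned: excluded from the check.
    Disabled,
}

fn effective_state(node: u32, config: &HashMap<u32, ConfigEntry>) -> EffectiveState {
    match config.get(&node) {
        Some(ConfigEntry::Live) => EffectiveState::Authoritative,
        Some(ConfigEntry::Tombstone) => EffectiveState::Disabled,
        None => EffectiveState::Provisioning,
    }
}

/// Assumption: flat replication, so f-majority needs `n - r + 1` sealed nodes
/// out of the `n` nodes that are not disabled.
fn is_f_majority_sealed(
    nodeset: &[u32],
    config: &HashMap<u32, ConfigEntry>,
    sealed: &HashSet<u32>,
    replication: usize,
) -> bool {
    let authoritative: Vec<u32> = nodeset
        .iter()
        .copied()
        .filter(|n| effective_state(*n, config) != EffectiveState::Disabled)
        .collect();
    let sealed_count = authoritative.iter().filter(|n| sealed.contains(n)).count();
    let required = authoritative.len().saturating_sub(replication) + 1;
    sealed_count >= required
}
```

With a nodeset of N2..N4, replication 2, and N4 missing from the config, N4 counts as provisioning, so two sealed nodes are required and a single sealed node is not enough. If N4 were tombstoned instead, it would be excluded and one sealed node would suffice.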

```
// intentionally empty
```
- The trim operation now waits for f-majority before reporting success, to increase the reliability of a subsequent get_trim_point
- Adds protection against a dangerous scenario in which a sealed loglet over-reports its trim point. The loglet might have more records than the effective sealed tail, but it should never report a trim point beyond that tail (if it does, the system will believe that the subsequent segment is missing records)
- Removes a superfluous check. The trim task already checks that the trim point is clamped to the known global tail
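
The over-reporting protection can be sketched as a simple clamp. This is an illustration only: `effective_trim_point` is a hypothetical helper, offsets are plain `u32` values, and the tail is assumed to be the offset of the next record to be written, so the last record sits at `tail - 1`.

```rust
/// Returns the trim point a sealed loglet may safely report.
///
/// Assumptions (not taken from the PR): `sealed_tail` is exclusive, i.e. the
/// offset right after the last record, and a trim point of 0 means nothing
/// has been trimmed.
fn effective_trim_point(reported_trim_point: u32, sealed_tail: u32) -> u32 {
    // The last record covered by the sealed segment.
    let last_offset = sealed_tail.saturating_sub(1);
    // Never report a trim point beyond the effective sealed tail.
    reported_trim_point.min(last_offset)
}
```

For example, a reported trim point of 120 with a sealed tail of 100 is clamped to 99, while a reported trim point of 50 is returned unchanged.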

```
// intentionally empty
```
- `restatectl replicated-loglet info` now prints a table with info from every node in the nodeset
- `restatectl replicated-loglet digest` no longer requires --from/--to, and no longer uses excessive memory when the supplied range is unnecessarily large
- For both commands, lots of UI improvements. Some screenshots will be attached in comments.
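
One way optional bounds could be defaulted and an oversized range kept in check is sketched below. This is a hypothetical illustration, not the logic from this PR: `digest_range` and the choice of defaults (just after the trim point, up to the last offset below the highest local tail) are assumptions.

```rust
use std::ops::RangeInclusive;

/// Resolves optional `from`/`to` bounds into a range that never extends past
/// the offsets that can actually hold records.
///
/// Assumptions: records live in `trim_point + 1 ..= max_local_tail - 1`.
/// Returns `None` when the resulting range is empty.
fn digest_range(
    from: Option<u32>,
    to: Option<u32>,
    trim_point: u32,
    max_local_tail: u32,
) -> Option<RangeInclusive<u32>> {
    let lowest = trim_point.saturating_add(1);
    // A tail of 0 means there is nothing to digest.
    let highest = max_local_tail.checked_sub(1)?;
    // Default missing bounds, then clamp user-supplied ones.
    let start = from.unwrap_or(lowest).max(lowest);
    let end = to.unwrap_or(highest).min(highest);
    (start <= end).then(|| start..=end)
}
```

Clamping before any per-offset state is allocated means a request such as `0..=u32::MAX` costs no more than the default range.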

```
// intentionally empty
```
Contributor

@tillrohrmann tillrohrmann left a comment

Really cool improvements for restatectl. LGTM. +1 for merging :-)

.copied()
.filter_map(|node_id| {
let node = nodes_config.find_node_by_id(node_id).unwrap_or_else(|_| {
panic!("Node {node_id} doesn't seem to exist in nodes configuration");

Contributor

Could this happen if we are operating on a slightly outdated NodesConfiguration?

Contributor Author

Yes, in theory it can happen.
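
For illustration, a non-panicking alternative could skip unknown nodes and report them separately. This is a sketch under stated assumptions, not what the PR does: `NodesConfig` is a simplified stand-in, `find_node_by_id` here only mimics the shape of the call in the snippet above, and `resolve_nodeset` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified nodes configuration: node id -> address.
struct NodesConfig {
    nodes: HashMap<u32, String>,
}

impl NodesConfig {
    /// Mimics a fallible lookup; the real API may differ.
    fn find_node_by_id(&self, id: u32) -> Result<&str, String> {
        self.nodes
            .get(&id)
            .map(String::as_str)
            .ok_or_else(|| format!("unknown node N{id}"))
    }
}

/// Resolves every nodeset member it can find and collects the ones that are
/// missing, e.g. because the local configuration is slightly outdated.
fn resolve_nodeset(nodeset: &[u32], config: &NodesConfig) -> (Vec<(u32, String)>, Vec<u32>) {
    let mut missing = Vec::new();
    let resolved: Vec<(u32, String)> = nodeset
        .iter()
        .copied()
        .filter_map(|id| match config.find_node_by_id(id) {
            Ok(address) => Some((id, address.to_owned())),
            Err(_) => {
                // Remember the node instead of panicking.
                missing.push(id);
                None
            }
        })
        .collect();
    (resolved, missing)
}
```

The caller can then decide whether to print a warning for the missing nodes or to refresh the configuration and retry.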

records_table.add_row(std::iter::repeat("═════════").take(nodeset.len() + 1));
// append the node-level info at the end
{
let mut row = Vec::with_capacity(nodeset.len() + 2);

Contributor

Could be nodeset.len() + 1.

Contributor Author

We have one column for the offset and one for "issues". You're right that it could be + 1 since this row doesn't use the issues column, but does it matter if I leave it to match the table? :)

Would you consider this a nitpick?

Contributor

Yes, feel free to ignore.

@AhmedSoliman AhmedSoliman merged commit 5fb2e0c into main Feb 10, 2025
25 checks passed
@AhmedSoliman AhmedSoliman deleted the pr2681 branch February 10, 2025 12:39
for (offset, responses) in digests.iter() {
checker.fill_with_default();
if *offset >= digests.max_local_tail() {
break;
}
if *offset == known_global_tail.latest_offset() {
// divider to indicate that everything after global tail
records_table.add_row(std::iter::repeat("────").take(nodeset.len() + 2));

Contributor

When I tried the command out, I got the output below. It seems to conflict with the comment here: the line should be after the value 75095, not before it.

[Screenshot of the command output]

Contributor Author

The global tail is 75095, the output is correct.
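
A small sketch of why the divider lands before the row whose offset equals the global tail. It assumes the tail is exclusive, i.e. the first offset not yet known to be globally committed; `render_offsets` is a hypothetical helper that mirrors the loop in the snippet above with plain integers and strings instead of table rows.

```rust
/// Renders one row per offset and inserts a divider right before the row
/// whose offset equals the global tail.
///
/// Assumption: offsets at or after `global_tail` are beyond the global tail,
/// so the divider belongs above them.
fn render_offsets(offsets: &[u32], global_tail: u32, max_local_tail: u32) -> Vec<String> {
    let mut rows = Vec::new();
    for &offset in offsets {
        // Nothing to show at or past the highest local tail.
        if offset >= max_local_tail {
            break;
        }
        if offset == global_tail {
            rows.push("────".to_string());
        }
        rows.push(offset.to_string());
    }
    rows
}
```

With offsets 75093 through 75096, a global tail of 75095, and a highest local tail of 75097, the divider is emitted between 75094 and 75095.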

3 participants