Improve replicated-loglet restatectl commands #2681
Note to reviewers: the code is far from pretty, and I'd like to avoid tackling nits at the moment. The main goal is to inform, and to spot any clear bug in the logic.
This also adds a little more stress to the tests so that they fail more often if there is an issue.
This fixes mishandling of deleted and unknown nodes in the config in f-majority checks. Integration tests were misconfigured: the nodeset was [N2..N4], but N4 didn't actually exist in the config. In this case we should not accept an f-majority seal if only one node is sealed (replication=2).

Although this bug wouldn't impact us immediately, it's best to fix this condition, and I took it as an opportunity to update the semantics of the provisioning state to match the latest design direction. Documentation has also been updated to reflect the correct semantics.

Summary:
- Nodes observed in the nodeset but not in nodes-config are treated as "provisioning" rather than "disabled".
- Nodes that are "deleted" in the config (a tombstone exists) are treated as "disabled".
- Nodes in provisioning are fully authoritative, but are automatically excluded from new nodesets (already filtered by the candidacy filter in the nodeset selector).
- If provisioning nodes were added to the nodeset, they are treated as fully authoritative and are required to participate in f-majority.
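To make the summary concrete, here is a minimal sketch of the classification and of why the misconfigured test nodeset matters. It is not the code from this PR: `StorageState`, `ConfigEntry`, `effective_state` and `is_f_majority_sealed` are hypothetical names, and the f-majority rule is a simplified flat model (node-level replication only) in which the authoritative nodes that have not sealed must be too few to form a write quorum.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified storage states (not the actual restate types).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StorageState {
    Provisioning,
    ReadWrite,
    Disabled,
}

/// What the (hypothetical) nodes configuration records for a node id.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ConfigEntry {
    Present(StorageState),
    Tombstone,
}

/// Classify a nodeset member following the summary above:
/// unknown nodes are "provisioning", deleted nodes are "disabled".
fn effective_state(config: &HashMap<u32, ConfigEntry>, node_id: u32) -> StorageState {
    match config.get(&node_id) {
        Some(ConfigEntry::Present(state)) => *state,
        Some(ConfigEntry::Tombstone) => StorageState::Disabled,
        None => StorageState::Provisioning,
    }
}

/// Simplified flat f-majority check (an assumption, not the PR's logic):
/// the authoritative nodes that have NOT sealed must number fewer than
/// `replication`, so no write quorum can exist outside the sealed set.
fn is_f_majority_sealed(
    config: &HashMap<u32, ConfigEntry>,
    nodeset: &[u32],
    sealed: &[u32],
    replication: usize,
) -> bool {
    // Disabled nodes are not authoritative; provisioning nodes are.
    let authoritative: Vec<u32> = nodeset
        .iter()
        .copied()
        .filter(|id| effective_state(config, *id) != StorageState::Disabled)
        .collect();
    if authoritative.is_empty() {
        return false;
    }
    let sealed_count = authoritative
        .iter()
        .filter(|id| sealed.contains(*id))
        .count();
    authoritative.len() - sealed_count < replication
}
```

Under these assumptions, with nodeset [N2, N3, N4], replication=2 and N4 missing from the config, a single sealed node is not enough because N4 counts as authoritative. If N4 were instead treated as disabled, one sealed node out of two authoritative nodes would pass the check, which is the condition this change rules out for unknown nodes.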
- The trim operation now waits for f-majority before reporting success, to increase the reliability of a subsequent get_trim_point.
- Adds protection against a dangerous scenario in which a sealed loglet over-reports its trim point. The loglet might have more records than the effective sealed tail, but it should never report a trim point beyond that tail (if this happens, the system will believe that the subsequent segment is missing records).
- Removes a superfluous check. The trim task already checks that the trim point is clamped to the known global tail.
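The over-reporting protection can be pictured as a clamp. This is only a sketch under stated assumptions, not the PR's implementation: `effective_trim_point` is a hypothetical helper, offsets are plain `u64` values, and the tail is assumed to be the offset of the next record to be written, so the last record before the sealed tail sits at `sealed_tail - 1`.

```rust
/// Hypothetical helper: never report a trim point beyond the effective
/// sealed tail. Assumes `sealed_tail` is the first unwritten offset, so
/// the highest offset that can legitimately be trimmed is `sealed_tail - 1`.
fn effective_trim_point(reported_trim_point: u64, sealed_tail: u64) -> u64 {
    // saturating_sub avoids underflow when the tail is 0.
    let last_record = sealed_tail.saturating_sub(1);
    reported_trim_point.min(last_record)
}
```

For example, a node that reports a trim point of 120 for a loglet sealed at tail 100 would be clamped to 99 under this convention, while a trim point of 50 would pass through unchanged.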
- `restatectl replicated-loglet info` now prints a table with info from every node in the nodeset.
- `restatectl replicated-loglet digest` no longer requires --from/--to to function, and no longer uses excessive memory if the supplied range is unnecessarily large.
- For both commands, lots of UI improvements. Some screenshots will be attached in comments.
Really cool improvements for restatectl. LGTM. +1 for merging :-)
.copied()
.filter_map(|node_id| {
    let node = nodes_config.find_node_by_id(node_id).unwrap_or_else(|_| {
        panic!("Node {node_id} doesn't seem to exist in nodes configuration");
Could this happen if we are operating on a slightly outdated NodesConfiguration?
Yes, in theory it can happen.
records_table.add_row(std::iter::repeat("═════════").take(nodeset.len() + 1));
// append the node-level info at the end
{
    let mut row = Vec::with_capacity(nodeset.len() + 2);
Could be nodeset.len() + 1.
We have one for offset, and one for "issues", and yeah it can be 1 since I don't use the issues column in this, but would it matter if I leave it to match the table? :)
Would you consider this a nitpick?
Yes, feel free to ignore.
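For context on why this is only a nit: `Vec::with_capacity` sets a lower bound on the allocation, not a length, so reserving one slot more than the row ends up using is harmless. A minimal sketch follows, with a hypothetical `build_row` helper and placeholder cell contents that are not taken from the PR:

```rust
/// Hypothetical row builder: one leading column plus one cell per node.
/// Capacity is reserved for `nodeset_len + 2` cells to match the table's
/// column count, even though only `nodeset_len + 1` cells are pushed.
fn build_row(nodeset_len: usize) -> Vec<String> {
    let mut row = Vec::with_capacity(nodeset_len + 2);
    row.push("OFFSET".to_string()); // placeholder label for the first column
    for i in 0..nodeset_len {
        row.push(format!("N{i}")); // placeholder per-node cell
    }
    row
}
```

With a nodeset of three nodes, the row holds four cells and the allocation has room for at least five; the unused slot costs a few bytes and has no effect on behavior.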
for (offset, responses) in digests.iter() {
    checker.fill_with_default();
    if *offset >= digests.max_local_tail() {
        break;
    }
    if *offset == known_global_tail.latest_offset() {
        // divider to indicate that everything after global tail
        records_table.add_row(std::iter::repeat("────").take(nodeset.len() + 2));
The global tail is 75095, so the output is correct.