Audit KCP codebase for re-entrancy & error handling of non-key space operations #11184

Open
fabriziopandini opened this issue Sep 16, 2024 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@fabriziopandini
Member

fabriziopandini commented Sep 16, 2024

There were a few interesting threads about error management for etcd's non-key space operations.

As a first reaction, I think in KCP we are generally ok, because errors reported by etcd are usually handled by re-entrancy, which implies that we re-assess the current state of the world before deciding the course of action.

But this is also a good chance to audit the code base for the places where we use non-key space operations, mostly remove member and forward leadership.

NOTE: add member/join is a slightly different case, because we rely on kubeadm for it.

PS. I classified this as a bug because I did not know exactly which kind to use 😅, but to be clear we are not aware of bugs in this area, and this issue is to double check that our codebase is robust enough to handle the edge cases described in the comment above.

@fabriziopandini fabriziopandini added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Sep 16, 2024
@fabriziopandini fabriziopandini self-assigned this Sep 16, 2024
@sbueringer
Member

sbueringer commented Sep 16, 2024

Stupid question, non-key space operations are all operations that don't read/write a key/data?

@ahrtr
Member

ahrtr commented Sep 16, 2024

non-key space operations are all operations that don't read/write a key/data?

YES.

@fabriziopandini
Member Author

Note: also look at how we handle errors in case kubeadm join fails and there is a member that has not started (without a name) sticking around
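
As a rough illustration of the case above, the sketch below filters a member list for entries without a name, which is how a member that was added but never started shows up. `MemberInfo` and `UnstartedMembers` are hypothetical names for this example, not existing KCP helpers, and what to do with such members (wait, retry the join, or remove them) is left open.

```go
package main

// MemberInfo is a hypothetical, simplified view of an etcd member.
type MemberInfo struct {
	ID       uint64
	Name     string
	PeerURLs []string
}

// UnstartedMembers returns the members that have no name yet, i.e. members
// that were added to the cluster but have not started.
func UnstartedMembers(members []MemberInfo) []MemberInfo {
	var out []MemberInfo
	for _, m := range members {
		if m.Name == "" {
			out = append(out, m)
		}
	}
	return out
}
```

A re-entrant reconcile could run this check on every pass, so a leftover member from a failed join is noticed even if the original error was never seen.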

@k8s-triage-robot

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged.
Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Deprioritize it with /priority important-longterm or /priority backlog
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Dec 16, 2024

5 participants