-
Notifications
You must be signed in to change notification settings - Fork 476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API-1690: KMS Encryption Provider for Etcd Secrets #1682
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Swarup Ghosh <swghosh@redhat.com>
Skipping CI for Draft Pull Request. |
Signed-off-by: Swarup Ghosh <swghosh@redhat.com>
@swghosh: This pull request references API-1690 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.18.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/cc @dgrisonnet |
- "@tkashem" | ||
- "@rvanderp" | ||
approvers: | ||
- "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tkashem should be the default approver for anything api related as the team lead
You can add either myself or @deads2k for API approvers. But you need one of us. I've already looked at the API so probably me on this occasion.
@tkashem please review this EP and let me know that you're happy conceptually, and then I'll start reviewing from an API perspective
We should only merge the feature gate PR once this has had an initial round of review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add more details on this EP so we can have it ready for the initial round of review.
Signed-off-by: Swarup Ghosh <swghosh@redhat.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Swarup Ghosh <swghosh@redhat.com>
fd0142d
to
06f1d10
Compare
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
Signed-off-by: Swarup Ghosh <swghosh@redhat.com>
/cc @tkashem |
- "@tkashem" | ||
- "@deads2k" | ||
- "@derekwaynecarr" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally you pick one approver, else it becomes difficult to rally everyone around, perhaps some of these should be reviewers instead?
- "@TrilokGeer" | ||
- "@tkashem" | ||
- "@rvanderp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What expertise do each of the reviewers bring? What are you asking them to think about, see the template for examples of why we add context about reviewers
One aspect of the upstream feature that wasn't mentioned yet is that it requires a third-party application called a KMS plugin to bridge between the apiservers and the KMS. In OpenShift, these plugins will be configured and managed by the kube-apiserver-operator. There are multiple reasons behind this choice: | ||
|
||
1. Reduces the complexity for the users that want to use the KMS feature | ||
2. Simplifies key rotation when users manually rotate the key because it requires creating a second instance of the plugin that would use the new key while the old plugin would still allow using the old key as a read key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be interested to see the mechanics of managing two instances fleshed out if they aren't already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on what "manually rotate the key" entails?
I was reading through the KMS provider documentation and came across this: https://kubernetes.io/docs/tasks/administer-cluster/kms-provider/#developing-a-kms-plugin-gRPC-server-notes-kms-v2
It sounds like in KMS v2 the expectation is that the KMS plugin will return a new key_id
in the Status
procedure call response when the KMS key encryption key (KEK) has rotated. If the key_id
changes in the response received from the Status
procedure call, the API server is signaled that the KEK has changed and the data encrypted with the old KEK should be marked as stale.
I do see that https://github.com/openshift/aws-encryption-provider?tab=readme-ov-file#rotation mentions this dual instance plugin approach, but to my untrained eye the documentation there seems to be tailored around the older KMS v1 implementation.
Please correct me if I misunderstood the KMS v2 approach, but from what I can tell it does seem like KMS v2 remediates the need to do something like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see now that for something like the aws-encryption-provider
the KMS master key is passed as a CLI flag in the Pod
, which would explain the need to have one deployed with the old key reference and one deployed with the new.
Apologies for any noise :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do wonder if there is some room for improvements that could be proposed upstream to facilitate master key rotation in these plugins without having to run multiple instances to perform the migration. (assuming my understanding of being able to rotate the key_id
in the KMS v2 API is true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be interested to see the mechanics of managing two instances fleshed out if they aren't already
I have some flowschemas detailing the different mechanisms, I'll add them once they are fully complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far, I've identified two different ways for user to rotate keys:
- The cloud KMS periodically rotates the key. In this case, the cloud resource for the key will remain the same, but the underlying key and its id will change. (https://docs.aws.amazon.com/kms/latest/developerguide/rotate-keys.html). For this particular scenario, we don't need a second instance nor do we need to restart the apiserver because we track the
key_id
and pass it down in the Decrypt procedure call. But we still want to migrate all the data to use the latest instance of the key to reduce the attack surface area. - The user points the KMS provider to a new key (cloud resource). In this case you need to create a second instance because each instance is bound to a specific key and until you've migrated all the data to use the new key, the original key need remain usable as a read-key.
For 1) there is definitely room for improvement upstream and https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/4192-svm-in-tree could help with automating the data migration process.
For 2) since we are essentially pointing to two different keys, it is consistent with the other encryption provider (aesgcm, aescbc, ... ) to have two different instances of KMS plugins. It is that way by design
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is useful, lets make sure this context gets into the doc so it doesn't end up lost to GitHub comments eternally
* https://github.com/openshift/aws-encryption-provider/ | ||
* https://github.com/openshift/azure-kubernetes-kms/ | ||
|
||
For the plugins we can't distribute, an `image` field will be available in the relevant KMS API to allow users to configure the plugin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide an example of one that we cannot distribute/expand on why we can't distribute, and how we will support users using these? Where does out support boundary start/end?
|
||
During the tech-preview support of the KMS feature, it will be placed behind the `KMSEncryptionProvider` feature-gate. | ||
|
||
OpenShift would need to align closer with KMS evolution upstream with respect to the different Kubernetes Encryption Providers available today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really follow what you're meaning by this, can you expand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is referring to the various providers that are supported upstream but not downstream. Namely coming closer with this list: https://github.com/kubernetes/apiserver/blob/cccad306d649184bf2a0e319ba830c53f65c445c/pkg/apis/apiserver/types_encryption.go#L89-L101. I'll add some more info about that.
|
||
## Drawbacks | ||
|
||
1. Core apiserver operators need host access to mount and manage permissions for the directory where the kms plugin runs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a net new permission that they didn't need before?
## Drawbacks | ||
|
||
1. Core apiserver operators need host access to mount and manage permissions for the directory where the kms plugin runs. | ||
2. Can cause problems during disaster recovery and backups in case KMS keys becomes unavailable after |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can you mitigate this risk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still WIP, I'll elaborate on the risks from a high-level here, but https://issues.redhat.com/browse/API-1825 is a GA requirement which requires us to write a doc going in full depth about the risks and the mitigations that we have for them.
We will list all the failures scenarios there and explain in detail the actions that users need to take to detect and fix the issues. We will turn this document into KCS later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's the place to track more, might be worth linking that issue in the GA requirements part of the doc, and saying here that you can read more in jira
|
||
### Non-Goals | ||
|
||
1. Allow the user to force key rotation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any caveats to this non-goal?
What happens if a user does an on-demand key rotation in their KMS instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be here. We copied it from aescbc/aesgcm goals where users can't force rotation in a supported way.
For KMS, users will be able to force the rotation either in their KMS instance or manually by pointing the encryption configuration to a new key.
3. Support for hardware security modules | ||
4. The user has an in-depth understanding of each phase of the encryption process | ||
5. Completely recover the cluster in the event of the KMS instance itself going down or keys getting lost | ||
6. Allow users to configure which resources will be encrypted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something planned for the future?
I think this is a reasonable non-goal for a first pass, but taking a look at the EncryptionConfiguration
resource type it appears to have support for this kind of configuration. I would imagine there is use case(s) for customers to have more control over the set of resources that are encrypted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it is not right now and I don't remember seeing a RFE about that since we've supported encryption.
For OpenShift we are choosing which resources are encrypted and users just choose if they want to turn on encryption or not.
In case you've not seen the doc yet, https://docs.openshift.com/container-platform/4.17/security/encrypting-etcd.html shows the procedure to enable encryption and from a user perspective it is super simple and very high-level.
Co-authored-by: Joel Speed <Joel.speed@hotmail.co.uk>
@swghosh: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
/remove-lifecycle stale |
[TP] Support Kube KMS Integration in OCP (User-Provided)