Add IPv6 support for kindnet #17190

Merged (2 commits) on Jan 9, 2025

hakman (Member) commented Jan 8, 2025

k8s-ci-robot requested a review from aojea on January 8, 2025 19:46

k8s-ci-robot (Contributor)

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

k8s-ci-robot added the do-not-merge/work-in-progress, cncf-cla: yes, size/XS, and area/addons labels on Jan 8, 2025

hakman (Member, Author) commented Jan 8, 2025

/test ?

hakman (Member, Author) commented Jan 8, 2025

/test pull-kops-e2e-cni-kindnet-ipv6

kubernetes deleted a comment from k8s-ci-robot on Jan 8, 2025

hakman (Member, Author) commented Jan 8, 2025

/test pull-kops-e2e-cni-kindnet-ipv6

aojea (Member) commented Jan 8, 2025

The failing tests are related to StatefulSets that cannot be scheduled:

I0108 21:02:10.663885 49753 dump.go:53] At 2025-01-08 20:52:09 +0000 UTC - event for datadir-ss-0: {ebs.csi.aws.com_ebs-csi-controller-65dbd65d98-v8sgz_184282f2-0eed-4f1d-b96a-af2e043c46a0 } ProvisioningFailed: failed to provision volume with StorageClass "kops-csi-1-21": rpc error: code = Internal desc = Could not create volume "pvc-1e7a5728-e634-490b-8187-78a1627d64e8": could not create volume in EC2: operation error EC2: CreateVolume, get identity: get credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, failed to get rate limit token, retry quota exceeded, 0 available, 5 requested


Test | Duration
-- | --
Kubernetes e2e suite: [It] [sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should provide basic identity | 10m4s
Kubernetes e2e suite: [It] [sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should perform rolling updates and roll backs of template modifications with PVCs | 10m4s
Kubernetes e2e suite: [It] [sig-apps] StatefulSet Non-retain StatefulSetPersistentVolumeClaimPolicy should delete PVCs after adopting pod (WhenDeleted) | 10m4s
Kubernetes e2e suite: [It] [sig-apps] StatefulSet Non-retain StatefulSetPersistentVolumeClaimPolicy should delete PVCs with a WhenDeleted policy | 10m5s
Kubernetes e2e suite: [It] [sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should adopt matching orphans and release non-matching pods | 10m4s
Kubernetes e2e suite: [It] [sig-apps] StatefulSet Non-retain StatefulSetPersistentVolumeClaimPolicy should not delete PVCs when there is another controller | 10m4s
Kubernetes e2e suite: [It] [sig-storage] PVC Protection Verify that PVC in active use by a pod is not removed immediately | 5m3s
Kubernetes e2e suite: [It] [sig-storage] PVC Protection Verify that scheduling of a pod that uses PVC that is being deleted fails and the pod becomes Unschedulable | 5m3s
Kubernetes e2e suite: [It] [sig-storage] PVC Protection Verify "immediate" deletion of a PVC that is not in active use by a pod | 5m3s
Kubernetes e2e suite: [It] [sig-apps] StatefulSet Non-retain StatefulSetPersistentVolumeClaimPolicy should delete PVCs with a OnScaledown policy | 10m4s
Kubernetes e2e suite: [It] [sig-apps] StatefulSet Non-retain StatefulSetPersistentVolumeClaimPolicy should delete PVCs after adopting pod (WhenScaled) | 10m4s
Kubernetes e2e suite: [It] [sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should not deadlock when a pod's predecessor fails | 10m4s
Kubernetes e2e suite: [It] [sig-apps] StatefulSet Non-retain StatefulSetPersistentVolumeClaimPolicy should not delete PVC with OnScaledown policy if another controller owns the PVC | 


All the other 933 tests are passing.

The ebs-csi-controller seems to require some permissions:

https://storage.googleapis.com/kubernetes-ci-logs/pr-logs/pull/kops/17190/pull-kops-e2e-cni-kindnet-ipv6/1877087283962712064/artifacts/cluster-info/kube-system/ebs-csi-controller-65dbd65d98-v8sgz/csi-provisioner.log

I0108 21:03:04.353521 1 controller.go:843] CreateVolume failed, supports topology = true, node selected true => may reschedule = true => state = Finished: rpc error: code = Internal desc = Could not create volume "pvc-be11211d-1a70-4573-931a-1f1da03eff77": could not create volume in EC2: operation error EC2: CreateVolume, get identity: get credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, failed to get rate limit token, retry quota exceeded, 0 available, 5 requested

hakman (Member, Author) commented Jan 9, 2025

Pretty odd, let's retry first 🙂
/test pull-kops-e2e-cni-kindnet-ipv6

k8s-ci-robot added the size/M label and removed the size/XS label on Jan 9, 2025

hakman (Member, Author) commented Jan 9, 2025

/test pull-kops-e2e-cni-kindnet-ipv6

aojea (Member) commented Jan 9, 2025

still the same problem

I0109 07:29:50.210015 1 controller.go:843] CreateVolume failed, supports topology = true, node selected true => may reschedule = true => state = Finished: rpc error: code = Internal desc = Could not create volume "pvc-3e9b3371-9520-4b73-bec0-3f626907832f": could not create volume in EC2: operation error EC2: CreateVolume, get identity: get credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post "https://sts.us-west-2.amazonaws.com/": EOF

Yeah, it may be due to the masquerading; there is an EOF in the request.

aojea (Member) commented Jan 9, 2025

The kindnet pod on the same node logs these messages:

I0109 07:26:04.498470 1 proxy.go:310] Failed to connect to original destination [[64:ff9b::3477:a3dd]:443]: dial tcp4 52.119.163.221:443: connect: network is unreachable

https://storage.googleapis.com/kubernetes-ci-logs/pr-logs/pull/kops/17190/pull-kops-e2e-cni-kindnet-ipv6/1877249288690470912/artifacts/cluster-info/kube-system/kindnet-k89jm/kindnet-cni.log

That is an Amazon IP: https://urlscan.io/ip/52.94.181.70

From the docs at https://kops.sigs.k8s.io/topology/#private-subnet:

If the subnet is capable of IPv6, egress to the internet is typically routed through a connection-tracking firewall, such as an AWS Egress-only Internet Gateway. Egress to the NAT64 range 64:ff9b::/96 is typically routed to a NAT64 device, such as an AWS NAT Gateway.

So maybe reaching that service requires the NAT64 provided by AWS, because it is not reachable from the instances directly?

if c.Masquerade == nil {
c.Masquerade = &kops.KindnetMasqueradeSpec{
Enabled: fi.PtrTo(true),
if clusterSpec.IsIPv6Only() {
Review comment (Member):

We need to disable NAT64 in kindnet if the cluster is on AWS and is IPv6-only.

Reply (Member):

... or disable the NAT64 gateway and allow IPv4 connectivity from the instance. I am just throwing out the options; no strong opinion.

aojea (Member) commented Jan 9, 2025

OK, I talked offline with @hakman:

kOps has a NAT64 gateway set up in AWS.
The instance does not have IPv4 connectivity.
Disabling kindnet NAT64 on AWS with IPv6 only is one option, so that the NAT64 gateway is used.

However, the feedback I get from users is that they want to avoid NAT gateways at all costs, because they are expensive and cause problems. Kindnet offers an alternative to this setup by enabling NAT64, provided the instance also has IPv4 connectivity. Right now it fails because the instance does not have IPv4 connectivity.
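
The trade-off described above can be sketched as a small decision function. This is a simplified model for illustration only: the `node` type, its fields, and `nat64Path` are hypothetical and are not part of the kOps or kindnet APIs, and real behaviour depends on the actual routing setup.

```go
package main

import "fmt"

// node is a hypothetical summary of one instance's connectivity.
type node struct {
	HasIPv4Egress   bool // the instance itself can reach IPv4 destinations
	HasNAT64Gateway bool // 64:ff9b::/96 is routed to a cloud NAT64 device
}

// nat64Path reports which component, if any, can translate traffic sent to
// the NAT64 range, given whether kindnet's own NAT64 is enabled.
func nat64Path(n node, kindnetNAT64 bool) string {
	switch {
	case kindnetNAT64 && n.HasIPv4Egress:
		return "kindnet"
	case kindnetNAT64:
		// kindnet handles the traffic but cannot dial out over IPv4.
		return "unreachable"
	case n.HasNAT64Gateway:
		return "cloud NAT64 gateway"
	default:
		return "unreachable"
	}
}

func main() {
	ipv6Only := node{HasIPv4Egress: false, HasNAT64Gateway: true}
	fmt.Println(nat64Path(ipv6Only, true))  // prints unreachable
	fmt.Println(nat64Path(ipv6Only, false)) // prints cloud NAT64 gateway

	dualStack := node{HasIPv4Egress: true}
	fmt.Println(nat64Path(dualStack, true)) // prints kindnet
}
```

The first case corresponds to the failing CI runs in this thread, and the second to the option of disabling kindnet NAT64 on IPv6-only AWS clusters.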

k8s-ci-robot added the size/XXL label and removed the size/M label on Jan 9, 2025

hakman (Member, Author) commented Jan 9, 2025

/test pull-kops-e2e-cni-kindnet-ipv6

aojea (Member) commented Jan 9, 2025

fantastic

pull-kops-e2e-cni-kindnet-ipv6: Job succeeded.

hakman changed the title from "Test IPv6 with kindnet" to "Add IPv6 support for kindnet" on Jan 9, 2025
hakman force-pushed the kindnet-ipv6 branch 2 times, most recently from d027b8e to 8ae9bae, on January 9, 2025 13:37

hakman (Member, Author) commented Jan 9, 2025

/test pull-kops-e2e-cni-kindnet
/test pull-kops-e2e-cni-kindnet-ipv6

hakman marked this pull request as ready for review on January 9, 2025 16:31
k8s-ci-robot removed the do-not-merge/work-in-progress label on Jan 9, 2025

hakman (Member, Author) commented Jan 9, 2025

/retest

aojea (Member) commented Jan 9, 2025

/lgtm

Thanks

k8s-ci-robot added the lgtm label on Jan 9, 2025

hakman (Member, Author) commented Jan 9, 2025

Thanks for all the help getting IPv6 to work, @aojea! 🙂

k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rifelpet


k8s-ci-robot added the approved label on Jan 9, 2025
k8s-ci-robot merged commit 671c6ee into kubernetes:master on Jan 9, 2025
27 checks passed
k8s-ci-robot added this to the v1.32 milestone on Jan 9, 2025

aojea (Member) commented Jan 10, 2025

> Thanks for all the help getting IPv6 to work, @aojea! 🙂

Thank you all for doing this :)
