Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k8s 1.32 upgrade (CI to use 1.31) #466

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

mythi
Copy link
Contributor

@mythi mythi commented Nov 22, 2024

No description provided.

@mythi mythi requested a review from a team as a code owner November 22, 2024 09:21
Copy link
Member

@bpradipt bpradipt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@wainersm
Copy link
Member

Hi @mythi !

I'm not sure about this bump. The CI for operator doesn't run functional tests but rather mere install/uninstall operations, coco's functional tests rely on Kata CI. In there, it's still using v1.30 for sev-snp and tdx (v1.28 for s390x) if I'm not mistaken.

The question is: shouldn't we keep same k8s version used in here and Kata CI? If yes, then we probably should bump the version in kata CI first.

@ryansavino @AdithyaKrishnan @fidencio @stevenhorsman

@mythi
Copy link
Contributor Author

mythi commented Nov 22, 2024

I'm OK to drop 5258d2d if that's the concern?

@wainersm
Copy link
Member

Hi @mythi !

I'm OK to drop 5258d2d if that's the concern?

That's what I had in mind.

@mythi
Copy link
Contributor Author

mythi commented Nov 25, 2024

That's what I had in mind.

I dropped the commit. However, I feel we should always be moving to the latest released k8s version to find potential problems as soon as possible.

@ldoktor
Copy link
Contributor

ldoktor commented Nov 25, 2024

Being a newbie here but I also think the upstream projects should try consuming the latest and greatest bits. Downstream should verify and patch things in case something gets broken, but upstream should move fast and break stuff, that's the purpose of upstream, isn't it?

As for the functional testing, I don't think the operator requires much but perhaps it'd be nice to bring back at least very simple scenario (the testing takes 20m so adding basic functional test shouldn't increase that significantly)

@wainersm
Copy link
Member

That's what I had in mind.

I dropped the commit. However, I feel we should always be moving to the latest released k8s version to find potential problems as soon as possible.

I agree with you. I'm just concern on doing it right before a release and I think we should start bumping on kata CI side (due the lack of functional tests I mentioned). I will reach to maintainers of jobs on kata CI to see if they can upgrade to k8s 1.31.

@wainersm
Copy link
Member

Being a newbie here but I also think the upstream projects should try consuming the latest and greatest bits. Downstream should verify and patch things in case something gets broken, but upstream should move fast and break stuff, that's the purpose of upstream, isn't it?

Agree.

As for the functional testing, I don't think the operator requires much but perhaps it'd be nice to bring back at least very simple scenario (the testing takes 20m so adding basic functional test shouldn't increase that significantly)

#446 is a tentative to introduce some basic function tests back. Unfortunately Gabriela switched to another project, so she won't be able to keep working on it.

Copy link
Member

@wainersm wainersm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mythi !

@wainersm
Copy link
Member

I see the error in sev and snp-sev jobs:

::info:: Bring up the test cluster
+ ./cluster/up.sh
W1125 13:39:21.695747 3509546 common.go:101] your configuration file uses a deprecated API spec: "kubeadm.k8s.io/v1beta3" (kind: "ClusterConfiguration"). Please use 'kubeadm config migrate --old-config old.yaml --new-config new.yaml', which will write the new, similar spec using a newer API version.
[init] Using Kubernetes version: v1.30.1
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR KubeletVersion]: the kubelet version is higher than the control plane version. This is not a supported version skew and may lead to a malfunctional cluster. Kubelet version: "1.31.1" Control plane version: "1.30.1"
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

The previous version of this PR had 1.31 installed then latest reverted back to 1.30. Maybe CI didn't wipe out kubelet from the node properly?

I'm re-running the jobs to see if they pass...

@AdithyaKrishnan @ryansavino could you have a look if they don't pass?

@mythi
Copy link
Contributor Author

mythi commented Nov 27, 2024

FYI, I added af08f6f back.

@AdithyaKrishnan
Copy link
Contributor

AdithyaKrishnan commented Nov 27, 2024

I see the error in sev and snp-sev jobs:

::info:: Bring up the test cluster
+ ./cluster/up.sh
W1125 13:39:21.695747 3509546 common.go:101] your configuration file uses a deprecated API spec: "kubeadm.k8s.io/v1beta3" (kind: "ClusterConfiguration"). Please use 'kubeadm config migrate --old-config old.yaml --new-config new.yaml', which will write the new, similar spec using a newer API version.
[init] Using Kubernetes version: v1.30.1
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR KubeletVersion]: the kubelet version is higher than the control plane version. This is not a supported version skew and may lead to a malfunctional cluster. Kubelet version: "1.31.1" Control plane version: "1.30.1"
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

The previous version of this PR had 1.31 installed then latest reverted back to 1.30. Maybe CI didn't wipe out kubelet from the node properly?

I'm re-running the jobs to see if they pass...

@AdithyaKrishnan @ryansavino could you have a look if they don't pass?

Fixed @wainersm @ryansavino

@mythi
Copy link
Contributor Author

mythi commented Nov 28, 2024

It looks the sev-snp cleanup isn't working well. The latest run suggests that a cluster is still running?

@wainersm
Copy link
Member

Hi @mythi , regarding the bump of k8s version, yesterday on CI meeting we talked about a policy. On Kata CI, the non-TEE job, is running in AKS with unpinned version; meaning it will be using the latest available version according to AKS's policy which is getting N-1. For example, currently on the east-us region N = 1.30, so it's installed are using 1.29. While we won't be on latest and greatest upstream k8s version, we agreed that it will be sufficiently close and enough for our testing purposes. On operator side, we will be free to pick up the latest upstream k8s version + we plan to introduce some functional (smoke) tests to help on finding any breakage.

@mythi mythi changed the title k8s 1.31 upgrade k8s 1.32 upgrade (CI to use 1.31) Dec 13, 2024
@mythi mythi requested a review from wainersm December 13, 2024 14:05
@mythi
Copy link
Contributor Author

mythi commented Dec 13, 2024

I re-requested a review from @wainersm as I bumped to 1.32 k8s version for the dependencies. CI is still at 1.31.

mythi added 5 commits January 2, 2025 22:03
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Security scanners may warn about earlier versions and v0.31.0
fixes a CVE.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
v1.31 has been out for some time so we better move to test
with it.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
@wainersm
Copy link
Member

wainersm commented Jan 7, 2025

s390x job failure should be fixed with #480

But TDX and SEV we will need some manual clean ups on the machines (Cc @fidencio @AdithyaKrishnan )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants