Cannot update cluster config: node pool has an invalid instance type #446

Open

alamos-gmbh opened this issue Sep 11, 2024 · 8 comments

@alamos-gmbh

alamos-gmbh commented Sep 11, 2024

Hello,

we wanted to add some new nodes to our worker node pools, but Hetzner has changed its VM types.
Here is an extract from our cluster config (example only):

masters_pool:
  instance_type: cx21
  instance_count: 3
  location: fsn1
worker_node_pools:
- name: small-power
  instance_type: cx21
  instance_count: 3
  location: fsn1

cx21 no longer exists, so the script cannot validate the cluster config:

[Configuration] Validating configuration...
[Configuration] Some information in the configuration file requires your attention:
[Configuration]  - masters node pool has an invalid instance type
[Configuration]  - small-power node pool has an invalid instance type

Is there any option to add new nodes without touching (or ignoring?) the existing master and worker node pools?
We thought about removing the existing pools and just adding new pools with the new Hetzner types, but I guess this would recreate the control plane, right?

Is it possible to create some kind of alias, so that cx21 (the alias) is treated like cx22 (which currently exists in Hetzner Cloud)?
We're afraid of losing production data.

Thanks in advance!

(Will ask the CEO about sponsoring this great project!!)

@vitobotta
Owner

Hi, as long as you don't touch the config for the masters_pool, you can just replace the worker node pools. There is no way to create aliases as you suggest, but I think the existing cx instances will stay around for a while. You cannot create new ones, but Hetzner will keep the existing ones running, although I don't know for how long.

For clusters created with v2, by default the instance type is no longer part of the instance name, so you are free to just rescale instances to change the instance type. But unfortunately I didn't know at the time I was working on 1.x that Hetzner would deprecate some instance types.

Back to your current situation, just leave the masters alone. For the workers, add new node pools with new instance types, drain the old node pools so the pods are rescheduled on the new ones, and then delete the old node pools (see the sketch after the list below). To delete the old nodes you need to:

  • delete the nodes from the cluster with kubectl delete node
  • delete the instances from the Hetzner control panel
  • remove the node pool from the config file

If you do these steps carefully you shouldn't lose any data; they are simple steps anyway. Just be careful to delete the correct nodes :)
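To make this concrete, here is a rough sketch of what a transitional worker_node_pools section could look like while the old pool is being replaced; the new pool name and the cx22 replacement type are only examples I'm assuming here, so substitute whatever currently available instance type fits your workloads:

worker_node_pools:
# old pool, kept only until its pods have been drained and rescheduled
- name: small-power
  instance_type: cx21
  instance_count: 3
  location: fsn1
# new pool with a currently available instance type (cx22 is an assumption)
- name: small-power-v2
  instance_type: cx22
  instance_count: 3
  location: fsn1

Once the new nodes are ready and the old ones have been drained, deleted with kubectl delete node, and removed in the Hetzner console, the old pool entry can be dropped from the config file.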

Thanks in advance if you can sponsor the project! It helps me invest more time in it :)

@alamos-gmbh
Author

alamos-gmbh commented Sep 12, 2024

Hi,

thanks for your background information and detailed response.
Unfortunately we are in a deadlock here.

Trying what you suggested (leaving the master node pool untouched) leads to the error message above about the invalid instance type.

We created a test setup (1) with a three-master control plane on cx21 (deprecated).
Leaving it as is does not pass validation, with the error above.

Changing it to cx22 leads to a new control plane and the script aborting at a later stage.

The only way out we see is to create aliases and treat cx21 as cx22, and so on.

(1) More precisely: we already had a playground environment; cx21 cannot be set up anymore today.

A simple example which cannot be upgraded anymore.
The cluster was created with the 1.x tool, so the node names include the instance type:

hetzner_token: TOKEN
cluster_name: mycluster
kubeconfig_path: "./kubeconfig"
k3s_version: VERSION
public_ssh_key_path: "~/.ssh/id_rsa.pub"
private_ssh_key_path: "~/.ssh/id_rsa"
use_ssh_agent: true
ssh_allowed_networks:
  - 0.0.0.0/0
api_allowed_networks:
  - 0.0.0.0/0
schedule_workloads_on_masters: false
masters_pool:
  instance_type: cx21
  instance_count: 3
  location: fsn1
worker_node_pools:
- name: small-power
  instance_type: cx21
  instance_count: 3
  location: fsn1

@vitobotta
Owner

Ah. One detail I forgot in my previous message is that the instance type validation is done on every node pool as defined in the config file, so what I suggested won't help. I'll make a release with an alias for the cx21 instance type. What alternative instance type do you want to use in place of cx21? This has to wait for now though, as I am currently sick.

@alamos-gmbh
Author

alamos-gmbh commented Sep 12, 2024

Hi, thank you so much for your answer!
Much appreciated

For us it looks like this:

instance type   deprecated by Hetzner   alternative instance type
cx21            yes                     cx22
cx41            yes                     cx42
cpx31           no                      -
ccx33           no                      -

Maybe other users want to add something?

Get well soon! :-)

@AndrewBedscastle

AndrewBedscastle commented Sep 13, 2024

(Using another GitHub account, same person)

In the meantime I figured out how to get the 1.1.5 version to compile.
Crystal and all the other Ruby derivatives give me a hard time 🤣 I'm a professional in Java and Kotlin.

We chose 1.1.5 because we don't want to face possible migration issues on top of the instance type issues.
It should be an easy "fix" to just hardcode the master node names with the old names (cx21) so that the master node pool is left untouched.

Off-topic (kind of)

As I said, I'll talk to the CEO about sponsoring this tool. He was too busy the last couple of weeks, but I'm positive we'll work something out. As you said in the discussion group, you don't use Hetzner / k3s for a production workload. Actually, we do; more precisely, we're on a "journey" to do so. This issue is in no way your fault, don't get me wrong. But please bear in mind that people and companies will use it for production workloads. If development is directed in that direction (enterprise), you'll probably get more sponsors.
In detail

  • Minimize adjustments needed when upgrading to a major release (backward compatibility preferred, as long as possible)
  • LTS releases (e.g. only with sponsorship tokens)

If this is meant to stay a part-time / fun project, just forget my thoughts. But I think it has great potential.

We decided on self-hosting for many reasons:

  • GDPR (EU-DSGVO) issues; just not doable 100% legally with US providers
  • Cost (especially egress)
  • Breaking away from "premium" providers' constant changes of requirements; we're (still) on Google App Engine, and they deprecate so many things that we're getting tired of it

This might be better suited for the discussions area.

edit:

Just found this discussion
#440

If this is the case, we'll probably lose the ability to update our cluster at all :/

@vitobotta
Owner

Hi! Thanks again in advance if you can sponsor; I'd appreciate it a lot. I am keen to maintain and develop the project since it's stuff I enjoy doing, and with some sponsors it will be easier to spend more time on it.

As for the upgrade issue: if I make a release that adds the mapping between the instance types, are you OK with upgrading using v2? I can do this tomorrow.

@alamos-gmbh
Author

I think so, yes.

Fortunately there is a test cluster to test with.
In the source code I found a setting to (still) include the instance type in the node names (in 2.x). This seems to be undocumented.
With aliases, we could keep that active, I guess.

@vitobotta
Owner

That setting is only supposed to be used when upgrading clusters created with 1.x. It's not recommended for new clusters, to avoid problems caused by differing instance types.
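For readers upgrading a 1.x cluster later, here is a hedged illustration of how such a per-pool setting might be used; the option name below (include_instance_type_in_instance_name) is an assumption based on this thread and may differ in your release, so verify it against the project documentation before relying on it:

masters_pool:
  instance_type: cx21   # deprecated type, kept working via the alias discussed above
  instance_count: 3
  location: fsn1
  # assumed option name: keep the 1.x-style node names that embed the instance type
  include_instance_type_in_instance_name: true

With an alias in place, a config along these lines should, in principle, let the existing masters keep their old names instead of being recreated.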
