systemd cgroup driver not used when running inside systemd-nspawn container with systemd #11734

Open
edysli opened this issue Feb 8, 2025 · 1 comment

edysli commented Feb 8, 2025

Environmental Info:
K3s Version:
k3s version v1.32.0+k3s1 (cca8fac)
go version go1.23.3

Node(s) CPU architecture, OS, and Version:
Linux Catachan 6.8.0-52-generic #53~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 15 19:18:46 UTC 2 x86_64 GNU/Linux

Cluster Configuration:
single node

Describe the bug:
I've had to force both the kubelet and containerd to use the systemd cgroup driver for containers to run. systemd and cgroups v2 aren't properly detected when running inside a systemd-nspawn container with user namespacing enabled (systemd-nspawn --quiet --keep-unit --boot --link-journal=try-guest --network-veth -U --settings=override --machine=k3s).

I believe the code in function SetupContainerdConfig is wrongly configuring containerd. cgroups v2 are available and systemd is running as the init system, so the systemd cgroup driver should be used.

Steps To Reproduce:

  • Installed K3s: I did the airgap install and downloaded the installation script as well as the images into the systemd container's file system.
  • INSTALL_K3S_BIN_DIR_READ_ONLY=true INSTALL_K3S_SKIP_ENABLE=true /usr/local/bin/k3s-install.sh

Expected behavior:
Essential containers in the kube-system namespace (coredns, local-path-provisioner, metrics-server, traefik) should run.

/var/lib/rancher/k3s/agent/etc/containerd/config.toml contains:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true

Adding

kubelet-arg:
  - "cgroup-driver=systemd"

to /etc/rancher/k3s/config.yaml shouldn't be required, although it is much easier than fixing the generated containerd configuration file.

Actual behavior:
coredns, local-path-provisioner, metrics-server, traefik deployments are stuck in CrashLoopBackOff. The kubelet keeps killing and starting them for no apparent reason.

/var/lib/rancher/k3s/agent/etc/containerd/config.toml contains:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = false
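
To confirm which value K3s actually generated on a given node, a small helper along these lines could be used. It is only a sketch: it does a plain line scan for the first `SystemdCgroup = <bool>` entry rather than real TOML parsing, and the function name `systemdCgroupSetting` is made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// systemdCgroupSetting scans containerd configuration text for the first
// "SystemdCgroup = <bool>" line. found is false when no parsable line exists.
// This is a line scan, not a TOML parser (hypothetical helper).
func systemdCgroupSetting(config string) (value, found bool) {
	for _, line := range strings.Split(config, "\n") {
		key, val, ok := strings.Cut(strings.TrimSpace(line), "=")
		if !ok || strings.TrimSpace(key) != "SystemdCgroup" {
			continue
		}
		b, err := strconv.ParseBool(strings.TrimSpace(val))
		if err != nil {
			continue // e.g. a trailing comment; skip rather than guess
		}
		return b, true
	}
	return false, false
}

func main() {
	// Path taken from the report above.
	const path = "/var/lib/rancher/k3s/agent/etc/containerd/config.toml"
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read config:", err)
		os.Exit(1)
	}
	if v, ok := systemdCgroupSetting(string(data)); ok {
		fmt.Println("SystemdCgroup =", v)
	} else {
		fmt.Println("SystemdCgroup not set")
	}
}
```

Run inside the nspawn container after K3s has started, this should print the value from the generated file.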

Additional context / logs:

brandond (Member) commented Feb 9, 2025

cfg.AgentConfig.Systemd = !isRunningInUserNS && controllers["cpuset"] && os.Getenv("INVOCATION_ID") != ""

Figure out why these checks are failing under systemd-nspawn.
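
One way to narrow this down is to evaluate the three conditions separately inside the container. The sketch below is an approximation, not the K3s code: the helper names are invented, the user-namespace test reads /proc/self/uid_map and treats anything other than the single identity mapping `0 0 4294967295` as a user namespace, and the controller test reads /sys/fs/cgroup/cgroup.controllers, which may differ from how K3s builds its `controllers` map. Since the container in the report is started with `-U`, the first condition is a plausible suspect, but that is an assumption to verify.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// inUserNS reports whether the contents of /proc/self/uid_map describe a
// user namespace. The initial namespace maps the whole ID range onto itself
// in a single line: "0 0 4294967295". (Approximation, hypothetical helper.)
func inUserNS(uidMap string) bool {
	lines := strings.Split(strings.TrimSpace(uidMap), "\n")
	if len(lines) != 1 {
		return true // several mapping ranges only occur in a user namespace
	}
	fields := strings.Fields(lines[0])
	if len(fields) != 3 {
		return false // empty or unreadable map: assume the initial namespace
	}
	return !(fields[0] == "0" && fields[1] == "0" && fields[2] == "4294967295")
}

// hasController reports whether name appears in the space-separated
// contents of a cgroup v2 cgroup.controllers file.
func hasController(controllers, name string) bool {
	for _, c := range strings.Fields(controllers) {
		if c == name {
			return true
		}
	}
	return false
}

// systemdDriver mirrors the shape of the expression quoted above.
func systemdDriver(userNS, cpuset bool, invocationID string) bool {
	return !userNS && cpuset && invocationID != ""
}

func main() {
	// Read errors are ignored on purpose: missing files yield empty input.
	uidMap, _ := os.ReadFile("/proc/self/uid_map")
	ctrl, _ := os.ReadFile("/sys/fs/cgroup/cgroup.controllers")

	userNS := inUserNS(string(uidMap))
	cpuset := hasController(string(ctrl), "cpuset")
	invocationID := os.Getenv("INVOCATION_ID")

	fmt.Println("running in user namespace:", userNS)
	fmt.Println("cpuset controller present:", cpuset)
	fmt.Println("INVOCATION_ID set:", invocationID != "")
	fmt.Println("systemd driver would be selected:", systemdDriver(userNS, cpuset, invocationID))
}
```

INVOCATION_ID is only set for processes started by systemd as part of a unit, so the program should be run the same way K3s is started (for example through a transient or regular service unit) for the third line to be meaningful.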

We only test running k3s as a traditional systemd service (either privileged or as a user unit if rootless) so other more esoteric systemd configurations probably need work.

PR appreciated.

@brandond brandond moved this from New to Accepted in K3s Development Feb 9, 2025
@brandond brandond added this to the Backlog milestone Feb 9, 2025