k3s: containerd crashes "failed to recover state: failed to get metadata for stored sandbox" #391969

Open
3 tasks done
heywoodlh opened this issue Mar 21, 2025 · 6 comments · May be fixed by #392169
Labels
0.kind: bug Something is broken

Comments

@heywoodlh
Contributor

heywoodlh commented Mar 21, 2025

Nixpkgs version

  • Unstable (25.05)

Describe the bug

On nixpkgs-unstable (my current rev is bfa9810ff7104a17555ab68ebdeafb6705f129b1), k3s crashes due to containerd failing to start.

I've had this K3s cluster around for over a year mostly without issues. After an update to nixpkgs in my flake, k3s failed to start.

Seems related:

Steps to reproduce

Upgrade an existing cluster to the current version of k3s in unstable (1.32.2+k3s1).

Expected behaviour

k3s should not crash after the update.

Screenshots

No response

Relevant log output

sudo tail /var/lib/rancher/k3s/agent/containerd/containerd.log
time="2025-03-21T16:28:18.122659182-06:00" level=info msg="Connect containerd service"
time="2025-03-21T16:28:18.122683068-06:00" level=info msg="using experimental NRI integration - disable nri plugin to prevent this"
time="2025-03-21T16:28:18.130504899-06:00" level=info msg="Start subscribing containerd event"
time="2025-03-21T16:28:18.130549599-06:00" level=info msg="Start recovering state"
time="2025-03-21T16:28:18.130618603-06:00" level=info msg=serving... address=/run/k3s/containerd/containerd.sock.ttrpc
time="2025-03-21T16:28:18.130700040-06:00" level=info msg=serving... address=/run/k3s/containerd/containerd.sock
time="2025-03-21T16:28:18.148272201-06:00" level=error msg="failed to recover sandbox state" error="unable to find sandbox \"02501aa4a04a1a9f4885c15fedee594072ff14965e73fd7afc6956a6fa29f8d3\": not found" sandbox=02501aa4a04a1a9f4885c15fedee594072ff14965e73fd7afc6956a6fa29f8d3
time="2025-03-21T16:28:18.148345326-06:00" level=error msg="failed to recover sandbox state" error="unable to find sandbox \"062fab96fdfe7a99186fee91fabf5fa86626715cb13c99dcd729a24a76172886\": not found" sandbox=062fab96fdfe7a99186fee91fabf5fa86626715cb13c99dcd729a24a76172886
time="2025-03-21T16:28:18.148410140-06:00" level=error msg="failed to recover sandbox state" error="unable to find sandbox \"7f017d764c5f1427d1e144b1b42c4c37d680d55aff5e589dafc81af63387742b\": not found" sandbox=7f017d764c5f1427d1e144b1b42c4c37d680d55aff5e589dafc81af63387742b
time="2025-03-21T16:28:18.148423410-06:00" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to get metadata for stored sandbox \"8397d2a607e2a1c38ae0033aae7aae9f7362ce1c6f31cb9d715d94246e565f62\": not found"

Additional context

Configuration:

{ config, pkgs, ... }:

let
  system = pkgs.system;
in {
  networking.firewall.allowedTCPPorts = [ 6443 ];
  # https://docs.k3s.io/installation/requirements#networking
  networking.firewall.interfaces.tailscale0.allowedTCPPorts = [ 6443 2379 2380 8472 10250 51820 51821 5001 ];
  networking.firewall.trustedInterfaces = [ "tailscale0" ];
  services.k3s = {
    package = pkgs.k3s;
    enable = true;
    role = "server";
    clusterInit = false;
  };
  environment.systemPackages = [
    pkgs.k3s
    pkgs.nfs-utils
  ];
  systemd.services.k3s.path = with pkgs; [
    ipset
    openiscsi
  ];
  security.pam.loginLimits = [
    {
      domain = "*";
      type = "-";
      item = "nofile";
      value = "9192";
    }
  ];
}

System metadata

 - system: `"x86_64-linux"`
 - host os: `Linux 6.13.7-xanmod1, NixOS, 25.05 (Warbler), 25.05.20250321.bfa9810`

Notify maintainers


Note for maintainers: Please tag this issue in your pull request description. (i.e. Resolves #ISSUE.)

I assert that this issue is relevant for Nixpkgs

Is this issue important to you?

Add a 👍 reaction to issues you find important.

@heywoodlh heywoodlh added the 0.kind: bug Something is broken label Mar 21, 2025
@DarkKirb
Contributor

#391293 updated containerd to 2.0.4; it should reach nixos-unstable in a few days.

https://nixpk.gs/pr-tracker.html?pr=391293

@heywoodlh
Contributor Author

I tried using the k3s package provided with that PR, but it still seems to have the same issue. Is this because the k3s package in nixpkgs includes its own bundled containerd? https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/networking/cluster/k3s/builder.nix#L318-L333

I have my own branch that I've recently rebased from master and am attempting to resolve this issue: https://github.com/heywoodlh/nixpkgs/tree/391969-k3s-containerd-update

I don't understand the relationship between the containerd package in nixpkgs and k3s' bundled containerd, so any clarification would be appreciated 😄

@heywoodlh heywoodlh linked a pull request Mar 22, 2025 that will close this issue
13 tasks
@DarkKirb
Contributor

DarkKirb commented Mar 22, 2025

oh wait i didn’t read your config carefully enough

your configuration uses the containerd bundled with k3s, not nixpkgs

you can however tell it to use external containerd if you set up containerd correctly

https://github.com/DarkKirb/nixos-config/blob/e41f5e81b7b1ea03697944ae04ef0ef57514f3c9/services/kubernetes/default.nix#L38-L61

and then point k3s at that containerd

https://github.com/DarkKirb/nixos-config/blob/e41f5e81b7b1ea03697944ae04ef0ef57514f3c9/services/kubernetes/default.nix#L26-L28

not sure where i copied this from but it was probably some nixos wiki
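For readers who cannot follow the links, here is a minimal sketch of what running k3s against an external containerd might look like on NixOS. It is not a copy of the linked configuration. It assumes the `virtualisation.containerd` and `services.k3s.extraFlags` module options, the default socket path `/run/containerd/containerd.sock`, and the CRI plugin key used by containerd's version 2 config format; newer containerd config formats may name that plugin differently, so check the key against your containerd version.

```nix
{ pkgs, ... }:

let
  # Assumption: flannel plus the standard CNI plugins are enough for this
  # cluster; the name "full-cni" is only illustrative.
  fullCNIPlugins = pkgs.buildEnv {
    name = "full-cni";
    paths = with pkgs; [
      cni-plugins
      cni-plugin-flannel
    ];
  };
in
{
  # Run the containerd packaged in nixpkgs as its own systemd service.
  virtualisation.containerd = {
    enable = true;
    settings = {
      # Assumption: plugin key for containerd's version 2 config format.
      plugins."io.containerd.grpc.v1.cri".cni = {
        bin_dir = "${fullCNIPlugins}/bin";
        conf_dir = "/var/lib/rancher/k3s/agent/etc/cni/net.d/";
      };
    };
  };

  # Point k3s at that containerd instead of the one bundled with k3s.
  services.k3s.extraFlags = toString [
    "--container-runtime-endpoint unix:///run/containerd/containerd.sock"
  ];
}
```

With this arrangement the containerd version follows the nixpkgs `containerd` package rather than whatever the k3s build bundles.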

@heywoodlh
Contributor Author

Update: seems that my PR #392169 resolves my containerd crash after I rename the old containerd directory:

sudo mv /var/lib/rancher/k3s/agent/containerd /var/lib/rancher/k3s/agent/containerd.bak

@heywoodlh
Contributor Author

@DarkKirb inspired by your comment, I decided to work around this by switching to Docker as my container runtime, adding the --docker flag to my k3s configuration. Thanks for the example snippet for using external containerd; I think it would be helpful to add to the k3s README in nixpkgs.
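A minimal sketch of that Docker workaround, assuming the `virtualisation.docker.enable` and `services.k3s.extraFlags` NixOS options; the exact configuration used in this comment was not posted, so treat the surrounding attributes as illustrative.

```nix
{ ... }:

{
  # k3s needs a running Docker daemon when started with --docker.
  virtualisation.docker.enable = true;

  services.k3s = {
    enable = true;
    role = "server";
    # --docker makes k3s use Docker instead of its bundled containerd.
    extraFlags = toString [ "--docker" ];
  };
}
```

Because the bundled containerd is no longer started, its stale sandbox metadata under /var/lib/rancher/k3s/agent/containerd is not read at startup. Existing workloads are recreated under the new runtime.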

@rorosen
Contributor

rorosen commented Mar 23, 2025

Another possible workaround is to override the k3s sources with a commit from the master branch, i.e.

  services.k3s.package = pkgs.k3s.override {
    overrideBundleAttrs = {
      src = pkgs.fetchgit {
        url = "https://github.com/k3s-io/k3s";
        rev = "7837d29269970088eaa019a2d7e61ecdfb68d985";
        sha256 = "sha256-8voWwI3dWzG3E8TJet0m+TcMialM16AZA1/fMPH/DnY=";
      };
      vendorHash = "sha256-Wgla9Cyq5U9Q0xs/C/iyAMwHkIug7ernl7w5mn3gSco=";
    };
  };

This pulls in the k3s sources at the pinned master commit and thus builds k3s with containerd 2.0.4 bundled in.
