Pods on Worker Node Can't Communicate with API Server or Services in General #11728
-
Thank you for the suggestion! Regarding the VXLAN kernel module, I confirmed that it is loaded on the master node (lsmod | grep vxlan shows the module) and that CONFIG_VXLAN=y is enabled in the worker node's kernel config. Following the second suggestion, I disabled hardware checksum offload on the flannel interface with ethtool -K flannel.1 tx-checksum-ip-generic off, but unfortunately I'm still facing the same issue.

Additionally, I tried performing a cluster reset, but I'm encountering the following errors:

Do you have any further recommendations?
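For reference, the checks described above were along these lines (the kernel config path is distro-dependent, so treat it as an assumption):

```sh
# On the master node: confirm the VXLAN module is loaded
lsmod | grep vxlan

# On the worker node: confirm VXLAN support is built into the kernel
# (config path is an assumption; adjust for your distro)
grep CONFIG_VXLAN /boot/config-$(uname -r)

# Disable TX checksum offload on the flannel VXLAN interface, then verify
sudo ethtool -K flannel.1 tx-checksum-ip-generic off
ethtool -k flannel.1 | grep tx-checksum-ip-generic
```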
-
Hi all, I'm providing an update on the previously reported issue regarding the routing problem between pods and services in my Kubernetes cluster.

Summary of the Issue:

Current Observations:

Pods in the cluster:
Services in the cluster:
Traceroute from the worker-pod to the myservice service (IP: 10.43.109.4):
As observed, the traffic reaches the 192.168.1.1 router but is not routed properly to the service.
By contrast, when tracerouting from the worker-pod directly to the destination pod's IP, the traffic reaches the pod correctly.
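(Both traceroutes were run from inside worker-pod, roughly like this; the destination pod IP below is a placeholder:)

```sh
# Traceroute to the ClusterIP of myservice
kubectl exec -it worker-pod -- traceroute 10.43.109.4

# Traceroute directly to the destination pod's IP (placeholder)
kubectl exec -it worker-pod -- traceroute <destination-pod-ip>
```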
Actions Taken So Far:

I added an explicit route for the 10.43.0.0/24 service network inside the worker-pod (an illustrative form of the command is shown below). However, the issue persists and traffic towards the service is still not routed correctly.
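Illustrative form of the route command (the next-hop address and interface name are placeholders, not the exact values I used):

```sh
# Inside worker-pod: send service-network traffic via the pod's default gateway
ip route add 10.43.0.0/24 via <pod-default-gateway-ip> dev eth0
```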
-
Environmental Info:
K3s Version:
Node(s) CPU architecture, OS, and Version:
Master:
Worker:
Cluster Configuration:
1 server, 1 agent
Describe the bug:
I'm encountering an issue when installing K3s on my server and worker nodes. Any pod on the worker node that attempts to communicate with the API server experiences a timeout. I suspect that the issue might be related to CoreDNS. I am installing K3s without any additional configuration options, and I have also tried replacing Flannel with Calico, but the problem persists.
As an example, when attempting to install the NVIDIA Device Plugin, I get the following error:
This is just an example, as the same issue occurs with other applications like Prometheus.
I have also ensured that I have opened all the required ports in iptables according to the K3s requirements, and I have disabled ufw on both the worker and server nodes.
Additionally, I'm not sure whether it's relevant, but pods on the worker node are unable to ping the API server, while DNS is functioning correctly on the master node.
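As a quick way to observe the symptom, here is a sketch of a throwaway test pod pinned to the worker node (the node name and image are placeholders; 10.43.0.1 is the default kubernetes service ClusterIP in K3s). When the pod runs on the worker node, both checks below time out:

```sh
# Throwaway test pod pinned to the worker node (node name is a placeholder)
kubectl run net-test --rm -it --restart=Never --image=busybox:1.36 \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"<worker-node-name>"}}' -- sh

# Inside the pod:
nslookup kubernetes.default.svc.cluster.local   # exercises CoreDNS
wget -qO- -T 5 https://10.43.0.1:443/version    # exercises the API server ClusterIP
# Any TLS/HTTP error here still proves the ClusterIP is reachable; only a
# timeout reproduces the problem described above.
```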
Steps To Reproduce:
Expected behavior:
Pods on the worker node should be able to communicate with the API server without experiencing timeouts. The installation of additional components like the NVIDIA Device Plugin or Prometheus should work without issues.
Actual behavior:
Pods on the worker node that try to communicate with the API server are timing out.
Additional context / logs:
CoreDNS logs:

Nodes:

Pods in the kube-system namespace: