
Expose kube-scheduler, kube-proxy and kube-controller metrics endpoints #3619

Closed
onedr0p opened this issue Jul 12, 2021 · 43 comments

Comments

@onedr0p
Contributor

onedr0p commented Jul 12, 2021

Is your feature request related to a problem? Please describe.

Unable to monitor the following components using kube-prometheus-stack:

  • kube-scheduler
  • kube-proxy
  • kube-controller-manager

Describe the solution you'd like

Add configuration options, like those in PR #2750, for each component so that they are not bound only to 127.0.0.1.

E.g.

k3s server \
--kube-controller-expose-metrics true \
--kube-proxy-expose-metrics true \
--kube-scheduler-expose-metrics true

In the kube-prometheus-stack configuration, all you would have to do is configure the following:

kubeControllerManager:
  enabled: true
  endpoints:
  - 192.168.42.10
  - 192.168.42.11
  - 192.168.42.12
  service:
    enabled: true
    port: 10252
    targetPort: 10252
kubeScheduler:
  enabled: true
  endpoints:
  - 192.168.42.10
  - 192.168.42.11
  - 192.168.42.12
  service:
    enabled: true
    port: 10251
    targetPort: 10251
kubeProxy:
  enabled: true
  endpoints:
  - 192.168.42.10
  - 192.168.42.11
  - 192.168.42.12
  service:
    enabled: true
    port: 10249
    targetPort: 10249

Describe alternatives you've considered

Deploying rancher-pushprox to expose these metrics, but it is not very easy to set up or user-friendly.

Additional context

I am willing to take a shot at opening a PR, as it should be pretty close to #2750.

Related to #425

@brandond
Member

brandond commented Jul 12, 2021

That was necessary because we don't have a facility to pass CLI flags through to the embedded etcd. For Kubernetes components, you can already just do something like:
--kube-controller-manager-arg=address=10.0.1.2 --kube-controller-manager-arg=bind-address=10.0.1.2

@onedr0p
Contributor Author

onedr0p commented Jul 12, 2021

@brandond I am probably misreading the code here, but it looks like it is hardcoded to 127.0.0.1:

"address": "127.0.0.1",
"bind-address": "127.0.0.1",

Will setting the options you described override this?

@brandond
Member

brandond commented Jul 12, 2021

Yes, if you look a few lines down you can see where the user-provided args are used to update the args map before flattening the map into the args slice. Since the user args come last, they are preferred over the defaults we provide.

args := config.GetArgsList(argsMap, cfg.ExtraSchedulerAPIArgs)

@onedr0p
Contributor Author

onedr0p commented Jul 12, 2021

Thanks @brandond 🙏🏼

kube-prometheus-stack helm values

kubeApiServer:
  enabled: true
kubeControllerManager:
  enabled: true
  endpoints:
  - 192.168.42.10
  - 192.168.42.11
  - 192.168.42.12
kubeScheduler:
  enabled: true
  endpoints:
  - 192.168.42.10
  - 192.168.42.11
  - 192.168.42.12
kubeProxy:
  enabled: true
  endpoints:
  - 192.168.42.10
  - 192.168.42.11
  - 192.168.42.12
kubeEtcd:
  enabled: true
  endpoints:
  - 192.168.42.10
  - 192.168.42.11
  - 192.168.42.12
  service:
    enabled: true
    port: 2381
    targetPort: 2381

k3s controllers settings

kube-controller-manager-arg:
- "address=0.0.0.0"
- "bind-address=0.0.0.0"
kube-proxy-arg:
- "metrics-bind-address=0.0.0.0"
kube-scheduler-arg:
- "address=0.0.0.0"
- "bind-address=0.0.0.0"
etcd-expose-metrics: true

I can also verify the Grafana dashboards are populated :D

@rlex
Contributor

rlex commented Nov 18, 2021

For anyone stumbling upon the same issue (because it pops up on the first page of Google search results): in the case of v1.22 you will also need to add

serviceMonitor:
  enabled: true
  https: true
  insecureSkipVerify: true

to both kubeControllerManager and kubeScheduler, because it now forces HTTPS. Also, the ports have changed, so my config looks like:

kubeControllerManager:
  enabled: true
  endpoints:
    - 172.25.25.61
    - 172.25.25.62
    - 172.25.25.63
  service:
    enabled: true
    port: 10257
    targetPort: 10257
  serviceMonitor:
    enabled: true
    https: true
    insecureSkipVerify: true
kubeScheduler:
  enabled: true
  endpoints:
    - 172.25.25.61
    - 172.25.25.62
    - 172.25.25.63
  service:
    enabled: true
    port: 10259
    targetPort: 10259
  serviceMonitor:
    enabled: true
    https: true
    insecureSkipVerify: true

Additionally, "address=0.0.0.0" can be dropped because it is deprecated now; see https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/ and https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/

Verified on kube-prometheus-stack 20.0.1 and k3s 1.22.3

@untcha

untcha commented Dec 14, 2021

@onedr0p: how exactly do I set the k3s controller settings on my master nodes? Not during installation, but in a running environment. With the k3s config.yaml?

From your and @rlex's comments I understand that the configuration needs to look like this, correct?

kube-controller-manager-arg:
- "bind-address=0.0.0.0"
kube-proxy-arg:
- "metrics-bind-address=0.0.0.0"
kube-scheduler-arg:
- "bind-address=0.0.0.0"
etcd-expose-metrics: true

@onedr0p
Contributor Author

onedr0p commented Dec 14, 2021

Yes, but on k3s 1.22 some defaults in kube-prometheus-stack need to be changed:

onedr0p/home-ops#2378

@untcha

untcha commented Dec 14, 2021

The changes in kube-prometheus-stack seem clear to me; I am/was struggling with the k3s config.
Is it enough to create the config.yaml with just the above configuration, or do I need to restart k3s on each master, or the node itself?

@onedr0p
Contributor Author

onedr0p commented Dec 14, 2021

It depends on how you installed k3s; you need to tell k3s to look for the config.yaml.

@brandond
Member

You don't need to tell k3s to look for config.yaml if you place it at /etc/rancher/k3s/config.yaml. A restart is required to make any changes, regardless of whether you use CLI flags, or a config file, or both.
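
Putting that together, a minimal sketch of the whole flow (the settings are the ones quoted above; the restart command assumes a systemd-based server install):

# /etc/rancher/k3s/config.yaml  (create the file if it does not exist)
kube-controller-manager-arg:
- "bind-address=0.0.0.0"
kube-proxy-arg:
- "metrics-bind-address=0.0.0.0"
kube-scheduler-arg:
- "bind-address=0.0.0.0"
etcd-expose-metrics: true
# after editing, restart the service on each server node, e.g. `systemctl restart k3s`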

@untcha

untcha commented Dec 18, 2021

Worked like a charm. Thank you, guys!

@kladiv

kladiv commented Jan 14, 2022

@onedr0p does this solution address the problem explained in the issue below?

Thank you

@Jojoooo1

Even after setting up kubeEtcd with the server config and Helm chart as defined a few messages earlier, it seems to not work correctly.

Screenshot from 2022-01-17 19-07-08

@kladiv

kladiv commented Jan 17, 2022

@Jojoooo1 did you set up a single-node or multi-node cluster?
In a single-node setup there is no etcd.

@Jojoooo1

Thanks! I actually had a single node!

@rlex
Contributor

rlex commented Jan 18, 2022

Strictly speaking, a single-node k3s cluster can have etcd, but only if you added the cluster-init parameter to the k3s args / config / env.
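
For illustration, a minimal config-file sketch of that setup (cluster-init plus the etcd metrics flag discussed earlier in this thread):

# /etc/rancher/k3s/config.yaml
cluster-init: true          # start embedded etcd even with a single server
etcd-expose-metrics: true   # expose etcd metrics outside localhost (port 2381, as scraped above)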

@rthamrin

Where can I edit this file for my k3s controller manager?

k3s controllers settings
kube-controller-manager-arg:
- "address=0.0.0.0"
- "bind-address=0.0.0.0"
kube-proxy-arg:
- "metrics-bind-address=0.0.0.0"
kube-scheduler-arg:
- "address=0.0.0.0"
- "bind-address=0.0.0.0"
etcd-expose-metrics: true

@george-kirillov

george-kirillov commented Jan 27, 2022


https://rancher.com/docs/k3s/latest/en/installation/install-options/#configuration-file


etcd-expose-metrics: true
kube-controller-manager-arg:
- bind-address=0.0.0.0
kube-proxy-arg:
- metrics-bind-address=0.0.0.0
kube-scheduler-arg:
- bind-address=0.0.0.0

@rthamrin

rthamrin commented Jan 27, 2022


Thanks man. Anyway, I don't have that config file; I only have /etc/rancher/k3s/k3s.yaml and /etc/systemd/system/k3s.service and some files in /var/lib/rancher/k3s.

Is it possible to create it manually, or do I need to upgrade my k3s?
NB: my current version is v1.21.7.

@rlex
Contributor

rlex commented Jan 27, 2022

@rthamrin the config file is not created by default; you need to create it manually.

tyriis added a commit to tyriis/home-ops that referenced this issue Feb 13, 2022
enabled: kubeControllerManager, kubeScheduler, kubeEtcd
see: onedr0p/home-ops#2378 k3s-io/k3s#3619
samip5 added a commit to samip5/k8s-cluster that referenced this issue Feb 13, 2022
tuxpeople added a commit to tuxpeople/k8s-homelab that referenced this issue Mar 29, 2022
tuxpeople added a commit to tuxpeople/k8s-homelab that referenced this issue Mar 29, 2022
@macrokernel

I had the same issue with a K3s cluster and tried following the above solution, but after adding endpoints to the Prometheus operator values file, Helm would fail to deploy with the following error:
Error: UPGRADE FAILED: rendered manifests contain a resource that already exists. Unable to continue with update: Endpoints "prometheus-stack-kube-prom-kube-controller-manager" in namespace "kube-system" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "prometheus-stack"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "monitoring"

I found a working solution on this page: https://picluster.ricsanfre.com/docs/prometheus/#k3s-components-monitoring. Leaving it here in case anyone runs into the same issue with the Helm chart.
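
For reference, one common workaround (a sketch based on Helm 3's adoption rules, not specific to this chart) is to add the ownership metadata from the error message to the pre-existing objects in kube-system so Helm can adopt them instead of failing:

# labels/annotations Helm expects on the existing Endpoints/Service before it will adopt them,
# using the release name and namespace quoted in the error message above
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-name: prometheus-stack
    meta.helm.sh/release-namespace: monitoring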

@chemicstry

I spent a couple of days figuring out how to make the default kube-prometheus-stack metrics work with k3s and found a couple of important things that are not mentioned here.

Firstly, k3s exposes all metrics combined (apiserver, kubelet, kube-proxy, kube-scheduler, kube-controller-manager) on each metrics endpoint. The only separate metrics endpoint is the embedded etcd database on port 2381, if you are using it. So if you follow the advice given in this issue and set up scrape jobs for each component separately, you are collecting all metrics duplicated 5 times and wasting Prometheus resources.

Fixing this properly is a bit difficult. All default Grafana dashboards filter data by job name, i.e. the kube-proxy dashboard has job="kube-proxy" in all queries. So my first attempt was to rename jobs based on the metric name. I added this config to the Helm values:

kubelet:
  serviceMonitor:
    metricRelabelings:
      # k3s exposes all metrics on all endpoints, relabel jobs that belong to other components
      - sourceLabels: [__name__]
        regex: "scheduler_(.+)"
        targetLabel: "job"
        replacement: "kube-scheduler"
      - sourceLabels: [__name__]
        regex: "kubeproxy_(.+)"
        targetLabel: "job"
        replacement: "kube-proxy"

This simply sets the job label to kube-scheduler for all metrics that start with scheduler_, and to kube-proxy for all metrics that start with kubeproxy_.

But there is another problem. The instance variable in the Grafana dashboards uses the up metric to find all instances of the component (label_values(up{job="kube-scheduler", cluster="$cluster"}, instance)). You can't rename the job of the up metric, or other dashboards (in this case kubelet) will stop working. I couldn't find a way to multiply metrics with Prometheus rules to create an up metric for each job, so the only way is to edit the Grafana dashboards and change the variable queries to label_values(up{job="kubelet", cluster="$cluster"}, instance).

There are also other metrics that are shared between components, such as rest_client_requests_total, which are global to the entire k3s server (all components) and do not make sense in single-component dashboards.

Also keep in mind that the default kube-prometheus-stack configuration already duplicates data by collecting both kubeApiServer and kubelet metrics, which are the same. It is best to disable only kubeApiServer, which collects data only from master nodes, while kubelet collects from both master and agent nodes. Disabling kubeApiServer automatically removes the apiserver alerts and Grafana dashboard, so you have to re-import those manually; the toggle is sketched below.
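
For reference, that toggle in the chart values is just (a minimal sketch):

kubeApiServer:
  enabled: false
# kubelet scraping stays enabled and, on k3s, already includes the apiserver metrics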

It is really unfortunate that k3s makes it so complicated to use kube-prometheus-stack. If someone has a better solution to make this work without duplicating data, please share.

@onedr0p
Contributor Author

onedr0p commented Mar 24, 2023

@chemicstry Thanks for the details. I suppose we could re-open this issue, but I am not sure it is something the k3s maintainers are willing to "fix". Ideally this should all work out of the box with the kube-prometheus-stack Helm chart.

@brandond any comment on this?

@brandond
Member

There isn't really anything we can fix on our side. The prometheus go libraries use a global metrics configuration, so any metrics registered by any component in a process are exposed by all metrics listeners. There's no way to bind specific metrics to a specific metrics endpoint, when they're all running in the same process. A core efficiency of K3s is that we run all the components in the same process, and we're not planning on changing that.

@mrclrchtr

If others come here by googling, I have found what seems to be a good solution.
The solution also addresses two other issues with dashboards.

prometheus:
  serviceMonitor:
    # fix for https://github.com/prometheus-community/helm-charts/issues/4221
    relabelings:
      - action: replace
        targetLabel: cluster
        replacement: yourClusterNameHere

# fix for https://github.com/prometheus-community/helm-charts/issues/3800
grafana:
  serviceMonitor:
    labels:
      release: kube-prometheus-stack

kubeApiServer:
  serviceMonitor:
    metricRelabelings:
      - action: drop
        regex: (apiserver_request_duration_seconds_bucket|apiserver_request_body_size_bytes_bucket|apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket|apiserver_request_sli_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: "etcd_request_duration_seconds_bucket"
        sourceLabels: [__name__]
      - action: drop
        regex: (scheduler_plugin_execution_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: (workqueue_work_duration_seconds_bucket)
        sourceLabels: [__name__]

kubelet:
  serviceMonitor:
    cAdvisorRelabelings:
      - action: replace
        sourceLabels: [__metrics_path__]
        targetLabel: metrics_path
      - action: replace
        targetLabel: instance
        sourceLabels:
          - "node"
    relabelings:
      - action: replace
        sourceLabels: [__metrics_path__]
        targetLabel: metrics_path
    metricRelabelings:
      - action: drop
        regex: (apiserver_request_duration_seconds_bucket|apiserver_request_body_size_bytes_bucket|apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket|apiserver_request_sli_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: "etcd_request_duration_seconds_bucket"
        sourceLabels: [__name__]
      - action: drop
        regex: (scheduler_plugin_execution_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: (workqueue_work_duration_seconds_bucket)
        sourceLabels: [__name__]

kubeControllerManager:
  # Add all Control Plane IPs
  endpoints:
    - 10.255.0.101
    - 10.255.0.102
    - 10.255.0.103
  service:
    enabled: true
    port: 10257
    targetPort: 10257
  serviceMonitor:
    https: true
    insecureSkipVerify: true
    metricRelabelings:
      - action: drop
        regex: (apiserver_request_duration_seconds_bucket|apiserver_request_body_size_bytes_bucket|apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket|apiserver_request_sli_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: "etcd_request_duration_seconds_bucket"
        sourceLabels: [__name__]
      - action: drop
        regex: (scheduler_plugin_execution_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: (workqueue_work_duration_seconds_bucket)
        sourceLabels: [__name__]

kubeEtcd:
  # Add all Control Plane IPs
  endpoints:
    - 10.255.0.101
    - 10.255.0.102
    - 10.255.0.103
  service:
    enabled: true
    port: 2381
    targetPort: 2381
  serviceMonitor:
    metricRelabelings:
      - action: drop
        regex: (apiserver_request_duration_seconds_bucket|apiserver_request_body_size_bytes_bucket|apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket|apiserver_request_sli_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: "etcd_request_duration_seconds_bucket"
        sourceLabels: [__name__]
      - action: drop
        regex: (scheduler_plugin_execution_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: (workqueue_work_duration_seconds_bucket)
        sourceLabels: [__name__]

kubeScheduler:
  # Add all Control Plane IPs
  endpoints:
    - 10.255.0.101
    - 10.255.0.102
    - 10.255.0.103
  service:
    enabled: true
    port: 10259
    targetPort: 10259
  serviceMonitor:
    https: true
    insecureSkipVerify: true
    metricRelabelings:
      - action: drop
        regex: (apiserver_request_duration_seconds_bucket|apiserver_request_body_size_bytes_bucket|apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket|apiserver_request_sli_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: "etcd_request_duration_seconds_bucket"
        sourceLabels: [__name__]
      - action: drop
        regex: (scheduler_plugin_execution_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: (workqueue_work_duration_seconds_bucket)
        sourceLabels: [__name__]

kubeProxy:
  # Add all Control Plane IPs
  endpoints:
    - 10.255.0.101
    - 10.255.0.102
    - 10.255.0.103
  service:
    enabled: true
    port: 10249
    targetPort: 10249
    selector:
      k8s-app: kube-proxy
  serviceMonitor:
    metricRelabelings:
      - action: drop
        regex: (apiserver_request_duration_seconds_bucket|apiserver_request_body_size_bytes_bucket|apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket|apiserver_request_sli_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: "etcd_request_duration_seconds_bucket"
        sourceLabels: [__name__]
      - action: drop
        regex: (scheduler_plugin_execution_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: (workqueue_work_duration_seconds_bucket)
        sourceLabels: [__name__]

coreDns:
  serviceMonitor:
    metricRelabelings:
      - action: drop
        regex: (apiserver_request_duration_seconds_bucket|apiserver_request_body_size_bytes_bucket|apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket|apiserver_request_sli_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: "etcd_request_duration_seconds_bucket"
        sourceLabels: [__name__]
      - action: drop
        regex: (scheduler_plugin_execution_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: (workqueue_work_duration_seconds_bucket)
        sourceLabels: [__name__]

kubeDns:
  serviceMonitor:
    metricRelabelings:
      - action: drop
        regex: (apiserver_request_duration_seconds_bucket|apiserver_request_body_size_bytes_bucket|apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket|apiserver_request_sli_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: "etcd_request_duration_seconds_bucket"
        sourceLabels: [__name__]
      - action: drop
        regex: (scheduler_plugin_execution_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: (workqueue_work_duration_seconds_bucket)
        sourceLabels: [__name__]

kube-state-metrics:
  prometheus:
    monitor:
      relabelings:
        - action: replace
          targetLabel: "instance"
          sourceLabels:
            - "__meta_kubernetes_pod_node_name"

prometheus-node-exporter:
  prometheus:
    monitor:
      relabelings:
        - action: replace
          targetLabel: "instance"
          sourceLabels:
            - "__meta_kubernetes_pod_node_name"

@onedr0p
Contributor Author

onedr0p commented Mar 12, 2024

@mrclrchtr if I understand this correctly, you are still going to have duplicate metrics, which can lead to absolutely insane memory usage with Prometheus. I recently switched my cluster from k3s to Talos and saw 2-3 GB less memory usage per Prometheus instance, since Talos exposes these metrics the "standard" way.

The best method I found was to analyze what needs to be kept for each component and write relabelings based on that research; for example, https://github.com/onedr0p/home-ops/blob/e6716b476ff1432ddbbb7d4efa0e10d0ac4e9a66/kubernetes/storage/apps/observability/kube-prometheus-stack/app/helmrelease.yaml. However, this isn't perfect either and will not dedupe all metric labels across the components; it is prone to error, and when Kubernetes updates happen it won't capture any new metrics being emitted. FWIW, even with these relabelings applied I was still seeing 3-4 GB of RAM usage per Prometheus instance.
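
As an illustration of that approach (a sketch only, not the exact relabelings from the linked HelmRelease), keeping just the series that belong to each component on its ServiceMonitor:

kubeScheduler:
  serviceMonitor:
    metricRelabelings:
      # keep only scheduler-specific series on this scrape job
      - action: keep
        sourceLabels: [__name__]
        regex: "scheduler_(.+)"
kubeProxy:
  serviceMonitor:
    metricRelabelings:
      # keep only kube-proxy-specific series on this scrape job
      - action: keep
        sourceLabels: [__name__]
        regex: "kubeproxy_(.+)"

Note that the synthetic up series is generated by Prometheus itself and is not affected by metric relabeling, so the dashboard-variable problem described earlier still applies.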

I would love for k3s to support a native way to handle this with kube-prometheus-stack, as it is my major pain point with k3s, it isn't obvious until one discovers this issue, and it is one of the major reasons I am exploring other options like Talos. 😢

@mrclrchtr

Damn... I was hoping this would be the solution... I'll probably have to look for alternatives too... I've invested way too much time in this already....

But thanks for the info! I'll have a look at Talos too.

@denniseffing

There isn't really anything we can fix on our side. The prometheus go libraries use a global metrics configuration, so any metrics registered by any component in a process are exposed by all metrics listeners. There's no way to bind specific metrics to a specific metrics endpoint, when they're all running in the same process.

@brandond I've never worked with the Prometheus Go library but reading some documentation and issues has left me quite confused. Are you 100% certain that the library does not allow binding specific metrics to a specific metrics endpoint?

Here are some issues/PRs regarding this:

  • Apparently, this has already been resolved in 2016.
  • The following example explicitly mentions that it is possible to create non-global registries and bind each of them to its own HTTP endpoint:

https://github.com/prometheus/client_golang/blob/main/examples/simple/main.go#L30-L45

@onedr0p Continuing our short discussion in prometheus-community/helm-charts#2865 (comment) I agree that this would be the real fix for k3s & kube-prometheus-stack compatibility.

@brandond
Member

brandond commented Jan 16, 2025

I don't think you're following me. Like I said, we don't modify the upstream Kubernetes code to the extent that this is possible. We just call the main entrypoints:

  • func (*Embedded) APIServer(ctx context.Context, etcdReady <-chan struct{}, args []string) error {
    command := apiapp.NewAPIServerCommand(ctx.Done())
    command.SetArgs(args)
  • func (*Embedded) ControllerManager(ctx context.Context, apiReady <-chan struct{}, args []string) error {
    command := cmapp.NewControllerManagerCommand()
    command.SetArgs(args)
  • and so on.

Internally each of these components call the same shared code from k8s.io/component-base/metrics to register metrics: https://github.com/kubernetes/kubernetes/blob/v1.32.1/staging/src/k8s.io/component-base/metrics/legacyregistry/registry.go

Normally this is fine because they run in separate processes with their own separate per-process metrics endpoints.

Because the high-level CLI entrypoints don't allow passing in metrics registerers/gatherers (and why should they? they are just CLI entrypoints), we cannot override them per component.

@denniseffing

Thank you for clarifying, that makes sense. Sorry for the naive questions; I guess I kind of knew that I was missing something but didn't know exactly where to look.

@nickjanssen

So what is the conclusion? It's not k3s's fault, but Prometheus's?

A core efficiency of K3s is that we run all the components in the same process, and we're not planning on changing that.

I have no choice but to move to Talos because of this. The workarounds above are terrible. What use is a cluster if I can't monitor it properly?

@brandond
Member

brandond commented Jan 30, 2025

What exactly is so onerous about this? The metrics for everything are all in one place BECAUSE EVERYTHING RUNS IN ONE PLACE. This is the core efficiency of K3s.

If you scrape metrics from multiple endpoints you're going to get multiple copies of the metrics. That doesn't seem SUPER hard to deal with. Why is your monitoring configuration so inflexible that you can't either disable some of the unnecessary duplicate scraping, or add a relabel config to drop the duplicate metrics? It's not rocket science, there are untold thousands of organizations successfully using and monitoring K3s without issue. The Rancher monitoring charts handle it properly out of the box.
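
For reference, a minimal sketch of the first option, using the chart keys seen earlier in this thread: scrape the combined k3s metrics once via the kubelet job and turn off the per-component scrapes (disabled components lose their bundled dashboards and alerts, as noted above):

kubeApiServer:
  enabled: false
kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false
kubeProxy:
  enabled: false
kubelet:
  enabled: true   # the single scrape that already contains all embedded-component metrics on k3s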

@nickjanssen

Thank you for your reply. Let me start by saying that I appreciate the work that you put into k3s. With that said:

I recently switched my cluster from k3s to Talos and saw 2-3 GB less memory usage per Prometheus instance

Doesn't this challenge the claim of "core efficiency"? If K3s is meant to be lightweight, but this architecture results in much higher memory consumption, it would seem like there is room for improvement.

Why is your monitoring configuration so inflexible

I prefer solutions that work out of the box. If I wanted to spend time tweaking configurations, I would have gone with kubeadm and read Kubernetes the Hard Way. A major benefit of K3s is its simplicity, and expecting users to implement workarounds for basic monitoring goes against that philosophy.

That doesn't seem SUPER hard to deal with.

Judging by the number of comments in this thread, it seems like many people find it problematic. If something requires workarounds, it’s usually a sign that there’s an opportunity to improve the experience.

@rlex
Contributor

rlex commented Jan 30, 2025

Not speaking as a k3s dev, but I think the whole point is that k3s is a k8s distribution, not a k8s fork.
If changes need to be made to core (even tiny ones), that would turn k3s into a fork, which might pose issues when using it in different scenarios.

There are also multiple pretty much copy-paste solutions with relabeling for standard tools like kube-prometheus-stack in this thread.

Not everything is an out-of-the-box solution.

@brandond
Member

brandond commented Jan 30, 2025

if K3s is meant to be lightweight, but this architecture results in much higher memory consumption, it would seem like there is room for improvement.

K3s doesn't use more memory. Period. Scraping ANYTHING multiple times with different labels will increase your Prometheus metric cardinality, but that is not something we manage in this project.

I prefer solutions that work out of the box.

K3s works out of the box. It is your monitoring stack that has issues. I would recommend you direct your criticism at whatever chart or operator is managing the over-aggressive scraping. K3s works with monitoring built into Rancher since that is a project we maintain. We do not have the cycles to identify all of the many other tools that people might be using to monitor K3s with, figure out if they have issues, and fix them if so.

As we've made clear multiple times, the problem is NOT how k3s exposes metrics. The k3s metrics accurately represent the components that are being scraped. Problems occur ONLY when you scrape it multiple times with distinct labels, which balloons the metric cardinality.

Judging by the number of comments in this thread, it seems like many people find it problematic.

K3s is open source and readily accepts community contributions. Lots of people use it. Some small subset of them find specific things problematic enough to complain about it. Apparently no one (including you) has found it problematic enough to contribute a solution.
