Expose kube-scheduler, kube-proxy and kube-controller metrics endpoints #3619
That was necessary because we don't have a facility to pass CLI flags through to the embedded etcd. For Kubernetes components, you can already just do something like: |
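For illustration, a minimal sketch of passing extra args to the embedded Kubernetes components, shown here in k3s config-file form (the values mirror the working configuration shared later in this thread; the exact example referred to above may have used the equivalent CLI flags instead):

```yaml
# k3s config file, or the equivalent --kube-*-arg CLI flags
kube-controller-manager-arg:
  - "bind-address=0.0.0.0"
kube-scheduler-arg:
  - "bind-address=0.0.0.0"
kube-proxy-arg:
  - "metrics-bind-address=0.0.0.0"
```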
@brandond I am probably misreading the code here, but it looks like it is hardcoded in k3s/pkg/daemons/control/server.go (lines 134 to 135 in 238dc20).
Will setting the options you described override this? |
Yes, if you look a few lines down you can see where the user-provided args are used to update the args map before it is flattened into the args slice. Since the user args come last, they are preferred over the defaults we provide. k3s/pkg/daemons/control/server.go (line 142 in 238dc20) |
Thanks @brandond 🙏🏼

kube-prometheus-stack helm values:

```yaml
kubeApiServer:
  enabled: true
kubeControllerManager:
  enabled: true
  endpoints:
    - 192.168.42.10
    - 192.168.42.11
    - 192.168.42.12
kubeScheduler:
  enabled: true
  endpoints:
    - 192.168.42.10
    - 192.168.42.11
    - 192.168.42.12
kubeProxy:
  enabled: true
  endpoints:
    - 192.168.42.10
    - 192.168.42.11
    - 192.168.42.12
kubeEtcd:
  enabled: true
  endpoints:
    - 192.168.42.10
    - 192.168.42.11
    - 192.168.42.12
  service:
    enabled: true
    port: 2381
    targetPort: 2381
```

k3s controller settings:

```yaml
kube-controller-manager-arg:
  - "address=0.0.0.0"
  - "bind-address=0.0.0.0"
kube-proxy-arg:
  - "metrics-bind-address=0.0.0.0"
kube-scheduler-arg:
  - "address=0.0.0.0"
  - "bind-address=0.0.0.0"
etcd-expose-metrics: true
```

I can also verify the Grafana dashboards are populated :D |
For anyone stumbling upon the same issue (because it pops up on the first page of Google search results): HTTPS now needs to be enabled for both kubeControllerManager and kubeScheduler, because they force HTTPS now. The ports have also changed, so my config looks like the sketch below. Additionally, "address=0.0.0.0" can be dropped because it is deprecated now, see https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/. Verified on kube-prometheus-stack 20.0.1 and k3s 1.22.3 |
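A sketch of the kind of kube-prometheus-stack values this describes, assuming the upstream secure ports (10257 for kube-controller-manager, 10259 for kube-scheduler) and the chart's https/insecureSkipVerify serviceMonitor options; treat the exact values as assumptions rather than the original config:

```yaml
kubeControllerManager:
  enabled: true
  service:
    port: 10257
    targetPort: 10257
  serviceMonitor:
    https: true
    insecureSkipVerify: true
kubeScheduler:
  enabled: true
  service:
    port: 10259
    targetPort: 10259
  serviceMonitor:
    https: true
    insecureSkipVerify: true
```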
@onedr0p: how exactly do I set the k3s controller settings on my master nodes? Not during installation, but in a running environment. From your and @rlex's comments I understand that the configuration needs to look like this, correct?

```yaml
kube-controller-manager-arg:
  - "bind-address=0.0.0.0"
kube-proxy-arg:
  - "metrics-bind-address=0.0.0.0"
kube-scheduler-arg:
  - "bind-address=0.0.0.0"
etcd-expose-metrics: true
```
|
Yes, but on k3s 1.22 some defaults in kube-prometheus-stack need to be changed: |
The changes in kube-prometheus-stack seem clear to me. I am/was struggling with the k3s config. |
It depends on how you installed k3s; you need to tell k3s to look for the config file. |
You don't need to tell k3s to look for config.yaml if you place it at the default location. |
Worked like a charm. Thank you, guys! |
@onedr0p does this solution run into the problem explained in the issue below? Thank you |
@Jojoooo1 did you setup single-node or multi-node cluster? |
Thanks! I actually had a single node! |
Strictly speaking, a single-node k3s cluster can have etcd, but only if you added the cluster-init parameter to the k3s args / config / env. |
Where can I edit this file for my k3s controller manager? |
https://rancher.com/docs/k3s/latest/en/installation/install-options/#configuration-file
|
Thanks man. Anyway, I don't have that config file; I only have /etc/rancher/k3s/k3s.yaml, /etc/systemd/system/k3s.service, and some files in /var/lib/rancher/k3s. Is it possible to create it manually, or do I need to upgrade my k3s? |
@rthamrin the config file is not installed by default; you need to create it manually. |
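A minimal sketch, assuming a default installation where the config file lives at /etc/rancher/k3s/config.yaml and k3s runs under systemd:

```yaml
# /etc/rancher/k3s/config.yaml — create this file on each server node,
# then restart the k3s service (systemctl restart k3s) so the extra
# component args take effect
kube-controller-manager-arg:
  - "bind-address=0.0.0.0"
kube-proxy-arg:
  - "metrics-bind-address=0.0.0.0"
kube-scheduler-arg:
  - "bind-address=0.0.0.0"
etcd-expose-metrics: true
```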
Enabled kubeControllerManager, kubeScheduler, and kubeEtcd; see onedr0p/home-ops#2378 and k3s-io/k3s#3619 |
I had the same issue with a k3s cluster and tried following the above solution, but after adding endpoints to the prometheus operator values file, the Helm release would fail to deploy with an error. I found a working solution at this page: https://picluster.ricsanfre.com/docs/prometheus/#k3s-components-monitoring. Leaving it here in case anyone runs into the same issue with the helm chart. |
I spent a couple of days figuring out how to make the default kube-prometheus-stack dashboards work with k3s, so here is what I found.

Firstly, k3s exposes all of the component metrics on all of its metrics endpoints. Fixing this properly is a bit difficult: all of the default Grafana dashboards filter data by job name, i.e. the kube-proxy dashboard only shows series carrying the kube-proxy job label. My current workaround is the following relabeling on the metrics scraped from the kubelet endpoint:

```yaml
kubelet:
  serviceMonitor:
    metricRelabelings:
      # k3s exposes all metrics on all endpoints, relabel jobs that belong to other components
      - sourceLabels: [__name__]
        regex: "scheduler_(.+)"
        targetLabel: "job"
        replacement: "kube-scheduler"
      - sourceLabels: [__name__]
        regex: "kubeproxy_(.+)"
        targetLabel: "job"
        replacement: "kube-proxy"
```

This simply sets the job label to kube-scheduler or kube-proxy based on the metric name prefix, so the dashboards can find those series again. But there is another problem: the instance variable in the Grafana dashboards is populated from per-job metrics that are not covered by this relabeling, so it does not resolve correctly for the relabeled components. There are also other metrics which are shared between components (the generic client and workqueue metrics, for example) and cannot be attributed to a single component by name prefix alone. Also keep in mind that the default configuration still scrapes every metric from every endpoint. It is really unfortunate that k3s makes it so complicated to use the standard monitoring stack. |
@chemicstry Thanks for the details. I suppose we could re-open this issue, but I am not sure if it is something the k3s maintainers are willing to "fix". Ideally this should all work out of the box with kube-prometheus-stack. @brandond, any comment on this? |
There isn't really anything we can fix on our side. The prometheus go libraries use a global metrics configuration, so any metrics registered by any component in a process are exposed by all metrics listeners. There's no way to bind specific metrics to a specific metrics endpoint, when they're all running in the same process. A core efficiency of K3s is that we run all the components in the same process, and we're not planning on changing that. |
Has anyone here found a good solution? What do you think about this? portefaix/portefaix-kubernetes#4682 portefaix/portefaix-kubernetes@dc767bd#diff-725c569b96f4a66ed07e1a4d1a5d8d24b3a500f1a1dae5b80444a2109ce94c17 |
If others come here by googling: I have found what seems to be a good solution. |
@mrclrchtr if I understand this correctly you are still going to have duplicate metrics, and this can lead to absolutely insane memory usage with Prometheus. I recently switched my cluster from k3s to Talos and saw 2-3GB less memory usage per Prometheus instance, since Talos exports these metrics the "standard" way. The best method I found was to analyze what needs to be kept on each component and write relabelings based on that research, for example https://github.com/onedr0p/home-ops/blob/e6716b476ff1432ddbbb7d4efa0e10d0ac4e9a66/kubernetes/storage/apps/observability/kube-prometheus-stack/app/helmrelease.yaml (a rough sketch of the idea follows below). However, this isn't perfect either: it will not dedupe all metric labels across the components, it's prone to error, and when Kubernetes updates happen it won't capture any new metrics being emitted. FWIW, even with these relabelings applied I was still seeing 3-4GB of RAM usage per Prometheus instance. I would love for k3s to support a native way to handle this with the kube-prometheus-stack, as it's my major pain point with it: it is not obvious until one discovers this issue, and it is one of the major reasons I am exploring other options like Talos. 😢 |
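As a rough sketch of that relabeling idea (the endpoint and regex below are illustrative assumptions, not a copy of the linked helmrelease): on each scrape job, drop the series whose name prefix marks them as belonging to other components.

```yaml
kubeControllerManager:
  serviceMonitor:
    metricRelabelings:
      # k3s exposes every component's metrics on every endpoint; drop the
      # series that belong to other components from this scrape job
      - sourceLabels: [__name__]
        regex: "(scheduler|kubeproxy|kubelet|apiserver)_(.+)"
        action: drop
```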
Damn... I was hoping this would be the solution... I'll probably have to look for alternatives too... I've invested way too much time in this already.... But thanks for the info! I'll have a look at Talos too. |
@brandond I've never worked with the Prometheus Go library but reading some documentation and issues has left me quite confused. Are you 100% certain that the library does not allow binding specific metrics to a specific metrics endpoint? Here are some issues/PRs regarding this issue:
Apparently, this has already been resolved in 2016: https://github.com/prometheus/client_golang/blob/main/examples/simple/main.go#L30-L45. @onedr0p, continuing our short discussion in prometheus-community/helm-charts#2865 (comment): I agree that this would be the real fix for k3s & kube-prometheus-stack compatibility. |
I don't think you're following me. Like I said, we don't modify the upstream Kubernetes code to the extent that this is possible. We just call the main entrypoints:
Internally, each of these components calls the same shared metrics code. Normally this is fine because upstream they run in separate processes with their own separate per-process metrics endpoints. Because the high-level CLI entrypoints don't allow passing in metrics registerers/gatherers (and why should they, these are just CLI entrypoints), we cannot override them per component. |
Thank you for clarifying, that makes sense. Sorry for the naive questions; I guess I kind of knew that I was missing something but didn't exactly know where to look. |
So what is the conclusion? It's not k3s's fault, but Prometheus's?
I have no choice but to move to Talos because of this. The workarounds above are terrible. What use is a cluster if I can't monitor it properly? |
What exactly is so onerous about this? The metrics for everything are all in one place BECAUSE EVERYTHING RUNS IN ONE PLACE. This is the core efficiency of K3s. If you scrape metrics from multiple endpoints you're going to get multiple copies of the metrics. That doesn't seem SUPER hard to deal with. Why is your monitoring configuration so inflexible that you can't either disable some of the unnecessary duplicate scraping, or add a relabel config to drop the duplicate metrics? It's not rocket science, there are untold thousands of organizations successfully using and monitoring K3s without issue. The Rancher monitoring charts handle it properly out of the box. |
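A sketch of the "disable the duplicate scraping" option in kube-prometheus-stack terms (an assumption about one way to do it: scrape the shared endpoint once via the kubelet job and rely on relabelings like the ones shown earlier in the thread to keep the dashboards working):

```yaml
# Turn off the per-component scrape configs that would collect the same
# series a second (or third) time from the shared k3s process.
kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false
kubeProxy:
  enabled: false
```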
Thank you for your reply. Let me start by saying that I appreciate the work that you put into k3s. With that said:
Doesn't this challenge the claim of "core efficiency"? If K3s is meant to be lightweight, but this architecture results in a much higher memory consumption, it would seem like there is room for improvement.
I prefer solutions that work out of the box. If I wanted to spend time tweaking configurations, I would have gone with kubeadm and read Kubernetes the Hard Way. A major benefit of K3s is its simplicity, and expecting users to implement workarounds for basic monitoring goes against that philosophy.
Judging by the number of comments in this thread, it seems like many people find it problematic. If something requires workarounds, it’s usually a sign that there’s an opportunity to improve the experience. |
Not speaking as a k3s dev, but I think the whole point is that k3s is a k8s distribution, not a k8s fork. There are also multiple pretty much copy-paste solutions with relabeling for standard tools like kube-prometheus-stack in this thread. Not everything is an out-of-the-box solution. |
K3s doesn't use more memory. Period. Scraping ANYTHING multiple times with different labels will increase your Prometheus metric cardinality, but that is not something we manage in this project.
K3s works out of the box. It is your monitoring stack that has issues. I would recommend you direct your criticism at whatever chart or operator is managing the over-aggressive scraping. K3s works with monitoring built into Rancher since that is a project we maintain. We do not have the cycles to identify all of the many other tools that people might be using to monitor K3s with, figure out if they have issues, and fix them if so. As we've made clear multiple times, the problem is NOT how k3s exposes metrics. The k3s metrics accurately represent the components that are being scraped. Problems occur ONLY when you scrape it multiple times with distinct labels, which balloons the metric cardinality.
K3s is open source and readily accepts community contributions. Lots of people use it. Some small subset of them find specific things problematic enough to complain about it. Apparently no one (including you) has found it problematic enough to contribute a solution. |
Is your feature request related to a problem? Please describe.
Unable to monitor the following components using kube-prometheus-stack: kube-controller-manager, kube-scheduler, and kube-proxy, because their metrics endpoints are bound only to 127.0.0.1.

Describe the solution you'd like
Add configuration options like in PR #2750 for each component so they are not only being bound to 127.0.0.1. In the kube-prometheus-stack configuration, all you then have to do is enable the corresponding components and point them at the server endpoints (as shown in the helm values earlier in this thread).

Describe alternatives you've considered
Deploying rancher-pushprox to get these metrics exposed, but it's not very easy to do or user-friendly.

Additional context
I am willing to give a shot at opening a PR, as it should be pretty close to #2750.
Related to #425