Best practice prometheus monitoring #425
In order to remove target scrape errors I use this configuration:
Unfortunately, core parts of k3s are not monitored using this config. |
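(The configuration referenced above was not preserved in this thread. As a rough, hypothetical sketch only, values of this kind for the kube-prometheus-stack / prometheus-operator chart typically disable the scrape jobs for the components that are unreachable on k3s; the exact keys used by the commenter are an assumption.)
kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false
kubeProxy:
  enabled: false
kubeEtcd:
  enabled: false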
It should be possible to monitor the API server, or at least give an option to change the advertise address. |
You can try my HelmChart CRD. It's not perfect; the kubelet for some reason does not report certain labels, but it solves most of your issues. |
I am also trying to get kube-prometheus to work on k3s (currently version 0.8.0). I am running my cluster on arm, which complicates it a bit: kube-state-metrics and the kube-rbac-proxy, for example, are not readily available for arm. I made some images myself, but luckily carlosedp has made the necessary arm images available. You can have a look at his github cluster_monitoring. The remaining problem is authorization for node-exporter and kube-state-metrics (and possibly more): it seems k3s uses another authentication API version, as user phillebaba has found. See issue carlosedp/cluster-monitoring#13 (comment). Can k3s developers or anyone else maybe shed some light or advise on this? |
I've added a workaround in my cluster-monitoring stack to remove kube-rbac-proxy from node_exporter and kube-state-metrics. Can you test out the k3s branch from https://github.com/carlosedp/cluster-monitoring/tree/k3s and report back if it worked? It's a matter of applying the manifests from the "manifests" dir. They are already generated from jsonnet. |
With k3s (k3d) and kube-state-metrics (kube-rbac-proxy), I have the same problem. If the intention of k3s is to remove alpha and non-default features, I think the kube-rbac-proxy should change to use the v1 authentication/authorization APIs.
|
The problem with changing to auth/v1 is that it would not be compatible with previous versions of k8s where the api was still beta. |
Fails first and eventually succeeds with the following additional changes after the failure:
It just disables the creation of CRDs after the first failed attempt. |
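(The actual changes were not preserved here. Purely as an illustration: older prometheus-operator chart versions exposed a value for skipping CRD creation, so the retry presumably set something along these lines; treat the key name as an assumption, since it varies between chart versions.)
prometheusOperator:
  createCustomResource: false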
Hi, I now do have node-exporter metrics, thanks, but cadvisor and the k3s kubelet still give authentication errors. Edit: I have changed prometheus-serviceMonitorKubelet.yaml to use https and include a tls config, and now I can collect metrics with the carlosedp set of manifests (so without the kube-rbac-proxy). |
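(For anyone hitting the same kubelet/cadvisor authentication errors, a sketch of what the relevant endpoints in prometheus-serviceMonitorKubelet.yaml can look like with https and tls enabled; the port name, namespace and labels below are assumptions, not taken from the original manifests.)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  namespace: monitoring
spec:
  jobLabel: k8s-app
  selector:
    matchLabels:
      k8s-app: kubelet
  namespaceSelector:
    matchNames:
      - kube-system
  endpoints:
    # kubelet's own metrics, scraped over https with the service account token
    - port: https-metrics
      scheme: https
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true
    # cadvisor metrics are served by the kubelet under /metrics/cadvisor
    - port: https-metrics
      scheme: https
      path: /metrics/cadvisor
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true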
As I added to the readme on the repo with more details on carlosedp/cluster-monitoring#17, under K3s you need to use Docker as the runtime to have all cAdvisor metrics. |
Any update on this? It would be great to monitor with the Prometheus Operator Helm Chart.
kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false
kubeProxy:
  enabled: false |
Yeah what is the latest on this? |
Is there an issue? I have a bog-standard prometheus install pointed at metrics-server and node-exporter. Literally copied the manifests over from an EKS cluster and didn't have to change anything. |
Hi @brandond, this issue just gave me the impression that Prometheus could be challenging to get up and running, so I was asking for an update on best practices. But if it's simply a matter of throwing a Prometheus Helm chart at k3s, I'll just jump into it. |
You have to make sure you have things like metrics-server, kube-state-metrics, node-exporter etc deployed, but that's not unique to k3s. Nor is the prometheus scraper configuration. None of these should require any configuration that wouldn't be necessary on any other k8s cluster. |
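(As a rough sketch of what that can look like on k3s: metrics-server already ships with k3s by default, and the two exporters can be deployed as HelmChart manifests dropped into /var/lib/rancher/k3s/server/manifests. The chart names come from the prometheus-community repo; the namespaces below are assumptions.)
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  repo: https://prometheus-community.github.io/helm-charts
  chart: kube-state-metrics
  targetNamespace: monitoring
---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: node-exporter
  namespace: kube-system
spec:
  repo: https://prometheus-community.github.io/helm-charts
  chart: prometheus-node-exporter
  targetNamespace: monitoring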
Great stuff. Thank you Mr. @brandond |
Hi, I am new to k3s. I have a k3s installation set up and am trying to pull metrics from the cluster. My Prometheus is hosted outside the cluster. It would be a great help if someone could shed some light on how to set this up; I have literally spent hours trying to find a solution. Does the installation need to have metrics-server or kube-state-metrics running? |
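(A minimal sketch of what an external Prometheus scrape job could look like, assuming node-exporter is exposed on each node at port 9100; the target address and port are assumptions, and kube-state-metrics would need to be exposed via NodePort or Ingress in a similar way.)
scrape_configs:
  - job_name: "k3s-nodes"
    static_configs:
      - targets:
          - "192.168.1.38:9100"   # node-exporter on a k3s node (example address)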
kubeControllerManager:
  endpoints:
    - ip_of_your_master_node # e.g. 192.168.1.38
kubeScheduler:
  endpoints:
    - ip_of_your_master_node # e.g. 192.168.1.38
This fixed my problems. |
+1 for KubeProxy |
I tried getting the https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack chart to run on my K3s cluster of 3 RPi4's, but sadly some of the images aren't proper multi-arch images (they fail to run on arm). This is the HelmChart manifest I used:
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: kube-prometheus-stack
  namespace: kube-system
spec:
  chart: kube-prometheus-stack
  repo: https://prometheus-community.github.io/helm-charts
  targetNamespace: monitoring
So, what would be the simplest (best practice) way to deploy a minimal installation of Prometheus and Grafana and point them at the cluster?
Some of the guides on the internet immediately start utilizing all sorts of templated helper repositories, but that doesn't serve as an easy-to-understand minimal baseline installation at all. A tutorial installation IMHO shouldn't rely on any custom repos, but rather use the conventional ones where possible. |
... After upgrading to k3s 1.19 (from 1.18), the prometheus target scraping for those two targets stopped working: `Get "http://10.2.0.30:10252/metrics": dial tcp 10.2.0.30:10252: connect: connection refused`. Looking at k3s-io/k3s#425 (comment) suggests the endpoint approach should work. Experimenting with removing the explicit endpoint callout to see if there is an improvement. Signed-off-by: Jeff Billimek <jeff@billimek.com>
It appears, for me at least, that after upgrading from k3s 1.18 to 1.19 the explicit endpoint approach stopped working. I suspect that there is now a firewall rule preventing connections to the endpoints on ports 10251 & 10252 from anywhere other than 127.0.0.1. edit: This commit seems to be the culprit: 4808c4e#diff-c68274534954d72488196ca23f12cfb3ebe65998d9e7c4a43d7ba9acc9532574 |
This should help people a bit :) prometheus-community/helm-charts#626
Also, I try to keep this repo up-to-date as a bit of a quick start: https://github.com/cablespaghetti/k3s-monitoring |
Are you able to monitor kube-controller-manager, kube-scheduler or kube-proxy? I've looked at your repo and saw your PR over at kube-prometheus-stack, but it seems like nothing works on k3s to have these monitored. @brandond I would love to hear how this worked for you; it doesn't seem like I'm doing anything wrong in my helm values. You can take a look at them over at:
As soon as I enable those metrics (with or without an endpoint), they will not be scraped and the target will appear as down in Prometheus. It would be great to get more eyes on this, as anyone rolling out k3s in a production env would be wise to have these metrics collected. Let me know if you need any more information. |
The people maintaining kube-prometheus-stack unfortunately didn't like the PR due to the level of tweaking required to get k3s working. As such I'm not sure it's possible with the main chart right now.
The way things work with k3s is that the api server endpoint gives you metrics from the controller manager and scheduler as well. So you'll probably have all the metrics, but the helm chart rules and dashboards don't expect them to be tagged with job=apiserver. I'm not sure how kube-proxy works off the top of my head, but it may well be the same.
The way I see it, there are two options: maintain a fork of the chart, or have an option in k3s to split out the metrics endpoints in a "more standard" way which is compatible with the chart as it stands. |
In a separate issue, the rancher monitoring now uses a forked version of PushProx to get many of the stats bound to localhost, from a single port. To see it in action without loading up all of rancher, try this manifest file (drop it in /var/lib/rancher/k3s/server/manifests). You'll get the operator ServiceMonitor, and 4 or 5 of the sets of stats from a single exporter.
To see what stats this enables, do a 'curl -s http://localhost:10249/metrics' |
@ThomasADavis I had to update your config:
---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: pushprox
  namespace: monitoring
spec:
  chart: https://charts.rancher.io/assets/rancher-pushprox/rancher-pushprox-0.1.201.tgz
  targetNamespace: monitoring
  valuesContent: |-
    metricsPort: 10249
    component: k3s-server
    serviceMonitor:
      enabled: true
    clients:
      port: 10013
      useLocalhost: true
      tolerations:
        - effect: "NoExecute"
          operator: "Exists"
        - effect: "NoSchedule"
          operator: "Exists"
However, while that does get these components monitored, they are not working out of the box with the default Prometheus rules or Grafana dashboards shipped with the chart. |
Wild idea: kube-prometheus-stack + prometheus relabelling of the k3s metrics to patch them into the standard k8s layout? |
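(A sketch of what that idea might look like, assuming, as described above, that the scheduler series arrive via the apiserver ServiceMonitor and only the job label needs patching for the stock rules and dashboards. The regex and label values here are illustrative assumptions, not a tested configuration.)
# under the apiserver ServiceMonitor's endpoint definition:
metricRelabelings:
  - sourceLabels: [__name__]
    regex: "scheduler_.*"
    targetLabel: job
    replacement: kube-scheduler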
I think it would help if there was an option to bind the controller manager and the scheduler to a non-loopback address, so that something like this would work:
kubeControllerManager:
  endpoints:
    - ip_of_your_master_node # e.g. 192.168.1.38
kubeScheduler:
  endpoints:
    - ip_of_your_master_node # e.g. 192.168.1.38
Is there such an option or is the bind address hardcoded? |
@RouNNdeL see this commit 4808c4e
It is hardcoded. |
Are there any chances we would see this implemented as an option? We would have to accept the security risks of enabling it, but I'd be fine with that. |
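(For what it's worth, a hypothetical sketch of how this could look if the bind addresses are overridable through k3s's component argument pass-through; flag support varies by k3s version, so treat every key below as an assumption to verify against your release.)
# /etc/rancher/k3s/config.yaml
kube-controller-manager-arg:
  - "bind-address=0.0.0.0"
kube-scheduler-arg:
  - "bind-address=0.0.0.0"
kube-proxy-arg:
  - "metrics-bind-address=0.0.0.0"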
I was able to get etcd monitored in kube-prometheus-stack with the following values:
kubeEtcd:
  enabled: true
  endpoints:
    - IP of k3s master 1
    - IP of k3s master 2
    - IP of k3s master 3
  service:
    enabled: true
    port: 2381
    targetPort: 2381
The default etcd dashboard shipped with the chart picks this up as well. |
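(One assumption baked into the values above is that etcd's metrics listener on port 2381 is reachable from Prometheus. On k3s with embedded etcd that typically means enabling the etcd metrics exposure flag on the server; a sketch, verify the flag against your k3s version.)
# /etc/rancher/k3s/config.yaml
etcd-expose-metrics: true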
I have got all components monitored again; I believe this issue can be closed! |
Thanks! |
Describe the bug
I would like to monitor a k3s system. Therefore I installed the prometheus operator helm chart. Out of the box, a lot of alerts are in the FIRING state.
A lot of rules which cover the apiserver and kubelet are not working. Should users just disable these rules, or are you going to provide your own default rules for a k3s setup?
To Reproduce
Install prometheus helm chart with default values
Expected behavior
Everything should look green, provided any k3s-specific instructions were followed.
Screenshots
KubeAPIDown (1 active)
KubeControllerManagerDown (1 active)
KubeDaemonSetRolloutStuck (1 active) kube-state-metrics
KubeSchedulerDown (1 active)
KubeletDown (1 active)
TargetDown (2 active) apiserver, kubelet