Security Monitoring Stack for Kubernetes

The easiest way to install monitoring and security monitoring on your Kubernetes cluster.

This stack includes:

  • Loki
  • Promtail
  • Grafana
  • Victoria Metrics Stack
  • Alertmanager
  • Kube-Bench Exporter
  • Falco Exporter
  • Trivy-Operator
Alerts from Alertmanager
  • Some alerts:
    • Loki Alerts on Errors in Logs
    • Some default alerts
    • Kubernetes Node Not Ready
    • Kubernetes Memory Pressure
    • Kubernetes Disk Pressure
    • Kubernetes Network Unavailable
    • Kubernetes Out Of Capacity
    • Kubernetes Container Oom Killer
    • Kubernetes Job Failed
    • Kubernetes Cronjob Suspended
    • Kubernetes Persistentvolumeclaim Pending
    • Kubernetes Volume Out Of Disk Space
    • Kubernetes Volume Full In Four Days
    • Kubernetes Persistentvolume Error
    • Kubernetes Statefulset Down
    • Kubernetes Hpa Scaling Ability
    • Kubernetes Hpa Metric Availability
    • Kubernetes Hpa Scale Capability
    • Kubernetes Hpa Underutilized
    • Kubernetes Pod Not Healthy
    • Kubernetes Pod CrashLooping
    • Kubernetes Replicasset Mismatch
    • Kubernetes Deployment Replicas Mismatch
    • Kubernetes Statefulset Replicas Mismatch
    • Kubernetes Deployment Generation Mismatch
    • Kubernetes Statefulset Generation Mismatch
    • Kubernetes Statefulset Update Not RolledOut
    • Kubernetes Daemonset Rollout Stuck
    • Kubernetes Daemonset Misscheduled
    • Kubernetes Cronjob Too Long
    • Kubernetes Job Slow Completion
    • Kubernetes Api Server Errors
    • Kubernetes Api Client Errors
    • Kubernetes Client Certificate Expires Next Week
    • Kubernetes Client Certificate Expires Soon
    • Kubernetes Api Server Latency
    • Loki 5.. errors
    • Severity level - Error
    • Ledger Error

🌸 Setup

This is a step-by-step installation guide.

The requirements and dependencies are described below.

Requirements:

2 CPUs, 4 GB RAM

Kubernetes version used in testing:

v1.28.2
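To compare a cluster against the tested version, a small helper can order two version strings. This is a minimal sketch: `version_ge` is a hypothetical helper name, the candidate version is an example value, and the helper assumes a `sort` that supports `-V` (as in GNU coreutils).

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  lowest=$(printf '%s\n%s\n' "${1#v}" "${2#v}" | sort -V | head -n 1)
  [ "$lowest" = "${2#v}" ]
}

tested="v1.28.2"     # version used in testing, taken from this README
candidate="v1.29.0"  # example value; replace with your cluster's server version

if version_ge "$candidate" "$tested"; then
  echo "$candidate is at or above the tested version $tested"
else
  echo "$candidate is older than the tested version $tested"
fi
```

`kubectl version` reports the server version that can be passed in as the candidate.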

Prerequisites

Clone the repo

git clone https://github.com/chabanyknikita/security-monitoring-template.git
cd security-monitoring-template

Add the Helm repos for this stack

helm repo add jetstack https://charts.jetstack.io
helm repo add stable https://charts.helm.sh/stable
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo add aqua https://aquasecurity.github.io/helm-charts/
helm repo update
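To confirm that the four repos above were added, the output of `helm repo list` can be checked against the expected names. The following is a sketch only: `missing_repos` is a hypothetical helper, and the sample input stands in for real `helm repo list` output (a header line followed by one `NAME URL` row per repo).

```shell
# Repo names added in the step above.
required_repos="jetstack stable falcosecurity aqua"

# Hypothetical helper: reads `helm repo list` output on stdin and prints
# every required repo name that does not appear in the NAME column.
missing_repos() {
  listed=$(awk 'NR > 1 { print $1 }')   # skip the header row, keep the names
  for repo in $required_repos; do
    printf '%s\n' "$listed" | grep -qxF "$repo" || echo "$repo"
  done
}

# Sample input for illustration; in practice pipe in `helm repo list`.
printf 'NAME\tURL\njetstack\thttps://charts.jetstack.io\n' | missing_repos
```

Against a real setup this would be run as `helm repo list | missing_repos`; no output means all four repos are present.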

First, you can disable any components you don't need in charts/monitoring/charts/SVC/values.yaml, for example:

grafana:
  enabled: false


alertmanager:
  enabled: false

vmalert:
  enabled: false

Install ingress-nginx and cert-manager if they are not already installed

helm install \
cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.8.0 \
--set installCRDs=true
helm upgrade --install ingress-nginx ingress-nginx \
--repo https://kubernetes.github.io/ingress-nginx \
--namespace ingress-nginx --create-namespace

Configure Alertmanager:

Go to charts/monitoring/values.yaml and, in the section victoria-metrics-k8s-stack.alertmanager.config, replace these values with your own:

chat_id: <Chat Id>      # chat_id must be an integer
bot_token: <Bot Token>  # bot_token must be a string
  • You can get these values from:

| KEYS | VALUES |
| --- | --- |
| TELEGRAM_ADMIN | Your chat ID, which you can get from @userinfobot |
| TELEGRAM_TOKEN | Your Telegram bot token, which you can get from @botfather |

Only one user can receive the bot's alerts.
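Because chat_id must be an integer and bot_token must be a string, it can help to check both values before writing them into values.yaml. This is a minimal sketch with hypothetical helper names (`is_integer`, `check_telegram_values`) and placeholder values. It accepts a leading minus sign on the assumption that some chat IDs are negative, and it only checks that the token is non-empty.

```shell
# Hypothetical helper: succeeds when $1 is an integer (optional leading minus).
is_integer() {
  case "$1" in
    ''|-) return 1 ;;      # empty string or a lone minus sign
    -*)   rest=${1#-} ;;   # strip one leading minus
    *)    rest=$1 ;;
  esac
  case "$rest" in
    *[!0-9]*) return 1 ;;  # any non-digit character
    *)        return 0 ;;
  esac
}

# Hypothetical helper: $1 is the chat_id, $2 is the bot_token.
check_telegram_values() {
  status=0
  if ! is_integer "$1"; then
    echo "chat_id must be an integer, got: '$1'" >&2
    status=1
  fi
  if [ -z "$2" ]; then
    echo "bot_token must be a non-empty string" >&2
    status=1
  fi
  return "$status"
}

# Placeholder values for illustration only; substitute your own.
check_telegram_values "123456789" "example-token" && echo "values look well-formed"
```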

Configure Grafana Ingress (optional):

If you need to access Grafana via a domain, go to charts/monitoring/values.yaml, section victoria-metrics-k8s-stack.grafana.ingress, and:

  • Change grafana.ingress.enabled to true
  • Change grafana.ingress.hosts to your domain
  • Change grafana.ingress.tls.hosts to your domain

Example:

  ingress:
    enabled: true
    annotations:
      certmanager.k8s.io/cluster-issuer: letsencrypt
      cert-manager.io/cluster-issuer: letsencrypt
      kubernetes.io/ingress.class: nginx
      kubernetes.io/tls-acme: "true"
    pathType: ImplementationSpecific
    hosts:
      - grafana.example.com
    tls:
      - secretName: grafana-ingress-tls
        hosts:
          - grafana.example.com

Change the namespace for Loki alerts

  • Go to charts/monitoring/values.yaml, section loki-distributed.ruler.directories, and in every rule change the namespace to the one you want to monitor

Example:

              - alert: Error 5**
                expr: rate({namespace="stage", container!="horizon"} |~ "status=5.." | logfmt | label_format duration=duration,time=time,filename=filename,pid=pid,stream=stream,node_name=node_name,app=app,instance=instance[1m])>0
                for: 0m
                labels:
                  severity: error
                annotations:
                  summary: Error {{ $labels.status }} in {{ $labels.container }}

# Or you can monitor more than one namespace:

              - alert: Error 5**
                expr: rate({namespace=~"monitoring|stage|prod"} |~ "status=5.." | logfmt | label_format duration=duration,time=time,filename=filename,pid=pid,stream=stream,node_name=node_name,app=app,instance=instance[1m])>0
                for: 0m
                labels:
                  severity: error
                annotations:
                  summary: Error {{ $labels.status }} in {{ $labels.container }}
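The two examples differ only in the stream selector: an exact match (`namespace="stage"`) for one namespace, and a regex match (`namespace=~"a|b|c"`) for several. As a sketch, a hypothetical helper `ns_selector` can build the right selector from a list of namespaces, ready to paste into the rule expressions:

```shell
# Hypothetical helper: prints a Loki stream selector for the given namespaces.
# One argument yields an exact match; several yield a regex alternation.
ns_selector() {
  if [ "$#" -eq 0 ]; then
    echo "usage: ns_selector NAMESPACE [NAMESPACE...]" >&2
    return 1
  fi
  if [ "$#" -eq 1 ]; then
    printf '{namespace="%s"}\n' "$1"
    return 0
  fi
  joined=$1
  shift
  for ns in "$@"; do
    joined="$joined|$ns"   # join the names with the regex "or" operator
  done
  printf '{namespace=~"%s"}\n' "$joined"
}

ns_selector stage
ns_selector monitoring stage prod
```

Additional matchers such as `container!="horizon"` from the first example would still need to be added by hand.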

Upgrade ingress-nginx so that it exposes metrics

helm upgrade ingress-nginx ingress-nginx \
--repo https://kubernetes.github.io/ingress-nginx \
--namespace ingress-nginx \
--set controller.metrics.enabled=true \
--set-string controller.podAnnotations."prometheus\.io/scrape"="true" \
--set-string controller.podAnnotations."prometheus\.io/port"="10254"
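After the upgrade, the controller should serve metrics on port 10254. One rough way to confirm this is to look for controller metric names in the response. The sketch below uses a hypothetical helper, `has_nginx_metrics`, and assumes the controller's metric names start with `nginx_ingress_controller_`. The sample line is illustrative, not captured output.

```shell
# Hypothetical helper: reads Prometheus-format metrics text on stdin and
# succeeds when at least one line starts with the assumed controller prefix.
has_nginx_metrics() {
  grep -q '^nginx_ingress_controller_'
}

# Illustrative sample; in practice pipe in the body of the /metrics endpoint.
sample='nginx_ingress_controller_requests{status="200"} 12'

if printf '%s\n' "$sample" | has_nginx_metrics; then
  echo "ingress-nginx metrics found"
else
  echo "no ingress-nginx metrics found"
fi
```

With port 10254 of the controller forwarded locally (for example via `kubectl port-forward`), the same check could be run as `curl -s http://localhost:10254/metrics | has_nginx_metrics`.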

Installation

Install NFS:

helm upgrade -i nfs-server stable/nfs-server-provisioner --set persistence.enabled=true,persistence.size=20Gi -n monitoring --create-namespace

Install CRD:

kubectl apply -f charts/monitoring/charts/crd/templates/crd.yaml

Install trivy operator

helm upgrade -i trivy-operator aqua/trivy-operator --namespace trivy-system --create-namespace --version 0.20.6 --values charts/trivy-operator/trivy-values.yaml

Install Falco And Falco-Exporter

helm upgrade -i falco --set falco.grpc.enabled=true --set falco.grpc_output.enabled=true --set driver.kind=ebpf falcosecurity/falco
helm upgrade -i falco-exporter falcosecurity/falco-exporter

Optionally, install Event-Generator to see how the Falco rules react to events

helm install event-generator falcosecurity/event-generator --namespace event-generator --create-namespace --set config.loop=false --set config.actions=""

Install monitoring stack

helm upgrade -i monitoring charts/monitoring --values charts/monitoring/values.yaml -n monitoring
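Once the chart is installed, it is worth checking that the pods in the monitoring namespace have started. This is a small sketch with a hypothetical helper, `not_running_pods`, that filters `kubectl get pods --no-headers` output. The pod names in the sample are made up for illustration.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE) and prints the name of every pod
# whose STATUS is neither Running nor Completed.
not_running_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Sample input for illustration only; the pod names are made up.
printf '%s\n' \
  'stack-grafana-abc 3/3 Running 0 5m' \
  'loki-ingester-0 0/1 Pending 0 5m' \
  'vmalert-xyz 0/1 CrashLoopBackOff 4 5m' | not_running_pods
```

Against a live cluster this would be `kubectl get pods -n monitoring --no-headers | not_running_pods`. The helper looks only at the STATUS column, so a pod that is Running but not yet ready is not reported.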

Get grafana password:

kubectl get secret --namespace monitoring stack-grafana \
-ojsonpath="{.data.admin-password}" | base64 --decode ; echo

Access the Grafana UI

  • Credentials:

    • login: admin
    • password: from previous step
  • If you enabled ingress, go to your domain and enter the credentials

  • If you didn't enable ingress, set up port forwarding and open http://localhost:3000:

kubectl port-forward service/stack-grafana -n monitoring 3000:80

Now you have installed Monitoring Stack on your Kubernetes cluster!