This list is based on a standard kubeadm cluster.
- Enable LimitRange and ResourceQuota
  - Creation of these defaults can be managed by Kyverno
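A minimal sketch of per-namespace defaults, assuming a hypothetical namespace `team-a`; the object names and all numbers are illustrative, not recommendations. The LimitRange gives containers a default request and memory limit when they set none, and the ResourceQuota caps the namespace total.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits        # hypothetical name
  namespace: team-a           # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:         # applied when a container sets no request
        cpu: 100m
        memory: 256Mi
      default:                # applied when a container sets no limit
        memory: 256Mi
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota            # hypothetical name
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.memory: 8Gi
    pods: "50"
```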
- Implement Pod Security Standards
  - See the Kubernetes Pod Security Standards documentation for details
  - Can be enforced using Kyverno or OPA Gatekeeper; see the OPA Gatekeeper documentation for more on that option
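One way to enforce a Pod Security Standards profile with Kyverno is its `podSecurity` validation rule. The sketch below assumes Kyverno 1.8 or later and the `baseline` profile; the policy name is hypothetical, and newer Kyverno releases place the failure action under `validate` instead of `spec`, so check it against the installed version.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: podsecurity-baseline      # hypothetical name
spec:
  validationFailureAction: Enforce  # use Audit first to see what would be blocked
  background: true
  rules:
    - name: baseline
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        podSecurity:
          level: baseline         # or restricted
          version: latest
```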
- Enable a default network policy
  - By default, communication between all pods in a cluster is allowed
  - In a multi-tenant or multi-team environment, at least enable namespace isolation using default policies
  - Global default policies can also be added as needed, e.g. deny all except external DNS
  - Network policies can be implemented by CNIs like Calico and Cilium
  - In addition to pods/workloads, CNIs like Calico and Cilium can manage host endpoints too
  - Creation of these defaults can be managed by Kyverno
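Namespace isolation can be sketched with two standard `NetworkPolicy` objects: one that denies all ingress, and one that re-allows traffic from pods in the same namespace. The namespace `team-a` and the policy names are hypothetical, and the policies only take effect when the CNI enforces NetworkPolicy.

```yaml
# Deny all ingress to every pod in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress    # hypothetical name
  namespace: team-a             # hypothetical namespace
spec:
  podSelector: {}               # empty selector = all pods in the namespace
  policyTypes:
    - Ingress
---
# Allow ingress only from pods in the same namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace    # hypothetical name
  namespace: team-a
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}       # any pod in this namespace
```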
- Enable Audit logging
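Audit logging is driven by a policy file passed to kube-apiserver with `--audit-policy-file`, with `--audit-log-path` selecting the log destination. The rule set below is only an illustrative starting point, not a recommended policy.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - RequestReceived
rules:
  # Log Secret and ConfigMap access at Metadata level so payloads are never written to the log
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Log request bodies for changes to everything else
  - level: Request
    verbs: ["create", "update", "patch", "delete"]
  # Catch-all: record metadata for all remaining requests
  - level: Metadata
```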
- Enable secret encryption at rest
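Encryption at rest is configured through a file passed to kube-apiserver with `--encryption-provider-config`. The sketch below assumes the `aescbc` provider and a locally managed key; the key name is hypothetical and the key value is a placeholder to be replaced.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # The first provider is used to encrypt new writes
      - aescbc:
          keys:
            - name: key1                              # hypothetical key name
              secret: <base64-encoded 32-byte key>    # placeholder, generate your own
      # Fallback so Secrets written before encryption was enabled can still be read
      - identity: {}
```

Secrets that already exist stay unencrypted until they are rewritten.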
- Use RBAC to manage the authorization
- Use OpenID (OIDC) tokens as a user authentication strategy
- Minimise access to secrets using RBAC
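As a sketch of least-privilege access to Secrets, the Role below lets one service account read a single named Secret and nothing else. The namespace, Secret name, and service account name are all hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-app-secret          # hypothetical name
  namespace: team-a              # hypothetical namespace
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["app-credentials"]   # hypothetical Secret; no access to others
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-app-secret
  namespace: team-a
subjects:
  - kind: ServiceAccount
    name: app                    # hypothetical service account
    namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: read-app-secret
```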
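- Restrict scheduling of workloads on control plane nodes. kubeadm taints these nodes, but by default any pod that adds a matching toleration can still be scheduled there; a Kyverno policy can block this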
- Allow only known container registries; Kyverno or OPA Gatekeeper can be used for this
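A registry allow-list can be expressed as a Kyverno validation pattern. This sketch assumes a single hypothetical registry, `registry.example.com`; the policy name is also hypothetical, and the position of the failure action field varies between Kyverno versions.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries   # hypothetical name
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: validate-registries
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must come from registry.example.com."
        pattern:
          spec:
            # =( ) marks the field as optional: checked only when present
            =(initContainers):
              - image: "registry.example.com/*"
            containers:
              - image: "registry.example.com/*"
```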
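- Use NodeLocal DNSCache: https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/
- Set the image pull policy to Always in a multi-tenant environment. This can be enforced with the `AlwaysPullImages` admission controller, enabled through the kube-apiserver `--enable-admission-plugins` flag
- Tune the kube-apiserver flags `--default-not-ready-toleration-seconds` and `--default-unreachable-toleration-seconds` (both default to 300 seconds) to control how long pods stay bound to a failed node before eviction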
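The cluster-wide toleration defaults can also be overridden per workload by setting the tolerations explicitly. The sketch below assumes a hypothetical pod and image, and uses 30 seconds purely as an example value.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fast-failover                       # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0.0 # hypothetical image
  tolerations:
    # Evict after 30s instead of the cluster default when the node is not ready
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 30
    # Same for a node the control plane can no longer reach
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 30
```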
- Run a security benchmark against the cluster (for example, the CIS Kubernetes Benchmark)
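- Image
  - Run a single process per container
    - If more than one process is needed, use an init such as Tini, or the `shareProcessNamespace` option in Kubernetes, to reap zombie processes
  - Handle termination signals (SIGTERM) properly in the app
  - Reduce the image size, since it also affects pull time. Google's distroless images are a good option for the base image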
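The `shareProcessNamespace` option is a single field on the pod spec. In this sketch the pod name and image are hypothetical. With the field enabled, the containers share one PID namespace and the pod's infrastructure (pause) container runs as PID 1, so the app process does not.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-pid                          # hypothetical name
spec:
  shareProcessNamespace: true               # all containers share one PID namespace
  containers:
    - name: app
      image: registry.example.com/app:1.0.0 # hypothetical image
```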
- Output logs to stdout and stderr
- Use probes
  - A liveness probe can be omitted if the app exits (crashes) on unrecoverable errors, since the container is restarted anyway
  - The readiness probe runs throughout the pod's lifetime; when it fails, the pod is removed from the Service endpoint list
  - Make sure probes do not depend on external dependencies
  - A startup probe can be used with legacy applications that need additional startup time
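The probe guidance can be sketched as follows, assuming an app that serves hypothetical `/healthz` and `/ready` endpoints on port 8080 and exits on fatal errors, which is why no liveness probe is defined. All timings are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probed-app                          # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0.0 # hypothetical image
      ports:
        - containerPort: 8080
      # Gives a slow-starting app up to 30 x 10s = 300s before it is restarted
      startupProbe:
        httpGet:
          path: /healthz                    # hypothetical endpoint
          port: 8080
        periodSeconds: 10
        failureThreshold: 30
      # Checks only the app itself, not databases or downstream services
      readinessProbe:
        httpGet:
          path: /ready                      # hypothetical endpoint
          port: 8080
        periodSeconds: 5
        failureThreshold: 3
```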
- Resources (CPU/memory)
  - Set memory requests equal to memory limits
  - CPU limits are enforced using CFS (Completely Fair Scheduler) quotas, so a limit can cause throttling for the app
  - Requests are used for scheduling; the requested resources are reserved on the node even if the pod does not use them
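A sketch of a container following this guidance: the memory request equals the memory limit, and a CPU request is set without a CPU limit to avoid CFS throttling. Omitting the CPU limit is one reading of the points above, not a universal rule; the names and numbers are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sized-app                           # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0.0 # hypothetical image
      resources:
        requests:
          cpu: 250m        # used by the scheduler, reserved on the node
          memory: 256Mi
        limits:
          memory: 256Mi    # same as the request; no CPU limit, so no throttling
```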
- QoS class
  - Guaranteed
  - Burstable
  - BestEffort
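The QoS class is not set directly; Kubernetes derives it from the requests and limits of the pod's containers. The container-level `resources` fragments below illustrate one combination per class, with example values only.

```yaml
# Guaranteed: every container sets CPU and memory requests equal to its limits
resources:
  requests: {cpu: 500m, memory: 256Mi}
  limits: {cpu: 500m, memory: 256Mi}
---
# Burstable: at least one request or limit is set, but the Guaranteed criteria are not met
resources:
  requests: {cpu: 250m, memory: 128Mi}
---
# BestEffort: no container in the pod sets any request or limit
resources: {}
```

Under node memory pressure, BestEffort pods are the first candidates for eviction and Guaranteed pods the last.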
- Use a preStop hook and `terminationGracePeriodSeconds` for graceful shutdown
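A minimal sketch of the two settings together, assuming the image ships a `sleep` binary (distroless images typically do not). The short sleep gives endpoint removal time to propagate before the app receives SIGTERM; the names and durations are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-app                        # hypothetical name
spec:
  # Total budget for the preStop hook plus the app's own shutdown
  terminationGracePeriodSeconds: 45
  containers:
    - name: app
      image: registry.example.com/app:1.0.0 # hypothetical image
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "10"]        # runs before SIGTERM is sent
```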
- Avoid the `latest` tag, since a pod reschedule might pull a different image
- Use a PodDisruptionBudget to limit the number of pods of a replicated application that are down simultaneously due to voluntary disruptions
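A PodDisruptionBudget sketch, assuming a replicated app whose pods carry the hypothetical label `app: web` and that must keep at least two pods running during voluntary disruptions such as node drains.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb          # hypothetical name
spec:
  minAvailable: 2        # alternatively, set maxUnavailable
  selector:
    matchLabels:
      app: web           # hypothetical label on the app's pods
```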
- Init containers
  - Init containers can contain utilities or custom setup code that is not present in the app image
  - Another use case is cloning a Git repository
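The Git use case can be sketched with an init container that clones into a shared `emptyDir` volume before the app starts. The images, repository URL, and mount paths are all hypothetical, and the init image is assumed to contain a `git` binary.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: git-init                                  # hypothetical name
spec:
  initContainers:
    - name: clone
      image: registry.example.com/tools/git:2.45  # hypothetical image with git installed
      command: ["git", "clone", "--depth=1", "https://example.com/org/repo.git", "/work"]
      volumeMounts:
        - name: src
          mountPath: /work
  containers:
    - name: app
      image: registry.example.com/app:1.0.0       # hypothetical image
      volumeMounts:
        - name: src
          mountPath: /app/src                     # the cloned repo appears here
  volumes:
    - name: src
      emptyDir: {}                                # shared scratch volume, lives as long as the pod
```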
- StatefulSet
  - Unlike a Deployment pod, a StatefulSet pod is not automatically moved to a new node when its node fails
  - https://medium.com/tailwinds-navigator/kubernetes-tip-how-statefulsets-behave-differently-than-deployments-when-node-fails-d29e36bca7d5
- https://learnk8s.io/production-best-practices
- https://cloud.google.com/architecture/best-practices-for-building-containers
- https://cloud.google.com/architecture/best-practices-for-operating-containers
- https://pracucci.com/kubernetes-dns-resolution-ndots-options-and-why-it-may-affect-application-performances.html