Advanced Kubernetes

Architecture Overview

(Figure: architecture overview)

apiserver

Allows clients (e.g. kubectl) to talk to the cluster. Resource specifications can be submitted (with kubectl apply), running resources can be listed (with kubectl get pods), etc. via the apiserver. Only the apiserver can access the database (etcd).
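
For example, some typical interactions with the apiserver via kubectl (the cluster endpoint printed by cluster-info is the apiserver itself):

# show the apiserver endpoint the client talks to
kubectl cluster-info

# submit a resource specification
kubectl apply -f deploy.yaml

# list running resources
kubectl get pods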

etcd

This is the cluster's database. The desired state of the cluster is stored here.

Making Things Happen

Let's assume we want to create a Deployment on the K8s cluster:

  1. Submit the Deployment config to the cluster via the apiserver: kubectl apply -f deploy.yaml (see the example manifest after this list)
  2. The apiserver stores the declaration on the etcd database
  3. The controller-manager (which constantly checks the etcd database -- via the apiserver -- for configs that need additional declarations) picks up the Deployment declaration and submits appropriate Pod configs (e.g. if the Deployment has replicas: 10, it submits 10 Pod configs with randomly generated names)
  4. The scheduler (which constantly checks the etcd database -- via the apiserver -- for declarations with no assigned nodes) picks up the Pod configs and updates the declaration to include a node on which to run. This updated config is then stored on the etcd database
  5. The kubelet agent on each assigned node (which constantly checks the etcd database -- via the apiserver -- for Pods scheduled to run on its host node) picks up the Pod config and runs the appropriate container(s) on its host
  6. The kube-proxy agent on each assigned node (which constantly checks the etcd database -- via the apiserver -- for Service configs relating to its host node) picks up any Service-relevant information and applies the corresponding networking config on its host node
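
As a concrete sketch, a minimal Deployment manifest like the one below (all names and the image are illustrative), saved as deploy.yaml, would trigger exactly this chain: the controller-manager submits 3 Pod configs, the scheduler assigns each one a node, and the kubelets on those nodes start the containers.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment            # example name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: web
          image: nginx:1.25      # example image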

High Availability

(Figure: high availability)

Scaling with load

Kubernetes (K8s) clusters can be scaled:

  • Vertically: Increase capacity of nodes (i.e., underlying VMs)
  • Horizontally: Increase number of components

When scaling horizontally, some cluster components can run as multiple simultaneous instances (e.g. the apiserver and worker nodes), while others cannot (e.g. the controller-manager and scheduler).

In the case of controller-managers and schedulers, if multiple instances are available, a "leader election" is performed to decide which instance is active at any given time. In the case of etcd, the operation mode is something of a hybrid: multiple instances can be read from simultaneously, but only one instance (the leader) can be written to. Whenever new data is written to the leader, it sends copies of the data to the other instances so that every instance stays up to date.

Leader elections are done at the component level, not at the node level. I.e., at any given time, the lead scheduler and the lead etcd instance need not be on the same node. In fact, the desired behaviour is that they are not on the same node, so that if a node fails, they don't all fail together.
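
To see which instance currently holds a component's leadership, the leader-election records can be inspected; in recent clusters these are Lease objects in kube-system (assuming the default leader-election mechanism):

# leader-election records for the control-plane components
kubectl -n kube-system get lease

# the current holder appears in .spec.holderIdentity
kubectl -n kube-system get lease kube-scheduler -o jsonpath='{.spec.holderIdentity}'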

The API Server: The Life of a Request

(Figure: the life of a request)

Authentication

"Who are you? Can you prove it?" This stage validates "who" a user claims to be.

K8s clusters have two categories of users: service accounts & normal users. Normal users are managed by cluster-independent services, e.g. a username-password file, or Google Accounts. Normal users cannot be created via the K8s API.

Service account users are managed by K8s. They are bound to specific namespaces. Some are created automatically, while additional ones can be created manually via API calls.
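
A minimal sketch of creating one manually (the account and namespace names are placeholders); on kubectl/Kubernetes 1.24+ a short-lived token for the account can also be requested:

# create a namespaced service account
kubectl create serviceaccount ci-bot -n build

# request a short-lived token for it (kubectl 1.24+)
kubectl create token ci-bot -n build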

Authorization

"What can you do?" This stage validates what actions an authenticated user is allowed to perform.

Most K8s clusters use Role-Based Access Control (RBAC) for authorization.
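
Permissions are granted via Role/RoleBinding (or ClusterRole/ClusterRoleBinding) resources, and kubectl auth can-i is a quick way to check what an identity is allowed to do; the namespace and service-account names below are illustrative:

# can the current user list pods in the current namespace?
kubectl auth can-i list pods

# can a given service account delete deployments in the build namespace?
kubectl auth can-i delete deployments -n build \
  --as=system:serviceaccount:build:ci-bot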

Admission Controllers

They are plugins that intercept requests and can mutate (mutating admission controllers) or validate (validating admission controllers) resource requests. The set of enabled admission controllers is operator-controlled: changing the default on/off behaviour requires passing command-line flags to the kube-apiserver at start-up (--enable-admission-plugins / --disable-admission-plugins); it cannot be changed via ordinary API calls.

Some admission controllers: (see figure)

Some default-on admission controllers: (see figure)

The "MutatingAdmissionWebhook" and "ValidatingAdmissionWebhook" can be used to extend admission controllers to include custom user-defined functions via the MutatingWebhookConfiguration and the ValidatingWebhookConfiguration resources respectively.

Conversion

Allows the supply (or retrieval) of resource definitions in any API version supported by the cluster.

For example, if an rbac.authorization.k8s.io/v1beta1 resource definition is applied, the conversion stage automatically converts it into an rbac.authorization.k8s.io/v1 resource. If the resource is queried, e.g. with kubectl get role role_name -o yaml, the more up-to-date definition (rbac.authorization.k8s.io/v1) is returned. However, the "deprecated" version can still be requested explicitly, e.g. with kubectl get role.v1beta1.rbac.authorization.k8s.io role_name -o yaml, as long as the cluster still serves that API version.

# list available api versions
kubectl api-versions

# describe api resources
kubectl api-resources 

Reconciliation: The Engine of a Declarative System

# Show resource tree
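# note: "kubectl tree" comes from the kubectl-tree plugin (e.g. installed via krew), not core kubectl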
kubectl tree <resource_type> <resource_name> # e.g. kubectl tree deploy my-deployment

The .metadata.ownerReferences object indicates whether a resource is managed by a controller, and if so by which one:

kubectl get ReplicaSet/<rep_set_name> -o jsonpath='{ .metadata.ownerReferences }' | jq

Sample result:

[
  {
    "apiVersion": "apps/v1",
    "blockOwnerDeletion": true,
    "controller": true,
    "kind": "Deployment",
    "name": "deploy-name",
    "uid": "c527577d-c0c0-42d2-80bf-84b1a5d52a31"
  }
]

controller-manager

  • Control loops (controllers) live in the controller-manager.
  • There is no API endpoint to list all available controllers. A "hack" to get most (if not all) controllers would be: kubectl -n kube-system get sa | grep controller.

scheduler

  • Chooses a node for each Pod to run on and fills in the Pod's nodeName field
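
The node the scheduler picked can be read directly off a running Pod (the pod name is a placeholder):

# which node was this pod scheduled onto?
kubectl get pod my-pod -o jsonpath='{.spec.nodeName}'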

kubelet

  • Runs on worker nodes
  • Gets the pods that should be running on the worker node via the API server and tells the container runtime (e.g. docker) to run the pod containers
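
To see the set of Pods a particular node's kubelet is responsible for (the node name is a placeholder):

# all pods scheduled onto a given node, across all namespaces
kubectl get pods --all-namespaces --field-selector spec.nodeName=worker-1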

kube-proxy

  • Runs on worker nodes
  • Updates the iptables rules on the node's kernel
  • Facilitates routing between cluster pods (via services)
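
Assuming kube-proxy runs in its default iptables mode, the rules it programs can be inspected directly on a node (requires shell access to that node):

# Service-related chains programmed by kube-proxy (iptables mode)
sudo iptables -t nat -L KUBE-SERVICES | head -n 20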

Some components that are not part of core K8s but are needed to have a fully functioning cluster

Cluster DNS

  • Required to resolve service names into Cluster IPs
  • Runs as a pod within the cluster
  • Connects to the external world by routing out requests it cannot resolve
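
A quick way to check in-cluster resolution is to run a throwaway Pod and look up a Service name; in-cluster names follow the <service>.<namespace>.svc.cluster.local pattern, and the image below is just an example:

# resolve the built-in "kubernetes" Service from inside the cluster
kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
  nslookup kubernetes.default.svc.cluster.local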

Ingress Controller

  • Required for Ingress resources to take effect
  • Runs as a pod in the cluster
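
With an ingress controller running, Ingress resources like the following (host and Service names are placeholders) tell it how to route HTTP traffic:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress                  # placeholder name
spec:
  rules:
    - host: app.example.com          # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web            # placeholder backend Service
                port:
                  number: 80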

External DNS

  • Runs as a pod in the cluster
  • Required to resolve assigned host names to LoadBalancer services that point to ingress resources
  • Looks up the service's public IP and creates appropriate public DNS records
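
With the commonly used external-dns project, for example, the desired hostname is usually declared as an annotation on the LoadBalancer Service (the hostname and names below are placeholders), and external-dns then creates the matching public DNS record:

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx                # placeholder name
  annotations:
    external-dns.alpha.kubernetes.io/hostname: app.example.com  # placeholder
spec:
  type: LoadBalancer
  selector:
    app: ingress-nginx
  ports:
    - port: 80
      targetPort: 80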

References