Add provisioning docs
Arvind Iyengar committed Jun 14, 2024
1 parent 36e9574 commit 95fc46e
Showing 13 changed files with 616 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/kubernetes/README.md
@@ -8,3 +8,4 @@ This directory contains documentation on Kubernetes:
This directory also contains several subdirectories on specific Kubernetes concepts:

- [Distributions](./distributions/README.md)
- [Provisioning](./provisioning/README.md)
11 changes: 11 additions & 0 deletions docs/kubernetes/provisioning/README.md
@@ -0,0 +1,11 @@
# Provisioning

This directory contains documentation on Rancher Provisioning V2, including implementation details for [engines](../terminology.md#engines) and [provisioners](../terminology.md#provisioners) supported by Rancher:

- [Provisioning V2](./provisioning_v2.md)
- [Generic Machine Provider](./generic_machine_provider.md)
- [System Agent](./system_agent.md)
- [Wins](./wins.md)
- [Repositories](./repositories.md)

This directory also contains a subdirectory dedicated to covering details about [CAPI](./capi/README.md) that are relevant to Provisioning V2.
8 changes: 8 additions & 0 deletions docs/kubernetes/provisioning/capi/README.md
@@ -0,0 +1,8 @@
# Cluster API (CAPI)

This directory contains documentation on CAPI:

- [Terminology](./terminology.md)
- [Installation](./installation.md)
- [Providers](./providers.md)
- [Provisioning](./provisioning.md)
60 changes: 60 additions & 0 deletions docs/kubernetes/provisioning/capi/installation.md
@@ -0,0 +1,60 @@
# CAPI Installation

To use CAPI, a user must install the **CAPI controllers & CRDs** and one or more **CAPI "Provider" controllers & CRDs** onto a single cluster known as the **local / management** cluster.

This is generally achieved by running `clusterctl init`.

Once CAPI is installed, to create a cluster managed by CAPI (also known as a **downstream** cluster), a user has to create a number of CAPI resources in the **local / management** cluster, along with the CAPI Provider resources they reference (a minimal example manifest is sketched after the graph below), including:
- A `Cluster`, which identifies an `<Infrastructure>Cluster` and a `<Distribution>ControlPlane` CR that implement it
- One or more `Machine`s, each of which identifies an `<Infrastructure>Machine` and a `<Distribution>Bootstrap` CR that implement it
- A `MachineDeployment` / `MachineSet`, which similarly references an `<Infrastructure>MachineTemplate` and a `<Distribution>BootstrapTemplate` CR to create a `Machine`, `<Infrastructure>Machine`, and `<Distribution>Bootstrap` per replica requested in the spec

> **Note**: `MachineDeployment` : `MachineSet` : `Machine` has the same relationship as `Deployment` : `ReplicaSet` : `Pod`
- One or more `MachineHealthCheck`s, each of which identifies a periodic check that needs to be executed on `Machine`s to verify that they are healthy

> **Note**: On a failed `MachineHealthCheck`, a `Machine` that is part of a `MachineSet` gets deleted and replaced with a fresh `Machine`

You can visualize the relationship between CAPI CRDs **alone** with the following graph:

```mermaid
graph TD
subgraph CAPI Cluster
direction BT
subgraph Machine Deployment A
subgraph Machine Set A
MachineA1("Machine A1")
MachineA2("Machine A2")
MachineA3("Machine A3")
end
end
subgraph Machine Deployment B
subgraph MachineSet B
MachineB1("Machine B1")
MachineB2("Machine B2")
MachineB3("Machine B3")
end
end
MachineHealthCheck("MachineHealthCheck(s)")
end
MachineHealthCheck-->MachineA1
MachineHealthCheck-->MachineA2
MachineHealthCheck-->MachineA3
MachineHealthCheck-->MachineB1
MachineHealthCheck-->MachineB2
MachineHealthCheck-->MachineB3
```
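
For concreteness, here is a heavily trimmed sketch of the kind of manifest described above, using the kubeadm and AWS providers purely as an example; the names are made up and the exact API versions and required fields vary by provider and release, so treat it as an illustration of the cross-references rather than a working template:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
spec:
  controlPlaneRef:                      # <Distribution>ControlPlane
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: my-cluster-control-plane
  infrastructureRef:                    # <Infrastructure>Cluster
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: my-cluster
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: my-cluster-md-0
spec:
  clusterName: my-cluster
  replicas: 3
  selector:
    matchLabels: null                   # commonly left for CAPI's defaulting to fill in
  template:
    spec:
      clusterName: my-cluster
      version: v1.28.3
      bootstrap:
        configRef:                      # <Distribution>BootstrapTemplate
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: my-cluster-md-0
      infrastructureRef:                # <Infrastructure>MachineTemplate
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: my-cluster-md-0
```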

> **Note**: While `MachineDeployment`, `<Distribution>BootstrapTemplate`, and `<Infrastructure>MachineTemplate` are **mutable**, the resources that they spawn (i.e. `MachineSet` / `Machine`, `<Distribution>Bootstrap`, and `<Infrastructure>Machine`) are considered to be **immutable**.
>
> This is an intentional pattern in CAPI since it allows CAPI to support **strategic upgrades** in production, rather than upgrading all-at-once.
>
> To clarify, when you modify one of the mutable resources listed above (`MachineDeployment`, `<Distribution>BootstrapTemplate`, or `<Infrastructure>MachineTemplate`), all immutable resources that they spawned **prior to the mutation** are left unaffected by the change; instead, a new set of those same immutable resources (i.e. a new `Machine`, `<Distribution>Bootstrap`, and `<Infrastructure>Machine`) is spawned based on the new configuration.
>
> Once the new `Machine` is Ready, CAPI will then proceed to get rid of the previous set of immutable resources (i.e. the old `Machine`, `<Distribution>Bootstrap`, and `<Infrastructure>Machine`) since they are no longer required.

The manifest containing these resources is what is normally produced by running `clusterctl generate cluster` with the appropriate command-line arguments.

Once these resources are created, it's expected that the CAPI "Provider" controllers will do the "real" work to provision the cluster.
23 changes: 23 additions & 0 deletions docs/kubernetes/provisioning/capi/providers.md
@@ -0,0 +1,23 @@
# CAPI Providers

## How do CAPI Providers work?

To provision clusters, CAPI performs a series of "hand-offs" to one or more CAPI Providers, i.e.

1. A user creates a `MachineDeployment`, `MachineSet`, `Machine`, or `Cluster` CR referencing one or more provider CRs that the user also creates, like an `<Infrastructure>MachineTemplate`, `<Infrastructure>Machine`, `<Infrastructure>Cluster`, or `<Distribution>ControlPlane`, by running a single command like `clusterctl generate cluster [name] --kubernetes-version [version] | kubectl apply -f -`.

2. The provider detects the creation of its own CRs and takes some action. **CAPI watches the provider CRs, but takes no action until the provider is done.**

3. Once the provider is done processing, it updates **certain well-defined CAPI fields** on its own CRs, and the CAPI controllers spring into action; on detecting that change in a provider CR referenced by a CAPI CR, they **copy over the values of those CAPI fields** from the provider CR to the CAPI CR and persist the modified CAPI CR onto the cluster.

4. On detecting the update to the CAPI resource for those well-defined CAPI fields, CAPI is able to continue the provisioning process until the next "hand-off".

> **Note**: Without any providers, CAPI would not be able to do anything, since no one would be executing the other side of the "hand-off"; it relies on providers to respond with information in those well-defined fields to continue execution. This is why you need to deploy CAPI with at least one provider, which usually defaults to the [kubeadm](https://kubernetes.io/docs/reference/setup-tools/kubeadm/) CAPI provider.

> **Note**: The reason providers define their own custom CRDs is so that they have full control over the additional fields they expose under `.status`, `.spec`, or elsewhere on their CRs.
>
> For example, if AWS would like to expose the ability to specify a network security group tied to the provisioned machine (an option that may not translate to what an on-prem CAPI provider wants to let users configure), AWS alone can offer that option in the `.spec` field of its `AWSMachine` CR, while CAPI's corresponding `Machine` CR would not need to expose such a field.
>
> The only expectation that CAPI has in turn is that the CRDs themselves define certain specific, well-defined `.status` and `.spec` fields, depending on the type of resource the CRD represents. These expectations are outlined in its [provider contract documentation](https://cluster-api.sigs.k8s.io/developer/providers/contracts.html); for example, any CRD implementing `Cluster` needs to have `.spec.controlPlaneEndpoint` so that CAPI can copy that field over to the CAPI `Cluster` CR's `.spec.controlPlaneEndpoint`.
>
> As long as the CRD has those fields, it can be used in the `*Ref` fields (i.e. `infrastructureRef`, `controlPlaneRef`, `bootstrap.configRef`, etc.) of a CAPI CR.
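
As a rough sketch of what that contract looks like in practice for a cluster infrastructure CRD (the contract field names follow CAPI's documentation, but the `AWSCluster` spec fields, names, and values here are only illustrative):

```yaml
# Provider CR: the spec is free-form, but the contract fields must exist.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSCluster
metadata:
  name: my-cluster
spec:
  region: us-west-2                     # provider-specific; CAPI never looks at this
  controlPlaneEndpoint:                 # contract field copied to the CAPI Cluster
    host: my-cluster-apiserver.example.com
    port: 6443
status:
  ready: true                           # contract field CAPI waits on before proceeding
---
# CAPI CR referencing the provider CR via infrastructureRef; CAPI copies
# .spec.controlPlaneEndpoint over from the AWSCluster once it reports ready.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
spec:
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: my-cluster
```
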
63 changes: 63 additions & 0 deletions docs/kubernetes/provisioning/capi/provisioning.md
@@ -0,0 +1,63 @@
# CAPI Provisioning

At a high level, here is how CAPI provisions clusters after users run a command like `clusterctl generate cluster [name] --kubernetes-version [version] | kubectl apply -f -`:

1. Create cluster-level infrastructure pieces **(handled by [Cluster Infrastructure Provider](./terminology.md#cluster-infrastructure-provider))**

2. **ONLY IF** using a `MachineDeployment` / `MachineSet`: Create `Machine`, `<Infrastructure>Machine`, and `<Distribution>Bootstrap` resources for each replica requested in the `MachineSet` spec **(handled by CAPI Controllers)**

3. Create a Machine Bootstrap Secret per `<Distribution>Bootstrap` that contains the script that needs to be run on a machine right after it is provisioned in order to join it to the Kubernetes cluster **(handled by [Bootstrap Provider](./terminology.md#bootstrap-provider))**

4. Provision a physical server per `<Infrastructure>Machine` by contacting the infrastructure provider (e.g. AWS, Azure, etc.) and running the bootstrap script from the Machine Bootstrap Secret on the machine before marking it as Ready **(handled by [Machine Provider](./terminology.md#machine-infrastructure-provider))**

5. Copy the `<Infrastructure>Machine` fields over to the corresponding CAPI `Machine` **(handled by CAPI Controllers)**

6. Initialize the cluster's controlplane (only once all `Machine`s are marked as Ready) using the configuration on the `<Distribution>ControlPlane` and join the bootstrapped nodes onto the controlplane; once all `Machine`s are joined, create a `KUBECONFIG` that can be used to access the newly provisioned cluster's Kubernetes API **(handled by [ControlPlane Provider](./terminology.md#control-plane-provider))**

7. Copy the `<Distribution>ControlPlane` fields over to the corresponding CAPI `Cluster`, specifically including the control plane endpoint that can be used to communicate with the cluster **(handled by CAPI Controllers)**

Once these steps have been taken, a user can run `clusterctl get kubeconfig` to access the newly provisioned downstream cluster's Kubernetes API.

```mermaid
graph TD
CAPIControllers("CAPI Controllers\n (copies fields back to Cluster and Machine CRs)")
subgraph Providers
ClusterProvider("Cluster Provider")
BootstrapProvider("Bootstrap Provider")
MachineProvider("Machine Provider")
ControlPlaneProvider("Control Plane Provider")
end
subgraph Provider CRs
InfrastructureCluster("&lt;Infrastructure&gt;Cluster")
DistributionBootstrap("&lt;Distribution&gt;Bootstrap")
DistributionBootstrapTemplate("&lt;Distribution&gt;BootstrapTemplate")
InfrastructureMachine("&lt;Infrastructure&gt;Machine")
InfrastructureMachineTemplate("&lt;Infrastructure&gt;MachineTemplate")
DistributionControlPlane("&lt;Distribution&gt;ControlPlane")
end
subgraph Physical Resources
ClusterInfrastructure("Cluster-Level Infrastructure\n(LoadBalancers, NetworkSecurityGroups, etc.)")
PhysicalServer("Physical Server")
MachineBootstrapSecret("Machine Bootstrap Secret\n(Bash script)")
KubeConfig("KUBECONFIG")
end
CAPIControllers--On Cluster Create-->ClusterProvider
CAPIControllers--Before Machines Create-->BootstrapProvider
CAPIControllers--On Machines Create-->MachineProvider
CAPIControllers--On Machines Ready-->ControlPlaneProvider
ClusterProvider-.->InfrastructureCluster
InfrastructureCluster-.-> ClusterInfrastructure
BootstrapProvider-.->DistributionBootstrap
BootstrapProvider-.->DistributionBootstrapTemplate
DistributionBootstrap-.->MachineBootstrapSecret
MachineProvider-.->InfrastructureMachine
InfrastructureMachine-.->PhysicalServer
MachineProvider-.->InfrastructureMachineTemplate
ControlPlaneProvider-.->DistributionControlPlane
DistributionControlPlane-.->KubeConfig
```
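
To make these hand-offs concrete, here is a rough sketch of a single `Machine` mid-provisioning, annotated with which step fills in which field; the kinds, names, and provider ID are illustrative, and exact field names can vary across CAPI versions:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  name: my-cluster-md-0-abc12
spec:
  clusterName: my-cluster
  version: v1.28.3
  bootstrap:
    configRef:                                # hand-off to the Bootstrap Provider (step 3)
      apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
      kind: KubeadmConfig
      name: my-cluster-md-0-abc12
    dataSecretName: my-cluster-md-0-abc12     # filled in once the Machine Bootstrap Secret exists
  infrastructureRef:                          # hand-off to the Machine Provider (step 4)
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSMachine
    name: my-cluster-md-0-abc12
  providerID: aws:///us-west-2a/i-0123456789abcdef0   # copied back from the <Infrastructure>Machine (step 5)
status:
  bootstrapReady: true                        # the bootstrap data Secret is available
  infrastructureReady: true                   # the physical server has been provisioned
  phase: Running
```
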
83 changes: 83 additions & 0 deletions docs/kubernetes/provisioning/capi/terminology.md
@@ -0,0 +1,83 @@
# Terminology

## Cluster API (CAPI)

[Cluster API (CAPI)](https://cluster-api.sigs.k8s.io/introduction.html) is a declarative API for provisioning and managing Kubernetes clusters.

Once CAPI is installed, users are expected to use [`clusterctl`](https://cluster-api.sigs.k8s.io/clusterctl/overview.html), a command line tool that supports commands like:
- `clusterctl init` to install the CAPI and CAPI Provider components that listen to CAPI and CAPI Provider CRDs
- `clusterctl generate cluster` to create the Kubernetes manifest that defines a CAPI Cluster, which consists of CAPI and CAPI Provider CRDs
- `clusterctl get kubeconfig` to get the `KUBECONFIG` of a CAPI-provisioned cluster to be able to communicate with it

## CAPI Provider

CAPI Providers are sets of controllers, implemented by third parties (e.g. AWS, Azure, Rancher, etc.), that provision infrastructure on CAPI's behalf.

These controllers register their own Custom Resource Definitions (CRDs), which allow users to create provider-specific Custom Resources (CRs) to manage their infrastructure.

For more information on how providers work, please read the [docs](./providers.md).

## (Cluster) Infrastructure Provider

A [Cluster Infrastructure Provider](https://cluster-api.sigs.k8s.io/developer/providers/cluster-infrastructure.html) is the **first** provider that gets called by the series of hand-offs from CAPI.

This provider is expected to implement the following CRD:
- `<Infrastructure>Cluster`: referenced by the `.spec.infrastructureRef` of a CAPI `Cluster` CR

On seeing an `<Infrastructure>Cluster` (e.g. `AWSCluster`) for the first time, a Cluster Infrastructure Provider is supposed to create and manage any of the **cluster-level** infrastructure components, such as a cluster's Subnet(s), Network Security Group(s), etc., that would need to be created before provisioning any machines.

> **Note**: As a point of clarification, Rancher's Cluster Infrastructure Provider's CRD is called an `RKECluster`, since it is used as the generic Cluster Infrastructure CR for multiple infrastructure providers, although in theory Rancher would have `DigitalOceanCluster`s or `LinodeCluster`s instead.
>
> This is because Rancher today does not support creating or managing **cluster-level infrastructure components** (which would normally be infrastructure-provider-specific) on behalf of downstream clusters.

Then, once the downstream cluster's API is accessible, the Cluster Infrastructure Provider is supposed to fill in the `<Infrastructure>Cluster` with the controlplane endpoint that can be used by `clusterctl` to access the cluster's Kubernetes API; this is then copied over to the CAPI `Cluster` CR along with some other status fields.

## Bootstrap Provider

A [Bootstrap Provider](https://cluster-api.sigs.k8s.io/developer/providers/bootstrap.html) is the **second** provider that gets called by the series of hand-offs from CAPI.

This provider is expected to implement the following CRDs:
- `<Distribution>Bootstrap`: referenced by the `.spec.bootstrap.configRef` of a CAPI `Machine` CR
- `<Distribution>BootstrapTemplate`: referenced by the `.spec.template.spec.bootstrap.configRef` of a CAPI `MachineDeployment` or `MachineSet` CR

On seeing a `<Distribution>Bootstrap` (e.g. `RKEBootstrap`), the Bootstrap Provider is expected to create a **Machine Bootstrap Secret** that is referenced by the `<Distribution>Bootstrap` under `.status.dataSecretName`.

This **Machine Bootstrap Secret** is expected to contain a script (i.e. "bootstrap data") that should be run on each provisioned machine before marking it as ready; on successfully running the script, the machine is expected to have the relevant Kubernetes components for a given **Kubernetes distribution (e.g. kubeadm, RKE, k3s/RKE2)** installed on the node.

> **Note**: A point of clarification is that the Bootstrap Provider is not involved in actually running the script to bootstrap a machine.
>
> Running the script defined by the bootstrap provider falls under the purview of the [Machine Infrastructure Provider](#machine-infrastructure-provider).
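
As a rough sketch of the artifacts involved (a kubeadm bootstrap config stands in for the `<Distribution>Bootstrap` here; the names, Secret type, and script contents are illustrative):

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfig
metadata:
  name: my-cluster-md-0-abc12
status:
  ready: true
  dataSecretName: my-cluster-md-0-abc12   # contract field pointing at the Secret below
---
# The Machine Bootstrap Secret; the Machine (Infrastructure) Provider is what
# actually delivers and runs its contents on the provisioned machine.
apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-md-0-abc12
type: cluster.x-k8s.io/secret
stringData:
  format: cloud-config
  value: |
    #cloud-config
    runcmd:
      - /usr/local/bin/install-kubernetes.sh   # hypothetical distribution install script
```
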
## Machine (Infrastructure) Provider

A Machine Provider (also known as a [Machine Infrastructure Provider](https://cluster-api.sigs.k8s.io/developer/providers/machine-infrastructure.html)) is the **third** provider that gets called by the series of hand-offs from CAPI.

Machine Providers are expected to implement the following CRDs:
- `<Infrastructure>Machine`: referenced by the `.spec.infrastructureRef` of a CAPI `Machine` CR
- `<Infrastructure>MachineTemplate`: referenced by the `.spec.template.spec.infrastructureRef` of a CAPI `MachineSet` or `MachineDeployment` CR

On seeing the creation of an `<Infrastructure>Machine`, a Machine Provider is responsible for **provisioning the physical server** from a provider of infrastructure (such as AWS, Azure, DigitalOcean, etc., as listed [here](https://cluster-api.sigs.k8s.io/user/quick-start.html#initialization-for-common-providers)) and **running a bootstrap script on the provisioned machine** (provided by the [Bootstrap Provider](#bootstrap-provider) via the **Machine Bootstrap Secret**).

The bootstrap script is typically run on the provisioned machine by providing the bootstrap data from the **Machine Bootstrap Secret** as `cloud-init` configuration; if `cloud-init` is not available, it's expected to be directly run on the machine via `ssh` after provisioning it.

> **Note**: What is [`cloud-init`](https://cloud-init.io/)?
>
> Also known as "user data", it's generally used as a standard way of providing a script or configuration that should be run on newly provisioned infrastructure, and it is supported by most major cloud providers.
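
For illustration, the bootstrap data handed to the machine as user data typically looks something like the following `cloud-config`; the path and script contents here are made up, and the real payload is whatever the Bootstrap Provider generated:

```yaml
#cloud-config
write_files:
  - path: /usr/local/bin/bootstrap.sh       # hypothetical location for the generated bootstrap script
    permissions: "0755"
    content: |
      #!/bin/sh
      # install and start the distribution's Kubernetes components here
      echo "bootstrapping node..."
runcmd:
  - /usr/local/bin/bootstrap.sh             # run it once on first boot
```
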
## Control Plane Provider

A [Control Plane Provider](https://cluster-api.sigs.k8s.io/developer/architecture/controllers/control-plane.html) is the **fourth** provider that gets called by the series of hand-offs from CAPI.

Control Plane Providers are expected to implement the following CRD:

- `<Distribution>ControlPlane`: referenced by the `.spec.controlPlaneRef` of a CAPI `Cluster` CR. This contains the configuration of the cluster's controlplane, but is only used by CAPI to copy over status values


On seeing a `<Distribution>ControlPlane` and a set of `Machine`s that are Ready, a control plane provider has a couple of jobs:
- Initializing the control plane by managing the set of `Machine`s designated as control plane nodes and installing the controlplane components (`etcd`, `kube-apiserver`, `kube-controller-manager`, `kube-scheduler`) and other optional services (`cloud-controller-manager`, `coredns` / `kubedns`, `kube-proxy`, etc.) onto those nodes
- Generating cluster certificates if they don't exist
- Keeping track of the state of the controlplane across all nodes that comprise it
- Joining new nodes onto the existing cluster's controlplane
- Creating / managing a `KUBECONFIG` that can be used to access the cluster's Kubernetes API

Once the Control Plane Provider has finished what it needs to do, the downstream cluster is expected to be fully provisioned; you can then run `clusterctl get kubeconfig` to get the `KUBECONFIG` of your newly provisioned cluster.
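
As a rough sketch, a `<Distribution>ControlPlane` looks something like the following (the kubeadm control plane is used as an example; the names, API versions, and status fields shown are illustrative):

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: my-cluster-control-plane
spec:
  replicas: 3                              # number of controlplane Machines to manage
  version: v1.28.3
  machineTemplate:
    infrastructureRef:                     # <Infrastructure>MachineTemplate backing the controlplane nodes
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSMachineTemplate
      name: my-cluster-control-plane
  kubeadmConfigSpec: {}                    # distribution-specific controlplane configuration (trimmed)
status:
  initialized: true                        # controlplane has been initialized on at least one node
  ready: true                              # the API server endpoint is serving
  readyReplicas: 3
```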