The Voltron Data DevOps team has developed a solution that provides an Actions Runner Controller for GitHub Actions workflows, backed by a Kubernetes cluster that auto-scales according to the needs of the workflows. It includes both horizontal pod auto-scaling and node auto-scaling, across Linux and Windows nodes.
You need to install Pulumi, the Go language runtime, the Flux CLI, the AWS CLI, and kubectl in your local environment:
- Install Pulumi: https://www.pulumi.com/docs/get-started/aws/begin/#install-pulumi
- Install Go: https://go.dev/doc/install
- Install Flux: https://fluxcd.io/flux/installation/#install-the-flux-cli
- Install AWS CLI: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
- Install kubectl: https://kubernetes.io/docs/tasks/tools/#kubectl
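Once the tools above are installed, a quick check that they are all on your `PATH` can save a failed deployment later. The following is a minimal sketch, not part of the repository; `check_tools` is a hypothetical helper name, and the binary names are the standard ones for each CLI.

```shell
#!/usr/bin/env bash
# Report which of the required CLIs are available on PATH.
# Returns 0 when every tool is found, 1 otherwise.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Binary names for the five prerequisites listed above.
check_tools pulumi go flux aws kubectl \
  || echo "Install the missing tools before continuing."
```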
You need to configure both your local and cloud environments to interact with AWS and GitHub:
- Deploy the S3 bucket we will use for the backend state
  - There is a CloudFormation template in the `pulumi/backend` folder. Deploy it in the region where you want to host the project.
- Create an SSH Key in AWS EC2 and store its name: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/create-key-pairs.html
- Open a terminal session
  - Confirm that you can see the S3 bucket created: `aws s3 ls`
  - Set the profile (`AWS_PROFILE=XXX`) or the region (`AWS_REGION=XXX`) if you are not using the defaults in your profile
- Log in to your bucket using Pulumi: `pulumi login s3://xxx.xxx.xxx/xxx/xxx`
  - https://www.pulumi.com/docs/intro/concepts/state/#aws-s3
- Make sure your GitHub account has an SSH key configured.
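The bucket check and the Pulumi login from the list above could be combined into one helper, so that the login is only attempted when the bucket is actually reachable. This is a sketch under assumptions: `login_backend` is a hypothetical function name and `my-state-bucket` is a placeholder, not the name created by the CloudFormation template.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the state bucket is reachable, then log in.
# Usage: login_backend <bucket-name>
login_backend() {
  local bucket="$1"
  if [ -z "$bucket" ]; then
    echo "usage: login_backend <bucket-name>" >&2
    return 2
  fi
  # List the bucket itself rather than every bucket in the account.
  if ! aws s3 ls "s3://${bucket}" >/dev/null; then
    echo "cannot list s3://${bucket}; check AWS_PROFILE and AWS_REGION" >&2
    return 1
  fi
  pulumi login "s3://${bucket}"
}

# Example (placeholder bucket name):
#   login_backend my-state-bucket
```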
The Actions runners need a Docker image to launch the pods that will run them. The Dockerfiles for these images are defined in `docker/`. This repository includes two GitHub Actions workflows that build and publish these images. To use them, update the files in `.github/workflows`, replacing the placeholder with this repository's owner as the first parameter and the repository's name as the second parameter. Then trigger a manual run of each workflow. After about 25 minutes, both jobs will finish building and pushing. Go to the main page of your repository on GitHub.com and click on the Packages section. For each image, go to Package Settings and, in the Danger Zone, change the visibility to public.
Now that we have our local and cloud environments set up, we can continue with the main deployment of the stack:
- Clone the `voltrondata-labs/gha-controller-infra` repo: `git clone git@github.com:voltrondata-labs/gha-controller-infra.git`
- Update the URL of the Pulumi backend in `Pulumi.yaml`, under the key `backend.url`
- Deploy a new stack of the Pulumi deployment:
  - Create the new stack: `cd pulumi/deployment`, then `pulumi stack init production`
    - It will ask you to create a passphrase; store it safely, as you will need it to make stack updates.
    - This step will create a new file called `Pulumi.production.yaml`, which will only contain the `encryptionsalt`. Copy all of the other values from `Pulumi.staging.yaml` into this file and replace the details as needed. At a minimum, you need to replace the tags and the SSH key name.
  - Deploy the stack: `pulumi up`
- Create a GitHub App Secret for the Controller
  We need to create a GitHub App in the organization that owns the repository that will trigger the Actions, and store its credentials as a secret in the Kubernetes cluster. This is not the same repository as the one holding the infrastructure code.
  - Create the organization's GitHub App with the necessary scopes.
  - Install the GitHub App in the organization and give it access to the repository.
  - Follow the document linked above to get the App ID (`APP_ID`), Installation ID (`INSTALLATION_ID`), and the downloaded private key file.
  - Format the private key file: `openssl pkcs8 -topk8 -inform PEM -outform PEM -in downloaded-key.pem -out new-key.pem -nocrypt`
  - Set your local `kubectl config` to the cluster you created: `aws eks update-kubeconfig --region region-code --name my-cluster`
  - Create the `actions-runner-system` namespace: `kubectl create ns actions-runner-system`
  - Export the variables: `export APP_ID=""`, `export INSTALLATION_ID=""`, `export PRIVATE_KEY_FILE_PATH=""`
  - Create a secret in the cluster to store the credentials (note the name): `kubectl create secret generic controller-manager -n actions-runner-system --from-literal=github_app_id=${APP_ID} --from-literal=github_app_installation_id=${INSTALLATION_ID} --from-file=github_app_private_key=${PRIVATE_KEY_FILE_PATH}`
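Because the secret is created from three exported variables, an empty value or a wrong key path produces a secret that only fails later, when the controller tries to authenticate. The sketch below checks the variables first; `check_app_credentials` and `create_controller_secret` are hypothetical helper names, while the `kubectl` arguments are the ones from the step above, with the variables quoted.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the three exported variables.
check_app_credentials() {
  local ok=0
  [ -n "${APP_ID:-}" ] || { echo "APP_ID is empty" >&2; ok=1; }
  [ -n "${INSTALLATION_ID:-}" ] || { echo "INSTALLATION_ID is empty" >&2; ok=1; }
  if [ ! -r "${PRIVATE_KEY_FILE_PATH:-}" ]; then
    echo "PRIVATE_KEY_FILE_PATH does not point to a readable file" >&2
    ok=1
  fi
  return "$ok"
}

# Create the secret only when the check passes.
# Quoting guards against spaces in the key file path.
create_controller_secret() {
  check_app_credentials || return 1
  kubectl create secret generic controller-manager \
    -n actions-runner-system \
    --from-literal=github_app_id="${APP_ID}" \
    --from-literal=github_app_installation_id="${INSTALLATION_ID}" \
    --from-file=github_app_private_key="${PRIVATE_KEY_FILE_PATH}"
}

# Example, after exporting the variables:
#   create_controller_secret
```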
- Bootstrapping Flux to the Kubernetes cluster
  Now we need to set up Flux with the Kubernetes cluster to have Continuous Deployment up and running in the cluster. This way we can manage the runners from the YAML files in GitHub instead of from the `kubectl` CLI tool.
  - Generate a classic personal access token (PAT) that can create and manage existing repositories by checking all permissions under `repo` and `admin`
    - This PAT will be used to write to the repo hosting the infrastructure. We recommend having one repository for the infrastructure and another repository that hosts and submits the GitHub Actions.
  - Set your PAT: `export GITHUB_TOKEN=<your-token>`
  - Bootstrap Flux in Production: `flux bootstrap github --owner=<org> --repository=<repo> --path fluxcd/clusters/production`
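The bootstrap steps above can be wrapped so that a missing token or a cluster that does not meet Flux's prerequisites is caught before anything is written to the repository. This is a sketch, not the repository's tooling: `bootstrap_flux` is a hypothetical function name, and the added `flux check --pre` call is an extra safety step that the original instructions do not include.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the bootstrap command from the steps above.
# Usage: bootstrap_flux <org> <repo>
bootstrap_flux() {
  local owner="$1" repo="$2"
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN is not set" >&2
    return 1
  fi
  if [ -z "$owner" ] || [ -z "$repo" ]; then
    echo "usage: bootstrap_flux <org> <repo>" >&2
    return 2
  fi
  # Verify the cluster meets Flux's prerequisites before changing anything.
  flux check --pre || return 1
  flux bootstrap github \
    --owner="$owner" \
    --repository="$repo" \
    --path fluxcd/clusters/production
}

# Example (placeholder names):
#   export GITHUB_TOKEN=<your-token>
#   bootstrap_flux my-org my-infra-repo
```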
- Helm Deployments
  - Copy all of the files in `fluxcd/clusters/staging` into `fluxcd/clusters/production`. There will already be a `flux-system` folder in `fluxcd/clusters/production` from the bootstrap step. Do not delete it, as this is the link to the Kubernetes cluster.
  - Update the `kustomizations.yaml` file to point to the production cluster in the paths of the Kustomizations.
  - You need to replace or fill in values for the following deployments:
    - `aws-system/aws-auth.yaml`
      - If you are not deploying Windows nodes, delete the `aws-system/aws-auth.yaml` file. Otherwise, replace the values for the role ARNs with the ones from the Pulumi output. If you need to get the output again, run `pulumi stack output` and it will print the values. If you replicate the default configuration, the first value is the Linux ARN and the second value is the Windows ARN.
    - `aws-system/aws-cluster-autoscaler-autodiscover.yaml`
      - The value of `annotations.eks.amazonaws.com/role-arn` in line 9 should also be replaced by the role ARN of the Cluster Autoscaler in your account. This also shows up in the `pulumi stack output` with the key `autoScalerRoleArn`.
      - The value of `k8s.io/cluster-autoscaler` in line 168 needs to be replaced with the cluster name.
    - `actions-runners/runner-deployments/`
      - In both of the files in this folder, you need to specify the image with the tag (see the Docker section above) and the repository that will be receiving the runners (`owner/repo-name` format).
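To avoid copying the Cluster Autoscaler role ARN by hand, the single output can be read directly and printed in the form the annotation expects. This is a minimal sketch: `print_autoscaler_annotation` is a hypothetical helper, it assumes it is run from the Pulumi project directory with the correct stack selected, and `autoScalerRoleArn` is the output key named in the steps above.

```shell
#!/usr/bin/env bash
# Print the annotation line for the Cluster Autoscaler service account,
# using the autoScalerRoleArn output of the currently selected Pulumi stack.
print_autoscaler_annotation() {
  local arn
  arn=$(pulumi stack output autoScalerRoleArn) || return 1
  printf 'eks.amazonaws.com/role-arn: %s\n' "$arn"
}

# Example, run from pulumi/deployment:
#   print_autoscaler_annotation
```

The printed line can then be pasted over the existing annotation in `aws-system/aws-cluster-autoscaler-autodiscover.yaml`.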
- Flux cannot be stood up in a single pass, because the Custom Resource Definitions must be created first and the dependency order matters. To work around this limitation, follow these steps:
  - Remove the `actions-runners/` folder and the first two `Kustomization` entries in the `kustomizations.yaml` file (lines 1-30).
  - `git add .`; then `git commit -m "Fixing FluxCD loadup error"`; then `git push`
  - After 2 minutes of waiting, force a Flux sync: `flux reconcile kustomization flux-system`
  - After 5 minutes of waiting, check on the `kustomizations`: `flux get kustomization -A`
  - After all `kustomizations` are marked as ready, run `git revert HASH_OF_THE_LAST_COMMIT`; then `git push`
  - After 2 minutes of waiting, force a Flux sync: `flux reconcile kustomization flux-system`
  - After 5 minutes of waiting, check on the `kustomizations`: `flux get kustomization -A`
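The fixed waiting times in the workaround above can be replaced by polling until the Kustomizations report ready. The following is a sketch under assumptions: `retry` is a hypothetical helper, and the commented `kubectl wait` example assumes the Flux Kustomization CRD (`kustomizations.kustomize.toolkit.fluxcd.io`) is already installed by the bootstrap step.

```shell
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      echo "gave up after ${attempts} attempts: $*" >&2
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example: force a sync, then poll every 30 seconds, up to 10 times.
#   flux reconcile kustomization flux-system
#   retry 10 30 kubectl wait kustomizations.kustomize.toolkit.fluxcd.io \
#     --all --all-namespaces --for=condition=Ready --timeout=30s
```

Checking with `flux get kustomization -A` afterwards, as the steps describe, is still worthwhile, since it also shows the reason when a Kustomization is not ready.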
With these steps, you should have successfully deployed an Actions Runner Controller with an autoscaler enabled, ready to receive jobs from the GitHub Actions API, and with FluxCD enabled.
Copyright [2022] [Voltron Data]
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.