ARC Scale Set Issues on Kubernetes Mode #3299
thiago-juro
started this conversation in
General
I have been trying to run ARC Runner Scale Sets on EKS with Bottlerocket Linux, and my experience has not been good so far.
Below are some of the issues I have encountered (some are not related to ARC). For now I will keep using legacy ARC with DinD; it's not great, but it has been stable in our systems. Feel free to post your issues and solutions in the comments below. Hopefully the shared experience will make this process a bit easier.
containerMode: kubernetes
Problem: Node OutofCPU:
Cause: If ACTIONS_RUNNER_USE_KUBE_SCHEDULER is set to false, the GitHub container hook creates the runner and worker pods on the same node. At some point, because of the container resource requests, the hook tries to place the worker on the runner's node but Kubernetes rejects the pod, reporting that the node is out of resources.
Problem: Runner and worker containers are created on different nodes and try to attach the same EBS volume.
Cause: When ACTIONS_RUNNER_USE_KUBE_SCHEDULER is set to true, the kube scheduler can place the runner and worker on different nodes. The runner and worker need to share the same volume, so this option requires storage that supports ReadWriteMany, such as EFS.
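For anyone trying this combination, here is a rough sketch of what the relevant part of the gha-runner-scale-set Helm values might look like. It is untested and makes a few assumptions: `efs-sc` is a hypothetical name for an EFS-backed StorageClass that must already exist in the cluster, the `1Gi` request is a placeholder, and the runner image and command are the chart defaults as far as I know.

```yaml
# Sketch only: kubernetes container mode with a shared ReadWriteMany work volume.
containerMode:
  type: "kubernetes"
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteMany"]   # runner and worker pods may land on different nodes
    storageClassName: "efs-sc"       # hypothetical EFS-backed StorageClass name
    resources:
      requests:
        storage: 1Gi                 # placeholder size
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          # Let the kube scheduler place the worker pod instead of
          # pinning it to the runner's node.
          - name: ACTIONS_RUNNER_USE_KUBE_SCHEDULER
            value: "true"
```

Leaving the variable unset or set to false keeps the worker on the runner's node, in which case a ReadWriteOnce volume such as EBS can work, at the cost of the OutofCPU problem described above.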
Problem: Actions runner container hook fails:
Cause: To be investigated. However, once the volumes were removed from the containers in the job definition, the jobs succeeded.
Problem: Path denied issue on Bottlerocket Linux
Cause: SELinux configuration. See 'Downloading runner update fails: "An error occurred: Access to the path is denied"' (Issue #981, actions/runner).
Problem: Jobs running on EFS take ages to complete
Cause: Our workflows have jobs that handle many small files (e.g., node_modules), and EFS is not suited to that type of workload: file open and close operations add a lot of time. Jobs that used to take 3 minutes took 58 minutes to complete. Switching to provisioned throughput to match the EBS 125 MB/s would be expensive.
EFS setup:
Performance Mode: General Purpose
Throughput Mode: Bursting (50 KB/s per GB)
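To put the bursting figure in perspective, here is a back-of-the-envelope calculation. It takes the numbers in this post at face value and assumes decimal units (1 MB = 1000 KB), which is a simplification on my part. It estimates how much data would have to be stored on the file system for the 50 KB/s per GB baseline to reach the 125 MB/s EBS figure:

```latex
\[
\frac{125\ \text{MB/s}}{50\ \text{KB/s per GB}}
= \frac{125{,}000\ \text{KB/s}}{50\ \text{KB/s per GB}}
= 2{,}500\ \text{GB}
\]
```

By the same arithmetic, a file system holding only about 5 GB of runner work directories would have a baseline of roughly 250 KB/s once it can no longer burst, which is consistent with small-file jobs slowing down badly.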
Problem: tar decompression fails due to permissions when using EFS: "Cannot change ownership to uid xxxx, gid xxxx: Operation not permitted".
Cause: The container security context fsGroup needs to match the runner's ID, 1001. On EFS I set uid and gid to 0, and that solved the problem.
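As a rough illustration of the fsGroup part, the pod-level security context could be set in the scale set's Helm values along these lines. This is an untested sketch: the value 1001 comes from this post, and the image and command are assumed to be the chart defaults. It does not cover the uid/gid change on the EFS side.

```yaml
# Sketch only: set fsGroup on the runner pod so mounted volumes are
# group-accessible to the runner user.
template:
  spec:
    securityContext:
      fsGroup: 1001                  # runner ID, per the post
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
```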