Please address the design failures and customers' concerns in Runner Scale Sets #3340
-
I'll second 100% of this.
-
Hey! I have been speaking with @juliandunn and the engineering team about this, and wanted to give you an update and recap where we are. We are still evaluating what to do next, as there is no clear right answer that will satisfy both the folks discussing here and the considerations below.

**Thinking/history**
First, to reiterate how we got here and why we launched scale sets with only a single label, which is why the new ARC mode also uses a single label. The headline reason was a decision, based on customer feedback, to increase the determinism of 'where' a job will run. Having only labels that map to a single object (i.e. a name) means that jobs cannot be 'stolen' by simply adding the wrong label to a runner, which was and still is a major concern for many customers. This remains a benefit to customers and one we need to keep in mind. I will also apologize for the lack of a clear changelog when we released this. We published a really bare-bones changelog for the beta of ARC and scale sets, but did not make this change clear enough at the time.

**What it would take to change**
This is not something we can 'flip the bit' to go back on; scale sets were built without multi-label support, so moving over would be an involved change for us.

**Mitigation**
For those who are blocked, the older scaling modes of ARC have not been removed and can continue to be used.

**Concerns raised**
To make sure I capture everyone's concerns, I am compiling a list here that I will edit and add to as people respond in this thread.
- Security issues: the concern is that if "I name a runner X and GitHub later introduces X as a label", the job can be taken by a GitHub-hosted runner. There is also a concern that adding 'self-hosted' acts as a blanket mitigation against a job landing on a GitHub-hosted runner via a 'custom label'.
- Targeting jobs gets more complex: this requires knowing the ordering of the components, e.g. ubuntu-2core, whereas ubuntu, 2-core does not.
- Low-determinism job targeting: some customers want to match just ubuntu and don't care whether they get a 2-core or a 64-core machine. There is a desire for 'soft matching' that can span different runner groups.
- Runner groups are not in free teams: some people could use runner groups to mitigate some of these concerns (not all), but this isn't feasible for everyone due to the limitation on creating more runner groups. Note that creating more runner groups is now available in paid teams, and I will take an action to see what we can do here.
- High availability: we have docs on how to do this here, but if there are other forms of this concern that the docs do not address, I would <3 to know.
- Migration: there is no easy path to move from multi-label to single-label today.
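To make the targeting difference concrete, here is a minimal workflow sketch contrasting the two models; the labels and the scale set name (`arc-gpu-runners`) are placeholders, not anything GitHub prescribes:

```yaml
name: targeting-sketch
on: workflow_dispatch

jobs:
  legacy-multi-label:
    # Legacy ARC / plain self-hosted runners: a job targets any combination of
    # labels, and any runner carrying all of them may pick it up.
    runs-on: [self-hosted, linux, x64, gpu]
    steps:
      - run: echo "matched by labels"

  scale-set-single-name:
    # New ARC runner scale sets: the only valid target is the scale set's name,
    # which maps to exactly one installed scale set.
    runs-on: arc-gpu-runners
    steps:
      - run: echo "matched by scale set name"
```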
-
👋🏽 Any chance you all can share an ETA on that "flexibility"? I've been holding a production migration because asking teams to make this change is quite noisy, since many of the changes will "require" review due to the separation of responsibilities and processes a team might have. I am an Enterprise user, and it feels like the group of users you gathered feedback from doesn't really represent the reality of enterprise-paying customers at all. I'm holding the release until this is sorted, but it has been going on for months with no sense of hope for a resolution.
-
@tmehlinger, regarding your comment: could you point me to where in the docs this design decision is mentioned? I'd like to reference it during our evaluation of ARC and RSS.
-
Stumbling onto this thread a month later: any update on the funding plan and its implications for this discussion?

What brought me here was actually looking to switch from our current single-label setup (on legacy ARC) to multiple labels with what is being called "soft matching" and described at length above. We've got an eye on switching to RSS in the future, so we're now at a decision point between implementing multiple labels or implementing RSS.

I, too, would like to see multiple labels supported (again). For many of my jobs I don't care where they run as long as it's Linux on x64; for other jobs I want to additionally specify that they must run in prod, or on a "build" runner with particular tools available as opposed to a generic runner without them.

Echoing the tangentially related ask, some form of priorities/weighting would be great too. The GPU example is a good one: a generic job might be okay running there if it's convenient, but I'd rather not tie up the resource and block a job that actually requires it. The previously mentioned static/on-demand or permanent/burst use cases are also ones I would love to see, i.e. prioritize filling permanent nodes first, using burst nodes only if necessary and letting them spin back down as soon as the load spike drops off.

Since ARC is effectively a Kubernetes-exclusive thing, I'm not sure I agree that simplifying was necessary to reduce confusion: Kubernetes has a very robust (and even complex) system of node selectors, affinities, taints, and tolerations, and an understanding of those is a prerequisite to running ARC in any form. While implementing the entire Kubernetes scheduling feature set would be overkill, it's a great source of inspiration.

Also, I very much appreciate the open discussion tone in this thread, in contrast to the "No. Closed." approach in the original one. Thank you for taking the time to explain the rationale and constraints.
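To ground the Kubernetes comparison, preferred (rather than required) node affinity is one concrete form of the "soft matching" described above: the workload favors certain capacity but can still run elsewhere. A minimal sketch, with hypothetical label and pod names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: generic-ci-job            # hypothetical pod standing in for a queued job
spec:
  affinity:
    nodeAffinity:
      # "Soft" preference: favor permanent capacity, but fall back to burst nodes.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50
          preference:
            matchExpressions:
              - key: runner-class          # hypothetical node label
                operator: In
                values: ["permanent"]
  containers:
    - name: job
      image: ubuntu:22.04
      command: ["bash", "-c", "echo running"]
```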
-
The Dependabot documentation for running Dependabot on self-hosted runners indicates that you must assign the `dependabot` label to those runners. With GitHub's intent to limit runners to one and only one label, how does this reconcile with Dependabot on self-hosted runners? Are we meant to have a pool of self-hosted runners dedicated to only Dependabot?
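For reference, the older ARC RunnerDeployment CRD lets a shared pool carry extra labels such as `dependabot` alongside others, which a scale set's single name can't express. A minimal sketch; the names, namespace, and repository below are placeholders:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: shared-runners                 # placeholder
  namespace: actions-runner-system
spec:
  replicas: 2
  template:
    spec:
      repository: example-org/example-repo   # placeholder
      labels:
        - dependabot    # the label the Dependabot self-hosted docs call for
        - linux
        - x64
```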
-
We also looked into switching to the new scale sets. It seemed like an easy task until we hit this label issue. It would mean a lot of unnecessary work for us to remove the self-hosted label from all our workflows, and the label also usefully indicates that those runners are self-hosted. We're keeping this on hold and sticking with the old setup for now.
-
This is a blocker for us as well. The prospect of updating hundreds of repositories to switch to ARC runner scale sets is a major deterrent. The current multi-label system in runner deployments is essential for our scaling needs, and without it the migration process is not only cumbersome but also introduces risks and inefficiencies. We'll be going with the older scaling mode until the runner scale sets label system improves.
-
The lack of labels, particularly the `self-hosted` label, is a problem for us as well.
-
I'll throw my hat in on this as well; it's causing my team and me significant issues. We had to update all our workflows, which was a huge pain, and since we use a canary release process for the cluster that hosts these runners, we're now facing a ton of issues where the listeners on the new cluster coming in to replace the old one CANNOT register with GitHub because the name is exactly the same; until the listeners are deleted from the old cluster, the new one can't register and start serving runners. This makes the canary deployment useless, because we are no longer avoiding a situation where we simply don't have runners. If the listener names were unique and the runners were matched by labels, this would be a non-issue.

It's incredibly frustrating and I don't get it. This was a bad decision and I hope it will be reconsidered. I'm struggling to understand the reasoning given for why this path was chosen in the first place, namely to satisfy customers that want a single label. Why not instead get rid of the default self-hosted label and just allow us to specify which labels we want to use? Then people who want only one label can specify only one label.
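For anyone hitting the same thing, the coupling is visible in the scale set's Helm values: the listener's identity on GitHub and the `runs-on` target are the same string. A sketch of per-cluster values (the URL, secret, and name are placeholders); a canary cluster can neither reuse the name while the old listener is still registered nor pick a fresh one without editing every workflow:

```yaml
# gha-runner-scale-set values for one cluster (sketch)
githubConfigUrl: https://github.com/example-org   # placeholder org URL
githubConfigSecret: arc-github-secret             # pre-created credentials secret
runnerScaleSetName: linux-x64    # doubles as the only runs-on target for workflows
minRunners: 0
maxRunners: 20
```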
-
Gah, only 2 weeks ago I finally got over the hurdle of the git CLI being added to the runner container, and now this 🤯 Not even being able to use the self-hosted label is painful.
-
The label issue also leads to fun documentation articles like Using self-hosted runners in a workflow, which is largely about using multiple labels. Among other things, the article notes that "You can create custom labels and assign them to your self-hosted runners at any time. Custom labels let you send jobs to particular types of self-hosted runners, based on how they're labeled" and that, by default, "A self-hosted runner automatically receives [multiple] labels". The fun part is that the article opens with the admonition that "Actions Runner Controller does not support multiple labels, only the name of the runner can be used in place of a label". 🤦
-
I'm both a GHEC customer and a spare-time user staring down an "upgrade" from Runner Deployments to Runner Scale Sets, and I couldn't possibly be more disappointed with this feature.
While trying to figure out how they're supposed to work, I found this discussion, where customers operating at scale have expressed reservations about using scale sets because, to be completely frank, they don't work. Selecting on multiple labels is essential for scaling out an Actions deployment, and the newer ARC can't do it. Think of it this way: how utterly useless would Kubernetes be if you couldn't manage resources using label selection? Whether you like it or not, existing large Actions deployments work the same way.
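To make the analogy concrete, core Kubernetes objects are wired together by selecting on multiple labels at once, which is the same pattern multi-label runner targeting provided. A sketch with illustrative names:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ci-cache                 # illustrative
spec:
  selector:                      # matches only pods carrying *all* of these labels
    app: ci-cache
    tier: backend
  ports:
    - port: 6379
      targetPort: 6379
```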
I think the rationale for the design is pretty weak and it needs revisiting.
Having experienced the same frustration myself, I suggest simply giving us visibility into the job queue, showing the labels that didn't match any runners and thus prevented scheduling. I suggest looking at how Karpenter presents diagnostic output; it very nicely explains how and why pods can't be scheduled in a Kubernetes cluster based on the combinations of labels and taints that apply to the capacity it manages. It's a problem remarkably similar to this one.
But....
What do "deterministic" and "where" mean? Do you mean specifically which runner will pick up my job? How could you possibly know that ahead of time, when you can't predict when a runner will become available? You don't need to tell me which runner a job will start on; I need to know which runners (plural) with matching capabilities can host the job, and then I don't care which one picks it up, because they should all have identical features. None of this prevents you from just showing us the job queue.
This all works fine today with Runner Deployments. The only trouble my team has with Runner Deployments is the odd "it won't start" issue, which (again) would be mitigated by detailed visibility into the job queue. If I could see which labels a job wants and what permissions it has (also sorely lacking in diagnostic output), this problem would be much easier to solve. You chose to design an entirely new and fundamentally flawed paradigm for ARC, then spent the time to implement it, instead of going for the simpler solution of showing information you already have.
New users? Of an advanced feature that requires hosting a CI platform in Kubernetes? How about we favor competency instead?
On using an excess of labels, documentation to the effect of "don't do that, do this instead" would have been much more useful for a lot less effort.
Literally thousands of workflows. How could this possibly be the best solution?
Then there's this, an actual security issue, flippantly disregarded by one of the maintainers. Being able to distinguish self-hosted runners from others is critical to avoiding issues like this one.
Finally, I really don't understand the decision to force multiple ARC deployments, one for each RSS. A single controller is capable of doing all the work, and the earlier controller design had a nice separation of responsibilities, fulfilling custom resources that could each describe exactly the capacity I need, labeled accordingly. Now you're just wasting capacity with a controller for every flavor of runner, and I have to manage N Helm installs. This is an inferior design by all measurements.
It's not my intention to be rude. I believe everyone involved has good intentions. Unfortunately, in this case, it isn't even remotely good enough, and the amount of work we, your customers, have to do to handle it is immensely frustrating.
Please reconsider the feedback and deal with it accordingly. To be perfectly transparent, my desire is that you will scrap RSS entirely and continue investing in the previous method.
As it stands now, I fully intend to stay on the older controller deployment and sincerely hope it gets continued support.