Based on the following Slack discussion and the issue faced on the smaug cluster, we need to think about improvements to the current resource allocation policy.
The current approach only looks at the usage patterns of the JupyterHub application. What happens when multiple services compete for the same cluster resources? How do we model the effect of cluster-level quota restrictions on user-level profile recommendations?
Aakanksha Duggal wrote:
Hello team,
I am getting an error while spawning a large kernel for the elyra image. Are we facing some issues today?
Anand Sanmukhani wrote:
it seems like we need to increase the quota for cpu limits
Humair Khan wrote:
seems like the limits went up instead of down
Humair Khan wrote:
lol
Humair Khan wrote:
for large
Humair Khan wrote:
let's increase it by 20 cores
Humair Khan wrote:
it's easy to increase it anyway
Anand Sanmukhani wrote:
cool cool
Anand Sanmukhani wrote:
updated
Anand Sanmukhani wrote:
the quota that we set for this ns was just a guess anyway
Humair Khan wrote:
yeah, but it's good, this will help us tune it
Anand Sanmukhani wrote:
I think we can reduce the cpu requests quota
Humair Khan wrote:
yeah
Tom Coufal wrote:
hm.. what if we remove the cpu limit in the quota for that namespace and leave the request?
Humair Khan wrote:
I guess the question then becomes what we want to achieve with quotas, in my mind quotas are a hard bound for preventing exploding requests/limits
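[Editor's note: a minimal sketch of the kind of namespace ResourceQuota being discussed, using the official Kubernetes Python client. The namespace name and the numbers are hypothetical, not the actual change that was merged. A quota like this caps the *sum* of CPU requests and limits across all pods in the namespace, which is the "hard bound for preventing exploding requests/limits" described above; raising `limits.cpu` here is what "increase it by 20 cores" amounts to.]

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (cluster name etc. are assumed).
config.load_kube_config()

# Hard ceilings on the total CPU requests and limits for the whole namespace.
quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="notebook-quota"),
    spec=client.V1ResourceQuotaSpec(
        hard={
            "requests.cpu": "40",  # sum of all pods' CPU requests in the namespace
            "limits.cpu": "80",    # sum of all pods' CPU limits in the namespace
        }
    ),
)

# Any pod whose creation would push the namespace past these totals is
# rejected at admission time -- the quota never throttles running pods.
client.CoreV1Api().create_namespaced_resource_quota(
    namespace="jupyterhub", body=quota
)
```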
Tom Coufal wrote:
I take that back.. I'm getting confused with this again.. sigh.. 😕
Humair Khan wrote:
hahahahaha
Humair Khan wrote:
we should just let data build up on this and accumulate, so we can have a couple of months to look back on
Anand Sanmukhani wrote:
yeah
Humair Khan wrote:
if we never see limits pass a certain mark, we'll just reduce it across the board
Anand Sanmukhani wrote:
december might not be a good month for it tho
Tom Coufal wrote:
wouldn't it be nice if there was a quota setting preventing total over-utilization, not bound to some limit and request values? Similar to how you can say "you can't have more than 10 PVCs", you would be able to say "you can't use more than X cores at the same time"
Humair Khan wrote:
yeah, this would be fantastic
Tom Coufal wrote:
I guess we need to wait for the next big thing after Kubernetes for that.. 😄
Humair Khan wrote:
but from what I understand, they don't do this due to the complexity of accurately retrieving usage metrics
Tom Coufal wrote:
yeah
Anand Sanmukhani wrote:
merging the PR
Anand Sanmukhani wrote:
Aakanksha Duggal can you try spawning your nb again?
Aakanksha Duggal wrote:
yes on it
Anand Sanmukhani wrote:
looks like it worked
Aakanksha Duggal wrote:
Yes! 👍
Aakanksha Duggal wrote:
Thank you 😄
Erik Erlandson wrote:
> wouldn't it be nice if there was a quota setting preventing total over-utilization, not bound to some limit and request values? Similar to how you can say "you can't have more than 10 PVCs", you would be able to say "you can't use more than X cores at the same time"
In general you can't implement this kind of resource policy without also implementing a preemption policy
Erik Erlandson wrote:
if your quota is 10, and user A is using 5, user B is using 4, what happens if user B tries to increase to 6? does he get that? Can he "steal" it from user A? How does one allocate or re-allocate?
Erik Erlandson wrote:
at the bottom, pods and their containers are cgroups - I can't remember if cgroups allow changing their cpu settings after the fact
@erikerlandson To your last point: if user A is using 5 and user B is using 4, and both of their pods have a limit of 10, then from what I understand user B's workload can get at most 1 more CPU, because A was already actively using 5 before B tried to go up to 6. Am I missing something here?
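[Editor's note: to make the arithmetic in that reply explicit, here is a tiny worked example using the hypothetical numbers from the thread (a 10-core budget, A at 5, B at 4).]

```python
# Hypothetical numbers from the thread: 10 cores total, user A actively
# using 5 cores and user B using 4.
quota_cores = 10
usage = {"A": 5, "B": 4}

# The only CPU that B can claim without taking anything away from A is
# whatever is currently idle.
headroom = quota_cores - sum(usage.values())
print(f"B can grow by at most {headroom} core(s) without preempting A")
# -> 1 core; anything beyond that has to come out of A's share, which is
#    exactly the preemption / re-allocation question raised above.
```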
@HumairAK @tumido @4n4nd could we continue this discussion here? I guess whatever we conclude here can be used to improve the current approach of recommending resource profiles.
Transcript of Slack thread: https://operatefirst.slack.com/archives/C01RMPVUUK1/p1638285784149000?thread_ts=1638285784.149000&cid=C01RMPVUUK1
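[Editor's note: as one possible starting point for that discussion, below is a minimal sketch of a quota-aware recommender: the usage-based profile suggestion is capped by the headroom left in the namespace's `limits.cpu` quota, so the recommendation never exceeds what the quota would admit. The profile names, sizes, and quota numbers are hypothetical, not the current policy.]

```python
# Hypothetical notebook profile sizes, in CPU cores (request, limit).
PROFILES = {
    "small": (1, 2),
    "medium": (2, 4),
    "large": (4, 8),
}

def recommend(usage_based_choice: str,
              quota_limits_cpu: float,
              limits_cpu_in_use: float) -> str:
    """Downgrade the usage-based recommendation if the quota can't fit it."""
    headroom = quota_limits_cpu - limits_cpu_in_use
    # Walk from the usage-based choice down to smaller profiles until one fits
    # inside the remaining limits.cpu quota.
    ordered = ["large", "medium", "small"]
    for name in ordered[ordered.index(usage_based_choice):]:
        _, limit = PROFILES[name]
        if limit <= headroom:
            return name
    return "none"  # nothing fits; the quota itself needs attention

# Example: usage patterns suggest "large", but only 5 cores of the namespace's
# limits.cpu quota are free, so "medium" is recommended instead.
print(recommend("large", quota_limits_cpu=80, limits_cpu_in_use=75))
```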