Design htcondor pressure mechanism: #9

Open
dciangot opened this issue Dec 3, 2021 · 1 comment

dciangot commented Dec 3, 2021

Document the design proposal

Machine spawning plugin:

HTCondor:

  • reserved resources at Site ONLY for Dask scheduler
  • Dask scheduler submits "pilot" jobs to a dedicated queue on the site CEs

K8s:

  • reserved resources at Site ONLY for Dask scheduler
  • Dask scheduler creates k8s pods for the HTCondor worker nodes (WN)

Changes in Dask scheduler:

  • start
  • stop
  • scale
    - (pilot job id, Dask WN job id)
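
As a rough sketch of what the start/stop/scale changes could look like, the class below keeps one (pilot job id, Dask WN job id) pair per spawned worker. Everything here is hypothetical: `WorkerPool`, `submit_pilot` and `remove_pilot` are illustrative names, not Dask or HTCondor APIs, and the two callables stand in for whatever the spawning plugin (HTCondor or K8s) actually does.

```python
from typing import Callable, List, Tuple

# (pilot job id, Dask worker-node job id); both ids are opaque strings here.
JobPair = Tuple[str, str]


class WorkerPool:
    """Hypothetical bookkeeping for the workers spawned by the Dask scheduler."""

    def __init__(
        self,
        submit_pilot: Callable[[], JobPair],
        remove_pilot: Callable[[JobPair], None],
    ) -> None:
        # Both callables are assumptions: they wrap the real spawning plugin.
        self._submit = submit_pilot
        self._remove = remove_pilot
        self.pairs: List[JobPair] = []

    def start(self, n: int = 1) -> None:
        """Bring the pool up to n workers."""
        self.scale(n)

    def stop(self) -> None:
        """Remove every worker that this pool spawned."""
        self.scale(0)

    def scale(self, n: int) -> None:
        """Grow or shrink the pool until it tracks exactly n workers."""
        if n < 0:
            raise ValueError("n must be non-negative")
        while len(self.pairs) < n:
            self.pairs.append(self._submit())
        while len(self.pairs) > n:
            # The most recently spawned worker is removed first.
            self._remove(self.pairs.pop())
```

Keeping the pair together means that `stop` and `scale` can always find the pilot that hosts a given Dask worker.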

For simple condor_q:

the spawning is managed by a script that checks the queue (CINECA-like)
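
A minimal sketch of such a queue-checking script, under stated assumptions: it reads the queue with `condor_q -json`, counts idle jobs (`JobStatus == 1`), and decides how many pilots to spawn. The function names, `jobs_per_pilot`, and `max_pilots` are illustrative and not part of the proposal; the actual spawning step is left out.

```python
import json
import subprocess
from typing import Dict, List

IDLE = 1  # HTCondor JobStatus code for an idle job


def read_queue() -> List[Dict]:
    """Return the job ads printed by `condor_q -json` (empty list if no jobs)."""
    out = subprocess.run(
        ["condor_q", "-json"], capture_output=True, text=True, check=True
    ).stdout
    # Treat empty output as an empty queue.
    return json.loads(out) if out.strip() else []


def pilots_to_spawn(
    jobs: List[Dict],
    running_pilots: int,
    jobs_per_pilot: int = 1,   # assumed packing of jobs onto one pilot
    max_pilots: int = 10,      # assumed cap set by the reserved resources
) -> int:
    """Decide how many new pilots the current queue pressure justifies."""
    idle = sum(1 for job in jobs if job.get("JobStatus") == IDLE)
    wanted = -(-idle // jobs_per_pilot)  # ceiling division
    room = max_pilots - running_pilots
    return max(0, min(wanted, room))
```

A periodic loop (cron job or `while True` with a sleep) would call `read_queue()`, pass the result to `pilots_to_spawn`, and submit that many pilots.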

@dciangot

One possibility, suggested by Stefano, is to follow an industry-like approach based on a virtual credit that is recharged periodically and can be boosted by PC, for instance for high-priority analyses.

This would tackle the problem of stopping submit requests earlier when resources are scarce.
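
To make the credit idea concrete, here is a small sketch, assuming each submit request has a cost that is checked against a per-user balance. `CreditAccount`, its fields, and the choice to let a boost exceed the cap are all assumptions for illustration; the proposal does not fix any of them.

```python
from dataclasses import dataclass


@dataclass
class CreditAccount:
    """Hypothetical virtual-credit account for one user or group."""

    balance: float          # credit currently available
    cap: float              # periodic recharges never lift the balance above this
    recharge_amount: float  # credit added at every periodic recharge

    def recharge(self) -> None:
        """Periodic top-up, limited by the cap."""
        self.balance = min(self.cap, self.balance + self.recharge_amount)

    def boost(self, extra: float) -> None:
        """One-off boost (e.g. for a high-priority analysis); may exceed the cap."""
        self.balance += extra

    def try_submit(self, cost: float) -> bool:
        """Accept the submit request only if enough credit is left."""
        if cost > self.balance:
            return False  # request is stopped before it reaches the queue
        self.balance -= cost
        return True
```

With this shape, a request is rejected at submit time once the credit runs out, rather than waiting in the queue for resources that are not there.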
