PoC implementation of using Celery for task running #7

Open · wants to merge 13 commits into base: basic-interface

Conversation

@lewisjared (Contributor) commented Nov 6, 2024

Description

A basic PoC of using Celery workers to run metrics.

Celery is an asynchronous, distributed task queue. In this example we use Redis as both the task broker and the results store. A parent process may queue a number of tasks, which are executed out-of-process by the appropriate worker.

Each benchmarking package would build a Docker image that contains all the required dependencies. This image can be used to spawn one or more Docker containers, each of which runs the ref-celery command to start a new Celery worker that listens for any tasks that need to be run.
When new calculations are queued, the appropriate run_metric function is called and the results are returned to the parent process.
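
As a rough sketch, the worker side might be wired up something like the following (the broker URL, app name, and task name here are illustrative assumptions, not necessarily what this PR implements):

```python
# Sketch of a ref-celery-style worker. The Redis URL, app name, and
# task name are assumptions for illustration.
from celery import Celery

app = Celery(
    "ref-metrics-example",
    broker="redis://redis:6379/0",   # task broker
    backend="redis://redis:6379/0",  # results store
)

@app.task(name="ref_metrics_example.run_metric")
def run_metric(metric_name: str, configuration: dict) -> dict:
    """Run a single metric and return the result to the parent process."""
    # A real worker would dispatch to the metrics provider here.
    return {"metric": metric_name, "status": "ok"}
```

Starting the worker then amounts to something like celery -A <module> worker, which is presumably what the ref-celery command wraps.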

The stack can be started using docker-compose up, which spawns three containers:

  • redis - stores the tasks to be executed and the results of execution
  • flower - Optional, but provides a web interface to see the currently queued jobs (localhost:5555)
  • ref-metrics-example - Runs the example metrics provider as a celery worker in a standalone docker container

Outside of the Docker containers, jobs can be queued up (see scripts/runner.py and the sketch below).
This decouples the queueing of tasks from their execution.
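
A minimal sketch of the queuing side, in the spirit of scripts/runner.py (whose actual contents may differ; the task name and broker URL mirror the worker sketch above and are assumptions):

```python
# Queue a task by name and block on the result.
from celery import Celery

app = Celery(
    "runner",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/0",
)

# send_task queues by name, so the runner never imports the worker's code.
# This is what decouples queueing from execution.
async_result = app.send_task(
    "ref_metrics_example.run_metric",
    args=["example-metric", {"output_directory": "out/"}],
)
print(async_result.get(timeout=60))  # blocks until a worker returns a result
```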

Still todo

  • Logging workers joining/leaving and tracking new jobs

Checklist

Please confirm that this pull request has done the following:

  • Tests added
  • Documentation added (where applicable)
  • Changelog item added to changelog/

The review comment below is attached to this snippet from the diff (truncated here):

```python
from ref_core.env import env
from ref_core.metrics import Configuration, TriggerInfo

config = {
```

@lewisjared (Contributor, Author):

The parent process will need to track the jobs that have been declared as workers join and leave the cluster. The workers may declare the list of jobs that they provide, and their dependencies, at run-time.

Perhaps the parent process needs to know which providers to listen to, or needs some way of whitelisting/blacklisting metrics.
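
A sketch of one way the parent process could track workers joining and leaving, using Celery's event stream (worker-online/worker-offline are real Celery event types; the handler bodies and registry logic are hypothetical):

```python
# Monitor worker-online/worker-offline events from the broker.
from celery import Celery

app = Celery("monitor", broker="redis://localhost:6379/0")

def on_worker_online(event):
    print(f"worker joined: {event['hostname']}")
    # The parent could now query the worker's registered tasks, e.g. via
    # app.control.inspect().registered(), to learn which metrics it provides.

def on_worker_offline(event):
    print(f"worker left: {event['hostname']}")

with app.connection() as connection:
    receiver = app.events.Receiver(
        connection,
        handlers={
            "worker-online": on_worker_online,
            "worker-offline": on_worker_offline,
        },
    )
    receiver.capture(limit=None, timeout=None)
```

Whitelisting/blacklisting could then be a simple filter applied in these handlers against a configured list of allowed providers.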

@lewisjared (Contributor, Author) commented Nov 6, 2024

@nocollier Here is a PoC for using Celery to manage distributed task execution. There are a few other refactoring changes thrown in, so it might not be perfectly clear.

You should be able to run docker-compose up in one terminal to fetch and build the example worker, and then in another terminal run uv run python scripts/runner.py. This will queue up a calculation which is run on the worker, with the output written to a shared output directory.

You might also need to run make fetch-test-data first if you haven't already.

@lewisjared (Contributor, Author):

I might revive this, as we need an async approach to executing metrics and Celery is the easiest way to test async tasks locally without k8s.
