
ws-worker: respond to a wake up call #869

Open
josephjclark opened this issue Feb 11, 2025 · 4 comments

@josephjclark (Collaborator)

When each worker starts up, it joins a websocket channel with lightning (see src/channels/worker-queue.ts)

It then polls that channel to "claim" any outstanding work items. Lightning will respond with zero or more runs. If the worker receives no work, it backs off and polls a little more slowly next time.

We want to add a feature where Lightning will send a "wake up" call to any workers with capacity. This allows us to short-circuit the backoff and instantly claim any outstanding work, which lets us process runs faster.

So, the Worker needs to listen to some kind of event from lightning and, if it has capacity, instantly reset the backoff and trigger a fresh claim.
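
A minimal sketch of the worker-side behaviour, assuming a phoenix-style channel object. The event name, payload shapes and helper names below are illustrative, not the real ws-worker internals:

```ts
// Sketch only: 'work-available', the claim payload and the backoff values
// are assumptions for illustration.
type Channel = {
  on: (event: string, cb: (payload?: unknown) => void) => void;
  push: (
    event: string,
    payload: object
  ) => { receive: (status: string, cb: (resp: { runs?: object[] }) => void) => void };
};

export const startClaimLoop = (
  channel: Channel,
  hasCapacity: () => boolean,
  minBackoff = 100, // ms
  maxBackoff = 30_000 // ms
) => {
  let delay = minBackoff;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const claim = () => {
    channel.push('claim', { demand: 1 }).receive('ok', ({ runs = [] }) => {
      // back off when the claim comes back empty; reset when we get work
      delay = runs.length ? minBackoff : Math.min(delay * 2, maxBackoff);
      timer = setTimeout(claim, delay);
    });
  };

  // The wake-up call: if we have capacity, cancel the backoff and claim now
  channel.on('work-available', () => {
    if (hasCapacity()) {
      clearTimeout(timer); // no-op if the timer hasn't been set yet
      delay = minBackoff;
      claim();
    }
  });

  claim();
};
```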

On the Lightning side, this wake-up call needs to be sent simultaneously to everyone listening on the work queue. That could be a hundred workers, but that's OK: each worker will trigger a claim, the first one to do so will get the work, and the others will safely be returned nothing.

We can engineer a response to this event on the worker side and test it with integration tests. Just set a super low backoff, trigger a wake-up, and work should be claimed instantly. It's up to the Lightning team to work out how and when to send the wake-up event.
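
For illustration only, here's roughly how that could be exercised against the claim-loop sketch above using a fake in-memory channel. A real integration test would go through the lightning mock over a websocket, and every name here is made up:

```ts
// Fake channel for illustration: hands back at most one run per claim and
// lets us trigger the wake-up event by hand.
const makeFakeChannel = () => {
  const handlers: Record<string, (payload?: unknown) => void> = {};
  const queue: object[] = [];
  return {
    on: (event: string, cb: (payload?: unknown) => void) => {
      handlers[event] = cb;
    },
    push: (_event: string, _payload: object) => ({
      receive: (status: string, cb: (resp: { runs?: object[] }) => void) => {
        if (status === 'ok') cb({ runs: queue.splice(0, 1) });
      },
    }),
    // test helpers
    enqueue: (run: object) => queue.push(run),
    wake: () => handlers['work-available']?.(),
  };
};

const channel = makeFakeChannel();

// Use a deliberately huge backoff so a normal poll would take "forever"
startClaimLoop(channel, () => true, 60_000, 60_000);

// Enqueue a run and send the wake-up: the loop should claim it immediately,
// long before the 60s backoff would have fired.
channel.enqueue({ id: 'run-1' });
channel.wake();
```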

Questions:

  • What is the wake up event called? on-queue-added? wake-up? trigger-claim? cancel-backoff?
github-project-automation bot moved this to New Issues in v2 on Feb 11, 2025
josephjclark moved this from New Issues to Ready in v2 on Feb 11, 2025
@josephjclark (Collaborator, Author)

I think this is a back door to a push (rather than pull) queue architecture. But that's fine. It's cheaper than re-engineering the whole thing, and means that Lightning can stay quite hands-off in terms of managing the worker pool.

@josephjclark (Collaborator, Author)

@stuartc before we start on this can I ask you to take a quick look at the spec above, make sure it all adds up? What do you think about the event name?

@stuartc (Member) commented Feb 13, 2025

Yeah, everything adds up to me. How about work-available? Are our events dashed or underscored?

As for a 'push' architecture, you're right - it's a nice step in that direction; but we're still avoiding having the cluster be aware of the state of all the workers.

@josephjclark (Collaborator, Author) commented Feb 14, 2025

Tips for @doc-han:

  • start up the lightning mock
  • start up the worker
  • post a run to the lightning mock with curl (see lightning-mock readme)
  • in the mock, add a mode where, on enqueue, we send out this work-available event to the queue channel (see the sketch after this list)
  • in the worker, listen for the work-available event, cancel the claim backoff loop and restart it (it should claim instantly)
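
A rough sketch of the mock-side change, assuming a plain `ws` server and a phoenix-style message shape. `queueChannelSockets`, `onEnqueue` and the `worker:queue` topic are placeholders, not the actual lightning-mock internals:

```ts
import { WebSocket } from 'ws';

// sockets currently joined to the shared worker queue channel
const queueChannelSockets = new Set<WebSocket>();

// Broadcast work-available to every connected worker. Sending it to everyone
// is fine: each worker triggers a claim, one wins the run, and the rest are
// safely returned nothing.
const broadcastWorkAvailable = () => {
  const message = JSON.stringify({
    topic: 'worker:queue',
    event: 'work-available',
    payload: {},
    ref: null,
  });
  for (const socket of queueChannelSockets) {
    socket.send(message);
  }
};

// called wherever the mock enqueues a new run
export const onEnqueue = (run: object, queue: object[]) => {
  queue.push(run);
  broadcastWorkAvailable();
};
```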

If you want to use Lightning with your local worker, start Lightning with `RTM=false mix phx.server`
