-
Hi, BentoML team! Why did BentoML 1.0 switch from Gunicorn to Circus for process management?
-
Gunicorn mainly manages the processes for Flask, which BentoML was previously based on. Now we've separated the web portion from the model serving portion. We're using Circus at a much lower level to manage the runner/model-serving processes. With the new approach you can control the number of model instances independently from the number of webserver instances. Separately, in order to support async apps, we transitioned from WSGI to ASGI. Instead of Gunicorn and Flask, we're now using Starlette and asyncio, which gives better performance overall. @parano Did I get that right? Any other details you'd like to add or retract? 😂
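
For a rough idea of the ASGI style mentioned above, here's a minimal Starlette endpoint. This is just an illustrative sketch, not BentoML's actual server code; the route and handler names are made up:

```python
from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route

# Handlers are native coroutines, so the worker can serve other
# requests while this one awaits I/O (the WSGI model can't do this).
async def predict(request):
    payload = await request.json()  # non-blocking read of the request body
    return JSONResponse({"received": payload})

app = Starlette(routes=[Route("/predict", predict, methods=["POST"])])

# Run with any ASGI server, e.g.: uvicorn myapp:app
```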
-
Thanks, @timliubentoml. Those are the major reasons.
Circus meets the requirements well.
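
To make that concrete, Circus can be embedded as a library and run heterogeneous groups of processes with independent worker counts. A minimal sketch following the pattern from the Circus docs; the commands below are placeholders, not BentoML's actual entry points:

```python
from circus import get_arbiter

# Two independent watchers: API server workers and a model runner pool,
# each with its own process count that can be tuned separately.
arbiter = get_arbiter([
    {"name": "api_server", "cmd": "python api_server.py", "numprocesses": 2},
    {"name": "model_runner", "cmd": "python model_runner.py", "numprocesses": 4},
])
try:
    arbiter.start()  # blocks, supervising and restarting child processes
finally:
    arbiter.stop()
```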
-
Exactly as @bojiang and @timliubentoml answered. Besides wanting to provide proper async support, the main reason is that Gunicorn, like most tools in the Python web development stack, is designed for running multiple homogeneous processes: all workers run identical web-serving code, and the same process is simply forked into multiple workers for vertical scaling.

However, this is not great for ML model serving workloads. A resource-intensive model may limit how many copies fit on one machine, and models sit idle while pre-processing and post-processing code runs, which leads to low resource utilization.

To address this in BentoML 1.0, we introduced the Runner concept: a unit of computation (typically a model) that is scheduled in its own worker pool (or its own Pod in a Yatai/Kubernetes deployment), separate from the API server processes, and can scale independently. Gunicorn doesn't really support this type of architecture, which is why we moved to Circus, which offers lower-level APIs to create and manage multiple processes.
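
To see the Runner concept in code, here's a sketch along the lines of the BentoML 1.0 quickstart. The `iris_clf:latest` tag and the names are illustrative and assume a model was previously saved with `bentoml.sklearn.save_model`:

```python
import bentoml
from bentoml.io import NumpyNdarray

# Wrap a saved model in a Runner: a unit of computation that gets its
# own worker pool, separate from the API server processes.
iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

# The Service owns the API server; the runner scales independently of it.
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def classify(input_array):
    # async_run dispatches inference to the runner's worker pool, so the
    # (Starlette-based) API worker isn't blocked while the model runs.
    return await iris_clf_runner.predict.async_run(input_array)
```

Served with `bentoml serve`, the API workers and the runner pool are separate process groups supervised by Circus, which is exactly the heterogeneous layout Gunicorn's fork-identical-workers model can't express.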
-
@timliubentoml @bojiang @parano