
Improve shutdown process in Multiprocess mode #2010

Closed
wants to merge 1 commit

Conversation


@rbtz-openai rbtz-openai commented Jun 17, 2023

Summary

This PR explicitly closes listening sockets in the parent process early in the shutdown process. That way, no new connections can hit workers after shutdown is initiated by the parent process.

While here, speed up shutdown by splitting process shutdown into separate terminate and join loops, so all child processes are shut down in parallel.
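
Below is a minimal sketch of that shutdown order (hypothetical helper and variable names, not the actual uvicorn supervisor code):

import multiprocessing
import socket


def shutdown(sockets: list[socket.socket], workers: list[multiprocessing.Process]) -> None:
    # Close the parent's copies of the listening sockets up front; once each
    # worker also closes its inherited copy, the port fully unbinds and clients
    # get a retryable "connection refused" instead of hanging requests.
    for sock in sockets:
        sock.close()

    # Signal every worker first (terminate loop)...
    for worker in workers:
        worker.terminate()

    # ...then wait for all of them (join loop), so workers shut down in
    # parallel rather than one after another.
    for worker in workers:
        worker.join()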

Current behaviour

Here is a sample application:

import asyncio

from fastapi import FastAPI

app = FastAPI()


@app.get("/")
async def root():
    return {"Hello": "World"}


@app.on_event("shutdown")
async def shutdown_event():
    print("Shutdown event")
    await asyncio.sleep(10)
    print("Shutdown event done")


@app.get("/sleep")
async def sleep():
    """can be used to test graceful shutdown"""
    await asyncio.sleep(60)

It is run with:

uvicorn app:app --workers 2

In a second window we can run a simulated traffic generator, courtesy of curl:

while :; do curl --connect-timeout 0.1 --max-time 1 localhost:8000; sleep 1; done

In a third window we pkill(1) the parent:

pkill -f 'uvicorn app:app'

What is seen in uvicorn logs:

(fast) ➜  app uvicorn app:app --workers 2
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started parent process [87167]
INFO:     Started server process [87194]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [87195]
INFO:     Waiting for application startup.
INFO:     Application startup complete.

Normal traffic:

INFO:     127.0.0.1:63935 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63936 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63938 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63939 - "GET / HTTP/1.1" 200 OK

SIGTERM received:

INFO:     Shutting down
INFO:     Waiting for application shutdown.
Shutdown event

At this point only a single worker receives traffic:

INFO:     127.0.0.1:63940 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63942 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63943 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63944 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63945 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63946 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63947 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63948 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63949 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63950 - "GET / HTTP/1.1" 200 OK
Shutdown event done
INFO:     Application shutdown complete.
INFO:     Finished server process [87194]
INFO:     Shutting down
INFO:     Waiting for application shutdown.

Now the last worker has closed its sockets, but the parent still holds an open listening socket, so requests are dropped on the floor:

Shutdown event
Shutdown event done
INFO:     Application shutdown complete.
INFO:     Finished server process [87195]
INFO:     Stopping parent process [87167]

After that point clients get connection refused.

From the client it looks like:

$ while :; do curl --connect-timeout 0.1 --max-time 1 localhost:8000; sleep 1; done
{"Hello":"World"}{"Hello":"World"}{"Hello":"World"}
# At this point we sent SIGTERM.  All the following responses come from a single alive worker:
{"Hello":"World"}{"Hello":"World"}{"Hello":"World"}{"Hello":"World"}{"Hello":"World"}{"Hello":"World"}{"Hello":"World"}{"Hello":"World"}{"Hello":"World"}{"Hello":"World"}{"Hello":"World"}
# Here this worker dies and we start dropping requests on the floor (non-retryable).
curl: (28) Operation timed out after 1001 milliseconds with 0 bytes received
curl: (28) Operation timed out after 1003 milliseconds with 0 bytes received
curl: (28) Operation timed out after 1004 milliseconds with 0 bytes received
curl: (28) Operation timed out after 1002 milliseconds with 0 bytes received
curl: (28) Operation timed out after 1005 milliseconds with 0 bytes received
# Here the master quits and we get a clean (retryable) connection refused
curl: (7) Failed to connect to localhost port 8000 after 4 ms: Couldn't connect to server
curl: (7) Failed to connect to localhost port 8000 after 4 ms: Couldn't connect to server
curl: (7) Failed to connect to localhost port 8000 after 4 ms: Couldn't connect to server

New behaviour

New version:

  • Closes the listening sockets in the parent process immediately.
  • Shuts down all worker processes in parallel, which has two side effects:
    • No load imbalance between workers during shutdown.
    • Faster shutdown process.

PS. The same situation can be reproduced without the shutdown lifecycle event, using --timeout-graceful-shutdown 30 and an endless stream of slow requests.

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.

Comment on lines +66 to +68
for sock in self.sockets:
sock.close()

Sponsor Member

Why do you need to close the sockets here again?

Author

@rbtz-openai rbtz-openai Jun 18, 2023

In UNIX-based OSes one needs to close all instances of a listening socket for it to unbind (including the ones inherited by forks). In this case this includes:

  • the master process.
  • all children.

If you're hinting that in some OSes (e.g. Windows, where my knowledge is quite limited) this would lead to an error, I can wrap this in a try/except socket.error.
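
As a rough sketch of that defensive close (illustrative only; the helper name is hypothetical):

import socket


def close_listening_sockets(sockets: list[socket.socket]) -> None:
    # Every process (the parent and each forked child) holds its own copy of
    # the listening socket; the port only unbinds once every copy is closed.
    for sock in sockets:
        try:
            sock.close()
        except OSError:  # socket.error is an alias of OSError on Python 3
            pass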

Author

@rbtz-openai rbtz-openai Jun 19, 2023

You can observe the behaviour of stalled HTTP requests during shutdown with a server that has a health check HTTP handler and a shutdown lifespan handler:

    @app.on_event("shutdown")
    async def shutdown():
        await asyncio.sleep(60)
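
For completeness, the health check handler mentioned above could be as simple as this (hypothetical route, not part of the PR):

    @app.get("/healthz")
    async def healthz():
        # Keeps answering while the worker's listening socket is still open,
        # which makes the stalled-request window during shutdown visible.
        return {"status": "ok"}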

@Kludex Kludex added and removed the waiting author (Waiting for author's reply) label Jun 17, 2023
@rbtz-openai
Author

@Kludex A friendly ping on this ^

@rbtz-openai rbtz-openai requested a review from Kludex July 13, 2023 19:20
Explicitly close listening sockets in both parent and child processes early in
the shutdown process.  That way, no new connections can hit workers after
shutdown is initiated by the parent process.

While here, speed up shutdown by splitting process shutdown into separate
terminate and join loops, so all processes are shut down in parallel.
Comment on lines -270 to -271
for sock in sockets or []:
sock.close()
Author

This is not needed.

asyncio.loop.create_server specifies:

Note The sock argument transfers ownership of the socket to the server created. To close the socket, call the server’s close() method.

asyncio.Server confirms that with:

close() (https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.Server.close)
Stop serving: close listening sockets and set the sockets attribute to None.
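
A minimal illustration of that ownership transfer (not uvicorn code; the coroutine name is made up):

import asyncio
import socket


async def serve_and_stop(sock: socket.socket) -> None:
    # create_server(sock=...) takes ownership of the socket, so no separate
    # sock.close() is needed: server.close() closes the listening socket and
    # wait_closed() waits for the close to complete.
    server = await asyncio.get_running_loop().create_server(
        asyncio.Protocol, sock=sock
    )
    server.close()
    await server.wait_closed()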

@rbtz-openai rbtz-openai marked this pull request as ready for review July 13, 2023 22:38
@Kludex Kludex added this to the Version 0.24.0 milestone Jul 18, 2023
@rbtz-openai
Author

ref:

@Kludex
Sponsor Member

Kludex commented Apr 13, 2024
