Hey,
So, I've come from a Python implementation where I ran one worker per core (Gunicorn) and configured ONNX Runtime with `intra_op_num_threads = 1`, pinning each session to the core its worker runs on. This seemed to help performance, since the inference sessions weren't competing for cores. See snippet below:
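For reference, here is a minimal sketch of the setup described above. It is not the original snippet. It assumes Gunicorn on Linux, where `os.sched_setaffinity` is available. The helper names `available_cores`, `pick_core`, `pin_to_core` and `make_session` are hypothetical. The sketch also assumes Gunicorn's `worker.age` counter starts at 1 for the first spawned worker.

```python
# gunicorn.conf.py -- sketch: one worker per core, one single-threaded ONNX Runtime session per worker.
import os


def available_cores() -> list[int]:
    """Return the CPU ids this process may run on (Linux), else 0..N-1."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return list(range(os.cpu_count() or 1))


def pick_core(worker_index: int) -> int:
    """Map a 0-based worker index onto one allowed core, wrapping around."""
    cores = available_cores()
    return cores[worker_index % len(cores)]


def pin_to_core(core: int) -> None:
    """Restrict the calling process to a single core (Linux only; no-op elsewhere)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {core})


# Gunicorn setting: one worker process per usable core.
workers = len(available_cores())


def post_fork(server, worker):
    # Gunicorn server hook, run inside each worker right after it is forked.
    # Assumption: worker.age counts spawned workers starting at 1. It keeps growing
    # across worker restarts, so two live workers can occasionally share a core.
    pin_to_core(pick_core(worker.age - 1))


def make_session(model_path: str):
    """Build a CPU session whose operators run only on the calling thread."""
    import onnxruntime as ort  # imported lazily so this config file loads without it

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = 1  # no extra intra-op pool threads beyond the caller
    return ort.InferenceSession(
        model_path, sess_options=opts, providers=["CPUExecutionProvider"]
    )
```

In this sketch, `make_session` would be called once inside each worker, for example at app startup with `preload_app` left off. That way every worker owns its own session and inherits the core pinned in `post_fork`. Creating the session in the master and forking it would share that state across workers, which this layout is meant to avoid.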
Is there a way to set up your server so that each session is locked down to its own core? Or, failing that, is there another recommended way to get the best response time?
Cheers! :)