
Threadpool query and performance tweaking #87

Open
fullymiddleaged opened this issue Jan 20, 2025 · 0 comments

fullymiddleaged commented Jan 20, 2025

Hey,

So, I've come from a Python implementation where I ran one worker per core (Gunicorn) and instructed ONNX Runtime to use intra_op_num_threads = 1, pinning each session to the core its worker was running on. This seemed to help with performance, as the inference sessions weren't fighting over cores. See the snippet below:

import psutil
import onnxruntime as rt

# Pin the session's single intra-op thread to the core this worker happens to be running on.
cpu = str(psutil.Process().cpu_num())
sess_opt = rt.SessionOptions()
sess_opt.intra_op_num_threads = 1
sess_opt.add_session_config_entry('session.intra_op_thread_affinities', cpu)
onnxsession = rt.InferenceSession(config.MODEL_PATH, sess_options=sess_opt, providers=['CPUExecutionProvider'])

Is there a way to set up your server so we can ensure sessions are locked down per core (or any other way to ensure the best response time)?
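For reference, here is a minimal sketch of how the same per-core locking could be done at the OS level instead of through the ONNX Runtime affinity config key. It assumes Linux (os.sched_setaffinity is Linux-only), and the model path and WORKER_CORE environment variable are purely illustrative:

import os
import onnxruntime as rt

def make_pinned_session(model_path: str, core_id: int) -> rt.InferenceSession:
    # Pin this worker process to a single core (Linux-only syscall wrapper; pid 0 = current process).
    os.sched_setaffinity(0, {core_id})

    # With a single intra-op thread, all inference work runs on the calling
    # thread, which the affinity mask above keeps on core_id.
    sess_opt = rt.SessionOptions()
    sess_opt.intra_op_num_threads = 1
    return rt.InferenceSession(
        model_path, sess_options=sess_opt, providers=["CPUExecutionProvider"]
    )

# Each worker would pass in its own core index, e.g. via an environment
# variable set by the process manager (WORKER_CORE is a made-up name):
# session = make_pinned_session("model.onnx", int(os.environ.get("WORKER_CORE", "0")))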

Cheers! :)
