
Threadpool query and performance tweaking #87

Open
fullymiddleaged opened this issue Jan 20, 2025 · 0 comments

fullymiddleaged commented Jan 20, 2025

Hey,

So, I've come from a Python implementation where I ran one worker per core (Gunicorn) and instructed ONNX Runtime to use intra_op_num_threads = 1, pinning each session to the core its worker was running on. This seemed to help with performance, as the inference sessions weren't fighting over cores. See the snippet below:

import psutil
import onnxruntime as rt

# Pin the session's single intra-op thread to the core this worker happens to be running on.
cpu = str(psutil.Process().cpu_num())
sess_opt = rt.SessionOptions()
sess_opt.intra_op_num_threads = 1
sess_opt.add_session_config_entry('session.intra_op_thread_affinities', cpu)
onnxsession = rt.InferenceSession(config.MODEL_PATH, sess_options=sess_opt, providers=['CPUExecutionProvider'])

Is there a way to set up your server so we can ensure sessions are locked down per core (or any other way to ensure the best response time)?
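For reference, here is a minimal sketch of how the same per-core locking could be done at the OS level instead of through the ONNX Runtime affinity config key. It assumes Linux (os.sched_setaffinity is Linux-only), and the model path and WORKER_CORE environment variable are purely illustrative:

import os
import onnxruntime as rt

def make_pinned_session(model_path: str, core_id: int) -> rt.InferenceSession:
    # Pin this worker process to a single core (Linux-only syscall wrapper; pid 0 = current process).
    os.sched_setaffinity(0, {core_id})

    # With a single intra-op thread, all inference work runs on the calling
    # thread, which the affinity mask above keeps on core_id.
    sess_opt = rt.SessionOptions()
    sess_opt.intra_op_num_threads = 1
    return rt.InferenceSession(
        model_path, sess_options=sess_opt, providers=["CPUExecutionProvider"]
    )

# Each worker would pass in its own core index, e.g. via an environment
# variable set by the process manager (WORKER_CORE is a made-up name):
# session = make_pinned_session("model.onnx", int(os.environ.get("WORKER_CORE", "0")))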

Cheers! :)
