Is InferenceSession.Run thread-safe when using DirectML provider? #9441
I'm using OnnxRuntime (Microsoft.ML.OnnxRuntime 1.9) for inference with a YOLOv3 model. Since my graphics card is only about 40% loaded, I tried calling inference asynchronously to improve throughput. But when I call `InferenceSession.Run` inside `Task.Run`, this error is thrown: "Attempted to read or write protected memory. This is often an indication that other memory is corrupt." So I guess `InferenceSession.Run` is not thread-safe with the DirectML provider. If that's true, will it become thread-safe in the future? And in the meantime, is there any other way to fully utilize the graphics card? Can anyone please help? Thanks very much.
Replies: 1 comment
I found this document: https://onnxruntime.ai/docs/execution-providers/DirectML-ExecutionProvider.html. It says:
> Additionally, as the DirectML execution provider does not support parallel execution, it does not support multi-threaded calls to Run on the same inference session. That is, if an inference session is using the DirectML execution provider, only one thread may call Run at a time. Multiple threads are permitted to call Run simultaneously if they operate on different inference session objects.
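So one workaround is to keep a single session but serialize the `Run` calls yourself, so that async callers queue up instead of entering `Run` concurrently. A minimal sketch using `SemaphoreSlim` (the model path and input name here are placeholders, and `AppendExecutionProvider_DML` assumes you reference the Microsoft.ML.OnnxRuntime.DirectML package):

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

class SerializedDmlInference
{
    // One shared session; Run calls are serialized because the DirectML EP
    // does not allow concurrent Run on the same session.
    private readonly InferenceSession _session;
    private readonly SemaphoreSlim _runLock = new SemaphoreSlim(1, 1);

    public SerializedDmlInference(string modelPath)
    {
        var options = new SessionOptions();
        options.AppendExecutionProvider_DML(0); // DirectML on device 0
        _session = new InferenceSession(modelPath, options);
    }

    public async Task<float[]> RunAsync(string inputName, DenseTensor<float> input)
    {
        await _runLock.WaitAsync(); // only one Run in flight at a time
        try
        {
            using var results = _session.Run(new[]
            {
                NamedOnnxValue.CreateFromTensor(inputName, input)
            });
            return results.First().AsEnumerable<float>().ToArray();
        }
        finally
        {
            _runLock.Release();
        }
    }
}
```

Note this won't raise GPU utilization by itself; it only makes concurrent callers safe. The pre/post-processing of each request can still overlap with another request's `Run`, which is usually where the async version gains time.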
> **Performance Tuning**
> The DirectML execution provider works most efficiently when tensor shapes are known at the time a session is created. This provides a few performance benefits: 1) Because constant foldi…
>
> However, if a model input contains a free dimension (such as for batch size), steps must be taken to retain the above performance benefits. In this case, there are three options: Edit the model to replace an input's free dimension (specified through ONNX using "dim_param") with a fixed size (specified through ONNX using "dim_value").
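The other thing the quoted docs allow is true parallelism with *separate* session objects: multiple threads may call `Run` simultaneously as long as each uses its own `InferenceSession`. A sketch of that pattern (model filename is a placeholder, and each extra session duplicates the model's memory footprint on the device):

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.ML.OnnxRuntime;

class PerWorkerSessions
{
    static void Main()
    {
        const int workers = 2;

        // One session per worker: the DirectML EP permits concurrent Run
        // calls only across distinct InferenceSession objects.
        var tasks = Enumerable.Range(0, workers).Select(i => Task.Run(() =>
        {
            using var options = new SessionOptions();
            options.AppendExecutionProvider_DML(0); // same GPU, separate session
            using var session = new InferenceSession("yolov3.onnx", options);

            // ... feed this worker's share of frames through session.Run(...)
        })).ToArray();

        Task.WaitAll(tasks);
    }
}
```

Whether this actually helps depends on the model: if a single `Run` already saturates the GPU, a second session just contends for the same hardware.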