ONNX Runtime v1.1.0
Key Updates
- Performance improvements to reduce BERT model inference latency on both GPU and CPU. Updates include:
  - Additional fused CPU kernels and related graph transformers for key operators such as Attention, EmbedLayerNormalization, SkipLayerNormalization, and FastGelu
  - Further optimizations such as parallelizing Gelu and LayerNorm, enabling legacy stream mode, improving the performance of elementwise operators, and fusing the add-bias step into SkipLayerNormalization and FastGelu
- Extended CUDA support for opset 11
- Performance improvements for Faster R-CNN and Mask R-CNN with new and updated implementations of opset 11 CUDA kernels, including Resize, Expand, Scatter, and Pad
- TensorRT Execution Provider updates, including support for inputs with dynamic shapes
- MKL-DNN (renamed DNNL) updated to v1.1
- [Preview] NNAPI Execution Provider for Android - see more
- [Preview] Java API for ONNX Runtime - see more
- Tool for the Python API that automatically maps a pandas DataFrame to the inputs of an ONNX graph based on the schema information in the frame (see the sketch after this list)
- Custom ops can now be packaged in shared libraries and distributed for use in multiple applications without modification; a library is registered with the session before loading a model (see the second sketch after this list)
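
As an illustration of the DataFrame-mapping idea, here is a minimal sketch (not the tool's actual interface) that builds an InferenceSession input feed from a pandas DataFrame by matching column names to the graph's declared inputs; the model path, column names, and float32 cast are placeholder assumptions:

```python
import numpy as np
import onnxruntime as ort
import pandas as pd

# Placeholder model path; any ONNX model with named tensor inputs works.
session = ort.InferenceSession("model.onnx")

# Placeholder data; in practice the DataFrame columns mirror the graph inputs.
df = pd.DataFrame({"age": [42.0, 35.0], "income": [52000.0, 61000.0]})

# Map DataFrame columns to the graph's input names reported by the session.
# Each matching column becomes a float32 column vector.
feed = {
    inp.name: df[inp.name].to_numpy(dtype=np.float32).reshape(-1, 1)
    for inp in session.get_inputs()
    if inp.name in df.columns
}

outputs = session.run(None, feed)
print(outputs)
```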
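
A custom-op shared library is registered through session options before a model is loaded. A minimal sketch, assuming a library built against the ONNX Runtime custom-op C API; the library and model paths are placeholders:

```python
import onnxruntime as ort

# Session options carry the custom-op registration.
so = ort.SessionOptions()

# Load the shared library containing the custom operator kernels
# (.so on Linux, .dll on Windows, .dylib on macOS).
so.register_custom_ops_library("./libcustom_ops.so")

# Any model that references the custom ops can now be created with these options.
session = ort.InferenceSession("model_with_custom_ops.onnx", so)
```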
Contributions
We'd like to thank our community members across various teams at Microsoft and other companies for all the valuable contributions.
We'd like to extend special recognition to these individuals for their contributions to this release: Jianhao Zhang (JD AI), Adam Pocock (Oracle), nihui (Tencent), and Nick Groszewski. From the Intel teams, we'd like to thank Patrick Foley, Akhila Vidiyala, Ilya Lavrenov, Manohar Karlapalem, Surya Siddharth Pemmaraju, Sreekanth Yalachigere, Michal Karzynski, Thomas V Trimeloni, Tomasz Dolbniak, Amy Zhuang, Scott Cyphers, Alexander Slepko, and other team members for their valuable work supporting the Intel Execution Providers for ONNX Runtime.