[DOCS] 25.0 polishing 25.0
kblaszczak-intel committed Feb 4, 2025
1 parent c365229 commit 2923489
Showing 2 changed files with 39 additions and 34 deletions.
71 changes: 38 additions & 33 deletions docs/articles_en/about-openvino/release-notes-openvino.rst
What's new
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

* More GenAI coverage and framework integrations to minimize code changes.

* New models supported: Qwen 2.5, DeepSeek-R1-Distill-Llama-8B, DeepSeek-R1-Distill-Qwen-7B,
DeepSeek-R1-Distill-Qwen-1.5B, FLUX.1 Schnell, and FLUX.1 Dev.
* Whisper Model: Improved performance on CPUs, built-in GPUs, and discrete GPUs with GenAI API.
* Preview: Introducing NPU support for torch.compile, giving developers the ability to use the
OpenVINO backend to run the PyTorch API on NPUs. 300+ deep learning models enabled from the
TorchVision, Timm, and TorchBench repositories.

* Preview: Addition of Prompt Lookup to GenAI API improves 2nd token latency for LLMs by
effectively utilizing predefined prompts that match the intended use case (see the sketches
below).
* Preview: The GenAI API now offers image-to-image inpainting functionality. This feature
enables models to generate realistic content by inpainting specified modifications and
seamlessly integrating them with the original image.
* Asymmetric KV Cache compression is now enabled for INT8 on CPUs, resulting in lower
memory consumption and improved 2nd token latency, especially when dealing with long prompts
that require significant memory. The option should be explicitly specified by the user
(a usage sketch appears in the CPU Device Plugin section below).
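
The Prompt Lookup item above can be exercised through the GenAI Python API. A minimal
sketch, assuming an LLM already exported to OpenVINO IR in ``model_dir``; the
``prompt_lookup`` flag and the ``num_assistant_tokens`` / ``max_ngram_size`` fields follow
the public GenAI samples, with placeholder values:

.. code-block:: python

   import openvino_genai

   # Enable prompt lookup decoding when constructing the pipeline.
   pipe = openvino_genai.LLMPipeline("model_dir", "CPU", prompt_lookup=True)

   config = openvino_genai.GenerationConfig()
   config.max_new_tokens = 128
   config.num_assistant_tokens = 5  # candidate tokens proposed per step
   config.max_ngram_size = 3        # n-gram window matched against the prompt

   print(pipe.generate("Why is the sky blue?", config))

The inpainting feature above follows the same pipeline pattern. A sketch, assuming an
inpainting-capable model exported to ``model_dir`` and uint8 NHWC image tensors; the
``read_image`` helper is illustrative, not part of the API:

.. code-block:: python

   import numpy as np
   import openvino as ov
   import openvino_genai
   from PIL import Image

   def read_image(path: str) -> ov.Tensor:
       # Illustrative helper: load an RGB image as a (1, H, W, 3) uint8 tensor.
       return ov.Tensor(np.array(Image.open(path).convert("RGB"))[None])

   pipe = openvino_genai.InpaintingPipeline("model_dir", "GPU")
   result = pipe.generate(
       "a red brick wall",       # prompt describing the repainted region
       read_image("photo.png"),  # original image
       read_image("mask.png"),   # white pixels mark the area to repaint
   )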

* More portability and performance to run AI at the edge, in the cloud, or locally.

* Support for the latest Intel® Core™ Ultra 200H series processors (formerly codenamed
Arrow Lake-H).
* Integration of the OpenVINO™ backend with the Triton Inference Server allows developers to
utilize the Triton server for enhanced model serving performance when deploying on Intel
CPUs.
* Preview: A new OpenVINO™ backend integration allows developers to leverage OpenVINO
performance optimizations directly within Keras 3 workflows for faster AI inference on CPUs,
built-in GPUs, discrete GPUs, and NPUs. This feature is available with the latest Keras 3.8
release (see the sketch after this list).
* The OpenVINO Model Server now supports native Windows Server deployments, allowing
developers to leverage better performance by eliminating container overhead and simplifying
GPU deployment.
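
A minimal sketch of the Keras 3 integration, assuming Keras 3.8+ and OpenVINO are installed;
selecting a backend via the KERAS_BACKEND environment variable is standard Keras 3 behavior,
and the OpenVINO backend is inference-only:

.. code-block:: python

   import os
   os.environ["KERAS_BACKEND"] = "openvino"  # must be set before importing keras

   import keras
   import numpy as np

   # Any Keras 3 model; predict() below executes through OpenVINO.
   model = keras.Sequential([
       keras.Input(shape=(784,)),
       keras.layers.Dense(10, activation="softmax"),
   ])
   preds = model.predict(np.random.rand(2, 784).astype("float32"))
   print(preds.shape)  # (2, 10)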



Now Deprecated
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

* Legacy prefixes `l_`, `w_`, and `m_` have been removed from OpenVINO archive names.
* The `runtime` namespace for the Python API has been marked as deprecated and designated for
removal in 2026.0. The new namespace structure has been delivered, and migration is possible
immediately. Details will be communicated through warnings and via documentation.
CPU Device Plugin
-----------------------------------------------------------------------------------------------

* Intel® Core™ Ultra 200H processors (formerly code named Arrow Lake-H) are now fully supported.
* Asymmetric 8-bit KV cache compression is now enabled on CPU by default, reducing memory
usage and memory bandwidth consumption for large language models and improving performance
for 2nd token generation. Asymmetric 4-bit KV cache compression on CPU is now supported
as an option to further reduce memory consumption (see the sketch after this list).
* Performance of models running in FP16 on 6th generation Intel® Xeon® processors with P-cores
has been enhanced by improving utilization of the underlying AMX FP16 capabilities.
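
A minimal sketch of requesting a specific KV cache precision when compiling an LLM on CPU;
the ``kv_cache_precision`` hint and its Python spelling are assumptions based on recent
OpenVINO releases, so verify them against the documentation for your version:

.. code-block:: python

   import openvino as ov
   import openvino.properties.hint as hints

   core = ov.Core()
   model = core.read_model("llm/openvino_model.xml")

   # Request an 8-bit (u8) KV cache explicitly; lower precision trades a little
   # accuracy for reduced memory footprint and bandwidth.
   compiled = core.compile_model(model, "CPU", {hints.kv_cache_precision: ov.Type.u8})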
GPU Device Plugin
-----------------------------------------------------------------------------------------------
OpenVINO GenAI APIs with continuous batching and SDPA-based LLMs with long prompts (>4k).
* Stateful models are now enabled, significantly improving performance of Whisper models on all
GPU platforms.
* Stable Diffusion 3 and FLUX.1 performance has been improved.
* The issue of a black image output for image generation models, including SDXL, SD3, and
FLUX.1, with FP16 precision has been solved.


NPU Device Plugin
-----------------------------------------------------------------------------------------------

* Performance has been improved for channel-wise (CW) symmetrically quantized LLMs, including
Llama2-7B-chat, Llama3-8B-instruct, Qwen-2-7B, Mistral-0.2-7B-Instruct, Phi-3-Mini-4K-Instruct,
and MiniCPM-1B models. The best performance is achieved using symmetrically quantized 4-bit
(INT4) models.
* Preview: Introducing NPU support for torch.compile, giving developers the ability to use the
OpenVINO backend to run the PyTorch API on NPUs. 300+ deep learning models enabled from
the TorchVision, Timm, and TorchBench repositories.
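
A minimal sketch of the torch.compile path above, assuming PyTorch, torchvision, and an
OpenVINO installation with working NPU drivers; importing ``openvino.torch`` registers the
backend, and the ``options={"device": "NPU"}`` form follows the OpenVINO torch.compile
documentation:

.. code-block:: python

   import torch
   import openvino.torch  # noqa: F401 -- registers the "openvino" backend
   from torchvision import models

   model = models.resnet50(weights="DEFAULT").eval()
   # Route compilation and execution through OpenVINO on the NPU.
   compiled = torch.compile(model, backend="openvino", options={"device": "NPU"})

   with torch.no_grad():
       out = compiled(torch.rand(1, 3, 224, 224))
   print(out.shape)  # torch.Size([1, 1000])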

ONNX Framework Support
-----------------------------------------------------------------------------------------------

* Runtime memory consumption for models with quantized weights has been reduced.
* A workflow issue that affected reading of 2-byte data types has been fixed.


OpenVINO Model Server
-----------------------------------------------------------------------------------------------
* Generative endpoints are fully supported, including text generation and embeddings based on
the OpenAI API, and reranking based on the Cohere API.
* Functional parity with the Linux version is available with minor differences.
* The feature is targeted at client machines with Windows 11 and data center environments
with Windows Server 2022 OS.
* Demos have been updated to work on both Linux and Windows. Check the
`installation guide <https://docs.openvino.ai/2025/openvino-workflow/model-server/ovms_docs_deploying_server_baremetal.html>`__
The following has been added:
* Stateful decoder for WhisperPipeline. Whisper decoder models with past are deprecated.
* Export a model with the new optimum-intel to obtain the stateful version.
* Performance metrics for WhisperPipeline.
* initial_prompt and hotwords parameters for the Whisper pipeline, allowing generation to be
guided (see the sketch below).
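
A minimal sketch of guiding transcription, assuming a Whisper model exported to ``model_dir``
and 16 kHz mono audio; passing ``initial_prompt`` and ``hotwords`` as generate() keywords
follows the GenAI samples, but treat the exact plumbing as an assumption for your version:

.. code-block:: python

   import librosa
   import openvino_genai

   # Whisper expects 16 kHz mono float samples.
   raw_speech, _ = librosa.load("meeting.wav", sr=16000)

   pipe = openvino_genai.WhisperPipeline("model_dir", "CPU")
   result = pipe.generate(
       raw_speech.tolist(),
       initial_prompt="OpenVINO, GenAI, NPU",  # bias spelling of domain terms
       hotwords="Arrow Lake",                  # extra words to favor during decoding
   )
   print(result)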

* LLMPipeline

* rng_seed parameter to ImageGenerationConfig.
* Callback for image generation pipelines, allowing generation progress to be tracked and
intermediate results to be obtained (see the sketch after this list).
* EulerAncestralDiscreteScheduler for SDXL Turbo.
* PNDMScheduler for Stable Diffusion 1.x and 2.x.
* Models: FLUX.1-Schnell, FLUX.1-Lite-8B-Alpha, FLUX.1-Dev, and Shuttle-3-Diffusion.
* T5 encoder for SD3 Pipeline.
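
A minimal sketch combining the new rng_seed and callback additions, assuming a text-to-image
model in ``model_dir``; the three-argument callback signature (step, total steps, latent) and
cancellation by returning True follow the GenAI samples, but verify them for your version:

.. code-block:: python

   import openvino_genai

   pipe = openvino_genai.Text2ImagePipeline("model_dir", "GPU")

   def on_step(step: int, num_steps: int, latent) -> bool:
       # Track progress; `latent` holds the intermediate result tensor.
       print(f"step {step + 1}/{num_steps}")
       return False  # returning True is assumed to cancel generation

   image = pipe.generate(
       "a watercolor painting of a lighthouse",
       num_inference_steps=20,
       rng_seed=42,      # fixed seed for reproducible output
       callback=on_step,
   )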

* VLMPipeline
Known Issues
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| ID: 161336
| Description:
| Compilation of an OpenVINO model performing weight quantization fails with a Segmentation
Fault on Intel® Core™ Ultra 200V processors. The following workaround can be applied to
make it work with existing OV versions (including 25.0 RCs) before the application runs:
export DNNL_MAX_CPU_ISA=AVX2_VNNI (see the sketch below).
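
The same workaround expressed in Python; the variable must be set before the CPU plugin
(oneDNN) is loaded, so it has to precede any model compilation (a sketch, equivalent to the
shell export above):

.. code-block:: python

   import os
   # Must be set before OpenVINO loads the CPU plugin.
   os.environ["DNNL_MAX_CPU_ISA"] = "AVX2_VNNI"

   import openvino as ov
   compiled = ov.Core().compile_model("model.xml", "CPU")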
| **Component: GPU Plugin**
| ID: 160802
| Description:
| mllama model crashes on Intel® Core™ Ultra 200V processors. Please use OpenVINO 2024.6 or
earlier to run the model.
| **Component: GPU Plugin**
| ID: 160948
| Description:
| Several models have accuracy degradation on Intel® Core™ Ultra 200V processors,
Intel® Arc™ A-Series Graphics, and Intel® Arc™ B-Series Graphics. Please use OpenVINO 2024.6
to run the models. Model list: Denoise, Sharpen-Sharpen, fastseg-small, hbonet-0.5,
modnet_photographic_portrait_matting, modnet_webcam_portrait_matting,
mobilenet-v3-small-1.0-224, nasnet-a-mobile-224, yolo_v4, yolo_v5m, yolo_v5s, yolo_v8n,
Additional Configurations
For GPU <configurations/configurations-intel-gpu>
For NPU <configurations/configurations-intel-npu>
GenAI Dependencies <configurations/genai-dependencies>
Troubleshooting Guide for OpenVINO™ Installation & Configuration <configurations/troubleshooting-install-config>

For certain use cases, you may need to install additional software to benefit from the full
potential of OpenVINO™. Check the following list for components used in your workflow: