[DOCS] 25.0 final touches #28819

Merged
15 changes: 8 additions & 7 deletions docs/articles_en/about-openvino/release-notes-openvino.rst
@@ -96,9 +96,9 @@ CPU Device Plugin
-----------------------------------------------------------------------------------------------

* Intel® Core™ Ultra 200H processors (formerly code named Arrow Lake-H) are now fully supported.
* Asymmetric 8bit KV Cache cache compression is now enabled on CPU by default, reducing memory
* Asymmetric 8bit KV Cache compression is now enabled on CPU by default, reducing memory
usage and memory bandwidth consumption for large language models and improving performance
for 2nd token generation. Asymmetric 4bit KV Cache cache compression on CPU is now supported
for 2nd token generation. Asymmetric 4bit KV Cache compression on CPU is now supported
as an option to further reduce memory consumption.
* Performance of models running in FP16 on 6th generation of Intel® Xeon® processors with P-core
has been enhanced by improving utilization of the underlying AMX FP16 capabilities.
@@ -125,7 +125,7 @@ GPU Device Plugin
NPU Device Plugin
-----------------------------------------------------------------------------------------------

* Performance has been improved for CW symmetrically quantized LLMs, including Llama2-7B-chat,
* Performance has been improved for Channel-Wise symmetrically quantized LLMs, including Llama2-7B-chat,
Llama3-8B-instruct, Qwen-2-7B, Mistral-0.2-7B-Instruct, Phi-3-Mini-4K-Instruct, and MiniCPM-1B.
The best performance is achieved using symmetrically quantized 4-bit (INT4) models.
@@ -164,7 +164,7 @@ TensorFlow Framework Support
PyTorch Framework Support
-----------------------------------------------------------------------------------------------

* Preview: Introducing NPU support for torch.compile , giving developers the ability to use
* Preview: Introducing NPU support for torch.compile, giving developers the ability to use
the OpenVINO backend to run the PyTorch API on NPUs. 300+ deep learning models enabled from
the TorchVision, Timm, and TorchBench repositories.
* Preview: Support conversion of PyTorch models with AWQ weights compression, enabling models
@@ -287,7 +287,7 @@ The following has been added:
* Stateful decoder for WhisperPipeline. Whisper decoder models with past are deprecated.
* Export a model with new optimum-intel to obtain stateful version.
* Performance metrics for WhisperPipeline.
* initial_prompt and hotwords parameters for the Whisper pipeline allowing to guide generation.
* initial_prompt and hotwords parameters for WhisperPipeline allowing to guide generation.

* LLMPipeline

@@ -331,6 +331,7 @@ Jupyter Notebooks
* `RAG using OpenVINO GenAI and LangChain <https://openvinotoolkit.github.io/openvino_notebooks/?search=Create+a+RAG+system+using+OpenVINO+GenAI+and+LangChain>`__
* `LLM chatbot <https://openvinotoolkit.github.io/openvino_notebooks/?search=Create+an+LLM-powered+Chatbot+using+OpenVINO+Generate+API>`__
extended with GLM-Edge, Phi4, and Deepseek-R1 distilled models
* `LLM reasoning with DeepSeek-R1 distilled models <https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/deepseek-r1>`__


Known Issues
@@ -340,8 +341,8 @@ Known Issues
| ID: 160167
| Description:
| TensorFlow Object Detection models converted to the IR through the OVC tool give poor
performance on CPU and GPU devices. As a workaround, please use the MO tool from 2024.6 or
earlier to generate IRs.
performance on CPU, GPU, and NPU devices. As a workaround, please use the MO tool from
2024.6 or earlier to generate IRs.

| **Component: Tokenizers**
| ID: 159392
@@ -15,15 +15,15 @@ RandomUniform
**Detailed description**:

*RandomUniform* operation generates random numbers from a uniform distribution in the range ``[minval, maxval)``.
The generation algorithm is based on an underlying random integer generator that uses either the Philox or the Mersenne-Twister algorithm.
Both algorithms are counter-based pseudo-random generators, which produce uint32 values. A single algorithm invocation returns
four result random values, depending on the given initial values. For Philox, these values are *key* and *counter*; for Mersenne-Twister, it is a single *state* value. *Key* and *counter* are initialized
with the *global_seed* and *op_seed* attributes respectively, while the *state* is initialized using only *global_seed*.

Algorithm selection makes it possible to align the output of OpenVINO's RandomUniform op with the ones available in TensorFlow and PyTorch.
The *alignment* attribute selects which framework the output should be aligned to. TensorFlow uses the Philox algorithm and PyTorch uses the Mersenne-Twister algorithm.
For TensorFlow, this function is equivalent to the function tf.raw_ops.RandomUniform(shape, dtype, global_seed, op_seed) when dtype represents a real number, and tf.raw_ops.RandomUniformInt(shape, min\_val, max\_val, dtype, global\_seed, op\_seed) for integer types. Internally, both of these functions are executed by tf.random.uniform(shape, min\_val, max\_val, dtype, global\_seed, op\_seed), where for floating-point dtype the output goes through additional conversion to reside within a given range.
For PyTorch, this function is equivalent to the function torch.Tensor(shape, dtype).uniform\_(min\_val, max\_val) when dtype represents a real number, and torch.Tensor(shape, dtype).random\_(min\_val, max\_val) for integer types. Internally, both of these functions are executed by torch.rand(shape, dtype) with default generator and layout. The seed of these functions is provided by calling torch.manual\_seed(global\_seed). The op\_seed value is ignored.
By default, the output is aligned with TensorFlow (Philox algorithm). This behavior is backwards-compatible.

If both seed values are equal to zero, RandomUniform generates a non-deterministic sequence.
@@ -257,7 +257,7 @@ Whenever all state values are 'used', a new state array is generated recursively
twisted_state = (((current_state & 0x80000000) | (next_state & 0x7fffffff)) >> 1) ^ (next_state & 1 ? 0x9908b0df : 0)
state[i] = next_m_state ^ twisted_state

where m is a constant.
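
The regeneration step above can be sketched in Python as follows. This is an illustration only, assuming the standard MT19937 parameters (a state array of 624 words and m = 397), which are not shown in this excerpt. The function name ``regenerate_state`` and the wrap-around indexing at the end of the array are assumptions made for the sketch, not taken from the OpenVINO sources.

```python
N = 624                  # assumed state size (standard MT19937)
M = 397                  # assumed value of the constant m (standard MT19937)
UPPER_MASK = 0x80000000  # most significant bit of a 32-bit word
LOWER_MASK = 0x7FFFFFFF  # remaining 31 bits
MATRIX_A = 0x9908B0DF    # constant from the formula above


def regenerate_state(state: list[int]) -> list[int]:
    """Return the next state array, updating words in order, one at a time."""
    if len(state) != N:
        raise ValueError(f"expected {N} state words, got {len(state)}")
    new_state = list(state)
    for i in range(N):
        current_state = new_state[i]
        next_state = new_state[(i + 1) % N]    # wraps to the already-updated word 0
        next_m_state = new_state[(i + M) % N]  # wraps to already-updated words
        mixed = (current_state & UPPER_MASK) | (next_state & LOWER_MASK)
        twisted_state = (mixed >> 1) ^ (MATRIX_A if next_state & 1 else 0)
        new_state[i] = next_m_state ^ twisted_state
    return new_state
```

Because each word is overwritten in order, words near the end of the array read values that were already regenerated in the same pass.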

For parity with PyTorch, the value of the constants is set as follows:

@@ -328,6 +328,7 @@ In other words:
output = output % (max - min) + min
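
The range mapping above can be sketched in Python. This is a minimal illustration, assuming the raw generator output is an unsigned 32-bit integer and that ``max`` is exclusive, as in ``[minval, maxval)``. The helper name ``to_int_range`` is hypothetical and not part of OpenVINO.

```python
def to_int_range(raw: int, min_val: int, max_val: int) -> int:
    """Map a raw uint32 value into the half-open range [min_val, max_val)."""
    if max_val <= min_val:
        raise ValueError("max_val must be greater than min_val")
    # Same arithmetic as: output = output % (max - min) + min
    return raw % (max_val - min_val) + min_val


# Every result falls in [-5, 5), including for the largest uint32 input.
samples = [to_int_range(raw, -5, 5) for raw in (0, 7, 10, 0xFFFFFFFF)]
```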

Example 1. RandomUniform output with initial_seed = 150, output_type = f32, alignment = PYTORCH:

.. code-block:: xml
:force:
