From a75ecf0cf83732763ffdc907211512d3f2913b8e Mon Sep 17 00:00:00 2001 From: Karol Blaszczak Date: Tue, 4 Feb 2025 17:11:32 +0100 Subject: [PATCH] [DOCS] 25.0 final touches --- .../about-openvino/release-notes-openvino.rst | 15 ++++++++------- .../generation/random-uniform-8.rst | 11 ++++++----- 2 files changed, 14 insertions(+), 12 deletions(-) diff --git a/docs/articles_en/about-openvino/release-notes-openvino.rst b/docs/articles_en/about-openvino/release-notes-openvino.rst index bb27ff4fabfdb8..2f62df8bfc912d 100644 --- a/docs/articles_en/about-openvino/release-notes-openvino.rst +++ b/docs/articles_en/about-openvino/release-notes-openvino.rst @@ -96,9 +96,9 @@ CPU Device Plugin ----------------------------------------------------------------------------------------------- * Intel® Core™ Ultra 200H processors (formerly code named Arrow Lake-H) are now fully supported. -* Asymmetric 8bit KV Cache cache compression is now enabled on CPU by default, reducing memory +* Asymmetric 8bit KV Cache compression is now enabled on CPU by default, reducing memory usage and memory bandwidth consumption for large language models and improving performance - for 2nd token generation. Asymmetric 4bit KV Cache cache compression on CPU is now supported + for 2nd token generation. Asymmetric 4bit KV Cache compression on CPU is now supported as an option to further reduce memory consumption. * Performance of models running in FP16 on 6th generation of Intel® Xeon® processors with P-core has been enhanced by improving utilization of the underlying AMX FP16 capabilities. 
@@ -125,7 +125,7 @@ GPU Device Plugin NPU Device Plugin ----------------------------------------------------------------------------------------------- -* Performance has been improved for CW symmetrically quantized LLMs, including Llama2-7B-chat, +* Performance has been improved for Channel-Wise symmetrically quantized LLMs, including Llama2-7B-chat, Llama3-8B-instruct, Qwen-2-7B, Mistral-0.2-7B-Instruct, Phi-3-Mini-4K-Instruct, MiniCPM-1B models. The best performance is achieved using symmetrically-quantized 4-bit (INT4) quantized models. @@ -164,7 +164,7 @@ TensorFlow Framework Support PyTorch Framework Support ----------------------------------------------------------------------------------------------- -* Preview: Introducing NPU support for torch.compile , giving developers the ability to use +* Preview: Introducing NPU support for torch.compile, giving developers the ability to use the OpenVINO backend to run the PyTorch API on NPUs. 300+ deep learning models enabled from the TorchVision, Timm, and TorchBench repositories. * Preview: Support conversion of PyTorch models with AWQ weights compression, enabling models @@ -287,7 +287,7 @@ The following has been added: * Stateful decoder for WhisperPipeline. Whisper decoder models with past are deprecated. * Export a model with new optimum-intel to obtain stateful version. * Performance metrics for WhisperPipeline. - * initial_prompt and hotwords parameters for the Whisper pipeline allowing to guide generation. + * initial_prompt and hotwords parameters for WhisperPipeline, which allow guiding generation. 
* LLMPipeline @@ -331,6 +331,7 @@ Jupyter Notebooks * `RAG using OpenVINO GenAI and LangChain `__ * `LLM chatbot `__ extended with GLM-Edge, Phi4, and Deepseek-R1 distilled models +* `LLM reasoning with DeepSeek-R1 distilled models `__ Known Issues @@ -340,8 +341,8 @@ Known Issues | ID: 160167 | Description: | TensorFlow Object Detection models converted to the IR through the OVC tool gives poor - performance on CPU and GPU devices. As a workaround, please use the MO tool from 2024.6 or - earlier to generate IRs. + performance on CPU, GPU, and NPU devices. As a workaround, please use the MO tool from + 2024.6 or earlier to generate IRs. | **Component: Tokenizers** | ID: 159392 diff --git a/docs/articles_en/documentation/openvino-ir-format/operation-sets/operation-specs/generation/random-uniform-8.rst b/docs/articles_en/documentation/openvino-ir-format/operation-sets/operation-specs/generation/random-uniform-8.rst index 26aad1eb161ace..0f36e6e1e0d35a 100644 --- a/docs/articles_en/documentation/openvino-ir-format/operation-sets/operation-specs/generation/random-uniform-8.rst +++ b/docs/articles_en/documentation/openvino-ir-format/operation-sets/operation-specs/generation/random-uniform-8.rst @@ -15,15 +15,15 @@ RandomUniform **Detailed description**: *RandomUniform* operation generates random numbers from a uniform distribution in the range ``[minval, maxval)``. -The generation algorithm is based on an underlying random integer generator that uses either Philox or Mersnne-Twister algorithm. +The generation algorithm is based on an underlying random integer generator that uses either Philox or Mersenne-Twister algorithm. Both algorithms are counter-based pseudo-random generators, which produce uint32 values. A single algorithm invocation returns four result random values, depending on the given initial values. For Philox, these values are *key* and *counter*, for Mersenne-Twister it is a single *state* value. 
*Key* and *counter* are initialized -with *global_seed* and *op_seed* attributes respectively, while the *state* is only initialized using *global_seed*. +with *global_seed* and *op_seed* attributes respectively, while the *state* is only initialized using *global_seed*. -Algorithm selection allows to align the output of OpenVINO's Random Uniform op with the ones available in Tensorflow and PyTorch. +Algorithm selection makes it possible to align the output of OpenVINO's Random Uniform op with the ones available in Tensorflow and PyTorch. The *alignment* attribute selects which framework the output should be aligned to. Tensorflow uses the Philox algorithm and PyTorch uses the Mersenne-Twister algorithm. For Tensorflow, this function is equivalent to the function tf.raw_ops.RandomUniform(shape, dtype, global_seed, op_seed) when dtype represents a real number, and tf.raw_ops.RandomUniformInt(shape, min\_val, max\_val, dtype, global\_seed, op\_seed) for integer types. Internally, both of these functions are executed by tf.random.uniform(shape, min\_val, max\_val, dtype, global\_seed, op\_seed), where for floating-point dtype the output goes through additional conversion to reside within a given range. -For PyTorch, this function is equivalent to the function torch.Tensor(shape, dtype).uniform\_(min\_val, max\_val) when dtype represents a real number, and torch.Tensor(shape, dtype).random\_(min\_val, max\_val) for integer types. Internally, both of these functions are executed by torch.rand(shape, dtype) with default generator and layout. The seed of these functions is provided by calling torch.manual\_seed(global\_seed). op\_seed value is ignored. +For PyTorch, this function is equivalent to the function torch.Tensor(shape, dtype).uniform\_(min\_val, max\_val) when dtype represents a real number, and torch.Tensor(shape, dtype).random\_(min\_val, max\_val) for integer types. 
Internally, both of these functions are executed by torch.rand(shape, dtype) with default generator and layout. The seed of these functions is provided by calling torch.manual\_seed(global\_seed). op\_seed value is ignored. By default, the output is aligned with Tensorflow (Philox algorithm). This behavior is backwards-compatibile. If both seed values are equal to zero, RandomUniform generates a non-deterministic sequence. @@ -257,7 +257,7 @@ Whenever all state values are 'used', a new state array is generated recursively twisted_state = (((current_state & 0x80000000) | (next_state & 0x7fffffff)) >> 1) ^ (next_state & 1 ? 0x9908b0df : 0) state[i] = next_m_state ^ twisted_state -where m is a constant. +where m is a constant. For parity with PyTorch, the value of the constants is set as follows: @@ -328,6 +328,7 @@ In other words: output = output % (max - min) + min Example 1. RandomUniform output with initial_seed = 150, output_type = f32, alignment = PYTORCH: + .. code-block:: xml :force: