openvinotoolkit · kblaszczak-intel · Feb 4, 2025 · Feb 4, 2025
@@ -96,9 +96,9 @@ CPU Device Plugin
 -----------------------------------------------------------------------------------------------
 
 * Intel® Core™ Ultra 200H processors (formerly code named Arrow Lake-H) are now fully supported.
-* Asymmetric 8bit KV Cache cache compression is now enabled on CPU by default, reducing memory
+* Asymmetric 8bit KV Cache compression is now enabled on CPU by default, reducing memory
   usage and memory bandwidth consumption for large language models and improving performance
-  for 2nd token generation. Asymmetric 4bit KV Cache cache compression on CPU is now supported
+  for 2nd token generation. Asymmetric 4bit KV Cache compression on CPU is now supported
   as an option to further reduce memory consumption.
 * Performance of models running in FP16 on 6th generation of Intel® Xeon® processors with P-core
   has been enhanced by improving utilization of the underlying AMX FP16 capabilities.
@@ -125,7 +125,7 @@ GPU Device Plugin
 NPU Device Plugin
 -----------------------------------------------------------------------------------------------
 
-* Performance has been improved for CW symmetrically quantized LLMs, including Llama2-7B-chat,
+* Performance has been improved for Channel-Wise symmetrically quantized LLMs, including Llama2-7B-chat,
   Llama3-8B-instruct, Qwen-2-7B, Mistral-0.2-7B-Instruct, Phi-3-Mini-4K-Instruct, MiniCPM-1B
   models. The best performance is achieved using symmetrically quantized 4-bit (INT4)
   models.
@@ -164,7 +164,7 @@ TensorFlow Framework Support
 PyTorch Framework Support
 -----------------------------------------------------------------------------------------------
 
-* Preview: Introducing NPU support for torch.compile  , giving developers the ability to use
+* Preview: Introducing NPU support for torch.compile, giving developers the ability to use
   the OpenVINO backend to run the PyTorch API on NPUs. 300+ deep learning models enabled from
   the TorchVision, Timm, and TorchBench repositories.
 * Preview: Support conversion of PyTorch models with AWQ weights compression, enabling models
@@ -287,7 +287,7 @@ The following has been added:
   * Stateful decoder for WhisperPipeline. Whisper decoder models with past are deprecated.
   * Export a model with new optimum-intel to obtain stateful version.
   * Performance metrics for WhisperPipeline.
-  * initial_prompt and hotwords parameters for the Whisper pipeline allowing to guide generation.
+  * initial_prompt and hotwords parameters for WhisperPipeline, which allow guiding generation.
 
 * LLMPipeline
 
@@ -331,6 +331,7 @@ Jupyter Notebooks
 * `RAG using OpenVINO GenAI and LangChain <https://openvinotoolkit.github.io/openvino_notebooks/?search=Create+a+RAG+system+using+OpenVINO+GenAI+and+LangChain>`__
 * `LLM chatbot <https://openvinotoolkit.github.io/openvino_notebooks/?search=Create+an+LLM-powered+Chatbot+using+OpenVINO+Generate+API>`__
   extended with GLM-Edge, Phi4, and Deepseek-R1 distilled models
+* `LLM reasoning with DeepSeek-R1 distilled models <https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/deepseek-r1>`__
 
 
 Known Issues
@@ -340,8 +341,8 @@ Known Issues
 | ID: 160167
 | Description:
 |   TensorFlow Object Detection models converted to the IR through the OVC tool give poor
-    performance on CPU and GPU devices. As a workaround, please use the MO tool from 2024.6 or
-    earlier to generate IRs.
+    performance on CPU, GPU, and NPU devices. As a workaround, please use the MO tool from
+    2024.6 or earlier to generate IRs.
 
 | **Component: Tokenizers**
 | ID: 159392

@@ -15,15 +15,15 @@ RandomUniform
 **Detailed description**:
 
 *RandomUniform* operation generates random numbers from a uniform distribution in the range ``[minval, maxval)``.
-The generation algorithm is based on an underlying random integer generator that uses either Philox or Mersnne-Twister algorithm. 
+The generation algorithm is based on an underlying random integer generator that uses either the Philox or the Mersenne-Twister algorithm.
 Both algorithms are counter-based pseudo-random generators, which produce uint32 values. A single algorithm invocation returns
 four result random values, depending on the given initial values. For Philox, these values are *key* and *counter*, for Mersenne-Twister it is a single *state* value. *Key* and *counter* are initialized
-with *global_seed* and *op_seed* attributes respectively, while the *state* is only initialized using *global_seed*. 
+with *global_seed* and *op_seed* attributes respectively, while the *state* is only initialized using *global_seed*.
 
-Algorithm selection allows to align the output of OpenVINO's Random Uniform op with the ones available in Tensorflow and PyTorch. 
+Algorithm selection makes it possible to align the output of OpenVINO's RandomUniform op with the ones available in TensorFlow and PyTorch.
 The *alignment* attribute selects which framework the output should be aligned to. Tensorflow uses the Philox algorithm and PyTorch uses the Mersenne-Twister algorithm.
 For Tensorflow, this function is equivalent to the function tf.raw_ops.RandomUniform(shape, dtype, global_seed, op_seed) when dtype represents a real number, and tf.raw_ops.RandomUniformInt(shape, min\_val, max\_val, dtype, global\_seed, op\_seed) for integer types. Internally, both of these functions are executed by tf.random.uniform(shape, min\_val, max\_val, dtype, global\_seed, op\_seed), where for floating-point dtype the output goes through additional conversion to reside within a given range.
-For PyTorch, this function is equivalent to the function torch.Tensor(shape, dtype).uniform\_(min\_val, max\_val) when dtype represents a real number, and torch.Tensor(shape, dtype).random\_(min\_val, max\_val) for integer types. Internally, both of these functions are executed by torch.rand(shape, dtype) with default generator and layout. The seed of these functions is provided by calling torch.manual\_seed(global\_seed). op\_seed value is ignored. 
+For PyTorch, this function is equivalent to the function torch.Tensor(shape, dtype).uniform\_(min\_val, max\_val) when dtype represents a real number, and torch.Tensor(shape, dtype).random\_(min\_val, max\_val) for integer types. Internally, both of these functions are executed by torch.rand(shape, dtype) with default generator and layout. The seed of these functions is provided by calling torch.manual\_seed(global\_seed). op\_seed value is ignored.
 By default, the output is aligned with Tensorflow (Philox algorithm). This behavior is backward-compatible.
 
 If both seed values are equal to zero, RandomUniform generates a non-deterministic sequence.
@@ -257,7 +257,7 @@ Whenever all state values are 'used', a new state array is generated recursively
    twisted_state = (((current_state & 0x80000000) | (next_state & 0x7fffffff)) >> 1) ^ (next_state & 1 ? 0x9908b0df : 0)
    state[i] = next_m_state ^ twisted_state
 
-where m is a constant. 
+where m is a constant.
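
The state regeneration above can be sketched in Python as follows. This is an illustration only, not the OpenVINO implementation: it assumes the standard MT19937 parameters (a 624-word state and m = 397) together with the standard MT19937 seeding and tempering steps, none of which appear in this excerpt, and all helper names are hypothetical.

```python
# Assumed standard MT19937 parameters; the spec's own constant table is not shown here.
N, M = 624, 397
MATRIX_A = 0x9908B0DF
UPPER_MASK = 0x80000000
LOWER_MASK = 0x7FFFFFFF


def seed_state(seed):
    """Fill the state array from a 32-bit seed (standard MT19937 seeding, assumed)."""
    state = [seed & 0xFFFFFFFF]
    for i in range(1, N):
        prev = state[-1]
        state.append((1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF)
    return state


def twist(state):
    """Regenerate the whole state array in place, following the formula above."""
    for i in range(N):
        current_state = state[i]
        next_state = state[(i + 1) % N]
        next_m_state = state[(i + M) % N]
        # Top bit of the current word joined with the low 31 bits of the next word.
        twisted_state = ((current_state & UPPER_MASK) | (next_state & LOWER_MASK)) >> 1
        if next_state & 1:
            twisted_state ^= MATRIX_A
        state[i] = next_m_state ^ twisted_state
    return state


def temper(y):
    """Standard MT19937 output tempering (assumed, not taken from this excerpt)."""
    y ^= y >> 11
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    y ^= y >> 18
    return y & 0xFFFFFFFF


state = twist(seed_state(5489))
first_value = temper(state[0])
```

With the seed 5489 used here, ``first_value`` can be compared against the first output of any reference MT19937 implementation to check the sketch.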
 
 For parity with PyTorch, the value of the constants is set as follows:
 
@@ -328,6 +328,7 @@ In other words:
    output = output % (max - min) + min
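
A minimal Python sketch of this integer range mapping, assuming the raw generator output is an unsigned 32-bit value. The function name and the argument check are hypothetical and only illustrate the formula above.

```python
def map_to_int_range(raw, min_val, max_val):
    """Map a raw unsigned integer into the half-open range [min_val, max_val)."""
    if max_val <= min_val:
        raise ValueError("max_val must be greater than min_val")
    # Same arithmetic as the formula above: output % (max - min) + min
    return raw % (max_val - min_val) + min_val


small = map_to_int_range(10, 3, 8)            # 10 % 5 + 3
largest = map_to_int_range(0xFFFFFFFF, -5, 5)  # 4294967295 % 10 - 5
```

Because Python's ``%`` returns a non-negative result for a positive divisor, every mapped value stays inside ``[min_val, max_val)``.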
 
 Example 1. RandomUniform output with initial_seed = 150, output_type = f32, alignment = PYTORCH:
+
 .. code-block:: xml
    :force: