
Commit

Merge branch 'master' into ov_file_path_util_file_size
mlukasze authored Feb 3, 2025
2 parents afacbb0 + 4c01a98 commit 99a18fb
Showing 42 changed files with 882 additions and 891 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/windows_vs2019_release.yml
@@ -45,7 +45,7 @@ jobs:
repo_token: ${{ secrets.GITHUB_TOKEN }}
skip_when_only_listed_labels_set: 'docs'
skip_when_only_listed_files_changed: '*.md,*.rst,*.png,*.jpg,*.svg,*/layer_tests_summary/*,*/conformance/*'

- name: Get target branch
id: set_target_branch
run: |
@@ -192,7 +192,7 @@ jobs:
sparse-checkout: |
src/bindings/js
path: 'openvino'

- name: Download OpenVINO artifacts (JS)
uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
with:
@@ -223,7 +223,7 @@ jobs:
run: call npm test

- name: Add msbuild to PATH
uses: microsoft/setup-msbuild@v2
uses: microsoft/setup-msbuild@6fb02220983dee41ce7ae257b6f4d8f9bf5ed4ce # v2

- name: E2E of openvino-node package
working-directory: ${{ env.OPENVINO_JS_DIR }}/node
31 changes: 31 additions & 0 deletions .github/workflows/workflows_scans.yml
@@ -18,6 +18,37 @@ concurrency:
permissions: read-all

jobs:
codeql:
name: github_actions_workflows_scan/codeql
# Runner size impacts CodeQL analysis time. To learn more, please see:
# - https://gh.io/recommended-hardware-resources-for-running-codeql
# - https://gh.io/supported-runners-and-hardware-resources
# - https://gh.io/using-larger-runners
# Consider using larger runners for possible analysis time improvements.
runs-on: ubuntu-22.04
timeout-minutes: 60
permissions:
security-events: write
steps:
- name: Checkout
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
timeout-minutes: 15
with:
submodules: 'false'
sparse-checkout: .github/workflows

# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@df409f7d9260372bd5f19e5b04e83cb3c43714ae # v3.27.9
with:
languages: "actions"
build-mode: "none"

- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@df409f7d9260372bd5f19e5b04e83cb3c43714ae # v3.27.9
with:
category: "/language:actions"

semgrep:
name: github_actions_workflows_scan/semgrep
runs-on: ubuntu-latest
2 changes: 1 addition & 1 deletion docs/articles_en/about-openvino/release-notes-openvino.rst
@@ -105,7 +105,7 @@ Deprecation And Support
Using deprecated features and components is not advised. They are available to enable a smooth
transition to new solutions and will be discontinued in the future. To keep using discontinued
features, you will have to revert to the last LTS OpenVINO version supporting them.
For more details, refer to the `OpenVINO Legacy Features and Components <https://docs.openvino.ai/2024/documentation/legacy-features.html>__`
For more details, refer to the `OpenVINO Legacy Features and Components <https://docs.openvino.ai/2025/documentation/legacy-features.html>__`
page.


@@ -262,7 +262,7 @@ You need a model that is specific for your inference task. You can get it from o
Convert the Model
--------------------

If Your model requires conversion, check the `article <https://docs.openvino.ai/2025/get-started/learn-openvino/openvino-samples/get-started-demos.html>`__ for information how to do it.
If your model requires conversion, check the :doc:`article <../../../openvino-workflow/model-preparation>` for information on how to do it.
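For orientation only, a minimal conversion sketch might look as follows, assuming a hypothetical ``model.onnx`` source file (``ov.convert_model`` can also take in-memory PyTorch or TensorFlow objects; check the linked article for the exact call for your framework):

.. code-block:: python

   import openvino as ov

   # Convert the source model to OpenVINO's in-memory representation.
   # "model.onnx" is a placeholder path used for illustration only.
   ov_model = ov.convert_model("model.onnx")

   # Serialize to OpenVINO IR (model.xml + model.bin) for later inference.
   ov.save_model(ov_model, "model.xml")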

.. _download-media:

@@ -5,8 +5,9 @@ LLM Weight Compression
:maxdepth: 1
:hidden:

weight-compression/microscaling-quantization
weight-compression/4-bit-weight-quantization
weight-compression/microscaling-quantization



Weight compression enhances the efficiency of models by reducing their memory footprint,
@@ -16,14 +17,13 @@ Unlike full model quantization, where both weights and activations are quantized
only targets weights, keeping activations as floating-point numbers. This means preserving most
of the model's accuracy while improving its
speed and reducing its size. The reduction in size is especially noticeable with larger models.
For instance the 7 billion parameter Llama 2 model can be reduced
from about 25GB to 4GB using 4-bit weight compression.
For instance the 8 billion parameter Llama 3 model can be reduced
from about 16.1 GB to 4.8 GB using 4-bit weight quantization on top of bfloat16 model.

.. note::

With smaller language models (i.e. less than 1B parameters), weight
With smaller language models (i.e. less than 1B parameters), low-bit weight
compression may result in more accuracy reduction than with larger models.
Therefore, weight compression is recommended for use with LLMs only.

LLMs and other GenAI models that require
extensive memory to store the weights during inference can benefit
@@ -36,7 +36,7 @@ from weight compression as it:
* improves inference speed by reducing the latency of memory access when computing the
operations with weights, for example, Linear layers. The weights are smaller and thus
faster to load from memory;
* unlike quantization, does not require sample data to calibrate the range of
* unlike full static quantization, does not require sample data to calibrate the range of
activation values.

Currently, `NNCF <https://github.com/openvinotoolkit/nncf>`__
@@ -64,7 +64,7 @@ by running the following command:
pip install optimum[openvino]
**8-bit weight quantization** offers a good balance between reducing the size and lowering the
accuracy of a model. It usually results in significant improvements for transformer-based models
accuracy of a model. It usually results in significant improvements for Transformer-based models
and guarantees good model performance for a vast majority of supported CPU and GPU platforms.
By default, weights are compressed asymmetrically to "INT8_ASYM" mode.
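A minimal sketch of this default 8-bit path, assuming ``nncf.compress_weights`` applies the INT8_ASYM mode described above when called with no extra arguments and that ``model.xml`` is a placeholder IR path:

.. code-block:: python

   import nncf
   import openvino as ov

   # Read an existing OpenVINO IR model ("model.xml" is a placeholder path).
   model = ov.Core().read_model("model.xml")

   # With no extra arguments, weights are compressed to 8 bit (INT8_ASYM by default).
   compressed_model = nncf.compress_weights(model)

   # Save the compressed model as a new IR.
   ov.save_model(compressed_model, "model_int8_asym.xml")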

@@ -223,17 +223,6 @@ depending on the model.
For more details, refer to the article on how to
:doc:`infer LLMs using Optimum Intel <../../../openvino-workflow-generative/inference-with-optimum-intel>`.

The code snippet below shows how to do 4-bit quantization of the model weights represented
in OpenVINO IR using NNCF:

.. tab-set::

.. tab-item:: OpenVINO
:sync: openvino

.. doxygensnippet:: docs/optimization_guide/nncf/code/weight_compression_openvino.py
:language: python
:fragment: [compression_4bit]

Refer to the article about
:doc:`4-bit weight quantization <./weight-compression/4-bit-weight-quantization>`
@@ -133,7 +133,12 @@ trade-offs after optimization:
There are three modes: INT8_ASYM, INT8_SYM, and NONE, which retains
the original floating-point precision of the model weights (``INT8_ASYM`` is default value).

|


.. tip::

NNCF allows stacking the supported optimization methods. For example, AWQ, Scale Estimation
and GPTQ methods can be enabled all together to achieve better accuracy.
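A sketch of such a stacked 4-bit configuration, assuming a recent NNCF release where ``nncf.compress_weights`` exposes ``awq``, ``scale_estimation`` and ``gptq`` flags and requires a calibration dataset when they are enabled (``model``, ``samples`` and ``transform_fn`` are placeholders you would supply):

.. code-block:: python

   import nncf
   from nncf import CompressWeightsMode

   # A small calibration set is needed for data-aware methods such as AWQ,
   # Scale Estimation and GPTQ; `samples` and `transform_fn` are placeholders.
   calibration_dataset = nncf.Dataset(samples, transform_fn)

   compressed_model = nncf.compress_weights(
       model,                              # an openvino.Model loaded elsewhere
       mode=CompressWeightsMode.INT4_SYM,  # 4-bit symmetric weight quantization
       group_size=128,                     # per-group quantization granularity
       ratio=1.0,                          # share of weights compressed to 4 bit
       dataset=calibration_dataset,
       awq=True,
       scale_estimation=True,
       gptq=True,
   )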

4-bit Weight Quantization with GPTQ
###################################
16 changes: 5 additions & 11 deletions docs/articles_en/openvino-workflow/model-optimization.rst
@@ -21,24 +21,24 @@ In OpenVINO, the default optimization tool is NNCF (Neural Network Compression F
It is a `set of compression algorithms <https://github.com/openvinotoolkit/nncf/blob/develop/README.md>`__,
organized as a Python package, that make your models smaller and faster. Note that NNCF
is **not part of the OpenVINO package**, so it needs to be installed separately. It supports
models in **PyTorch**, **TensorFlow** , **ONNX**, and **OpenVINO IR** formats, offering
models in **OpenVINO IR**, **PyTorch**, **ONNX**, and **TensorFlow** formats, offering
the following main optimizations:

.. image:: ../assets/images/WHAT_TO_USE.svg


| :doc:`Weight Compression <model-optimization-guide/weight-compression>`:
| an easy-to-use method for Large Language Model footprint reduction and inference
| An easy-to-use method for Large Language Model footprint reduction and inference
acceleration.
| :doc:`Post-training Quantization <model-optimization-guide/quantizing-models-post-training>`:
| designed to optimize deep learning models by applying 8-bit integer quantization. Being
| Designed to optimize deep learning models by applying 8-bit integer quantization. Being
the easiest way to optimize a model it does not require its retraining or fine-tuning
but may result in a drop in accuracy. If the accuracy-performance tradeoff is not
acceptable, Training-time Optimization may be a better option.
| :doc:`Training-time Optimization <model-optimization-guide/compressing-models-during-training>`:
| involves a suite of advanced methods such as Structured or Unstructured Pruning, as well
| Involves a suite of advanced methods such as Structured or Unstructured Pruning, as well
as Quantization-aware Training. This kind of optimization requires the use of the model's
original framework, for NNCF, it is either PyTorch or TensorFlow.
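For the post-training quantization path listed above, a minimal sketch, assuming the ``nncf.quantize`` / ``nncf.Dataset`` API and placeholder ``data_source`` / ``transform_fn`` objects you would provide:

.. code-block:: python

   import nncf
   import openvino as ov

   # Load the model to optimize ("model.xml" is a placeholder IR path).
   model = ov.Core().read_model("model.xml")

   # Wrap a representative data source; transform_fn maps one item to model inputs.
   calibration_dataset = nncf.Dataset(data_source, transform_fn)

   # Apply default 8-bit post-training quantization.
   quantized_model = nncf.quantize(model, calibration_dataset)

   ov.save_model(quantized_model, "model_int8.xml")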
@@ -54,13 +54,7 @@ Recommended workflows
3. If the accuracy drop is unacceptable, use quantization-aware training instead. It will give
you the same level of performance boost, with a smaller impact on accuracy.

* **Weight compression** works **only with LLMs**. Do not try to use it with other models.
* For **visual-multimodal** use cases, the encoder / decoder split approach may be recommended.





* **Weight compression** works with **LLMs**, **VLMs** and other Transformer-based models.



2 changes: 1 addition & 1 deletion docs/notebooks/convert-to-openvino-with-output.rst
@@ -54,7 +54,7 @@ OpenVINO IR format


OpenVINO `Intermediate Representation
(IR) <https://docs.openvino.ai/2024/documentation/openvino-ir-format.html>`__
(IR) <https://docs.openvino.ai/2025/documentation/openvino-ir-format.html>`__
is the proprietary model format of OpenVINO. It is produced after
converting a model with model conversion API. Model conversion API
translates the frequently used deep learning operations to their
@@ -941,7 +941,7 @@ advance and fill it in as the inference requests are executed.
Let’s compare the models and plot the results.
**Note**: To get a more accurate benchmark, use the `Benchmark Python
Tool <https://docs.openvino.ai/2024/get-started/learn-openvino/openvino-samples/benchmark-tool.html>`__
Tool <https://docs.openvino.ai/2025/get-started/learn-openvino/openvino-samples/benchmark-tool.html>`__
.. code:: ipython3
@@ -623,7 +623,7 @@ Compare Performance of the FP32 IR Model and Quantized Models

To measure the inference performance of the ``FP32`` and ``INT8``
models, we use `Benchmark
Tool <https://docs.openvino.ai/2024/get-started/learn-openvino/openvino-samples/benchmark-tool.html>`__
Tool <https://docs.openvino.ai/2025/get-started/learn-openvino/openvino-samples/benchmark-tool.html>`__
- OpenVINO’s inference performance measurement tool. Benchmark tool is a
command line application, part of OpenVINO development tools, that can
be run in the notebook with ``! benchmark_app`` or
2 changes: 1 addition & 1 deletion docs/notebooks/ddcolor-image-colorization-with-output.rst
@@ -499,7 +499,7 @@ Compare inference time of the FP16 and INT8 models

To measure the inference performance of OpenVINO FP16 and INT8 models,
use `Benchmark
Tool <https://docs.openvino.ai/2024/get-started/learn-openvino/openvino-samples/benchmark-tool.html>`__.
Tool <https://docs.openvino.ai/2025/get-started/learn-openvino/openvino-samples/benchmark-tool.html>`__.

**NOTE**: For the most accurate performance estimation, it is
recommended to run ``benchmark_app`` in a terminal/command prompt
2 changes: 1 addition & 1 deletion docs/notebooks/depth-anything-v2-with-output.rst
@@ -977,7 +977,7 @@ Compare inference time of the FP16 and INT8 models

To measure the inference performance of OpenVINO FP16 and INT8 models,
use `Benchmark
Tool <https://docs.openvino.ai/2024/get-started/learn-openvino/openvino-samples/benchmark-tool.html>`__.
Tool <https://docs.openvino.ai/2025/get-started/learn-openvino/openvino-samples/benchmark-tool.html>`__.

**NOTE**: For the most accurate performance estimation, it is
recommended to run ``benchmark_app`` in a terminal/command prompt
2 changes: 1 addition & 1 deletion docs/notebooks/depth-anything-with-output.rst
@@ -940,7 +940,7 @@ Compare inference time of the FP16 and INT8 models

To measure the inference performance of OpenVINO FP16 and INT8 models,
use `Benchmark
Tool <https://docs.openvino.ai/2024/get-started/learn-openvino/openvino-samples/benchmark-tool.html>`__.
Tool <https://docs.openvino.ai/2025/get-started/learn-openvino/openvino-samples/benchmark-tool.html>`__.

**NOTE**: For the most accurate performance estimation, it is
recommended to run ``benchmark_app`` in a terminal/command prompt
