[ET-VK] Introduce generic export pass for fusing Q/DQ nodes #10525

SS-JIA · 2025-04-28T19:41:17Z

Stack from ghstack (oldest at bottom):

-> [ET-VK] Introduce generic export pass for fusing Q/DQ nodes #10525

Context

When quantizing models with the PT2E quantization flow, quantize/dequantize nodes are inserted into the graph. These quantize/dequantize nodes must then be fused with operators such as aten.linear.default to produce nodes corresponding to quantized operators (e.g. weight_int8pack_mm), so that the quantized operator implementations are called at runtime.
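To see why this fusion is valid, consider weight-only int8 quantization with symmetric per-output-channel scales (zero point 0), which is an assumption made here for illustration. Dequantizing the weight and then calling linear gives the same result as accumulating against the int8 weight directly and applying each channel's scale once at the end, which is roughly the computation a weight-only int8 matmul kernel performs. The plain-Python sketch below checks this numerically; the function names are hypothetical and not part of ExecuTorch or PyTorch.

```python
from typing import List

Matrix = List[List[float]]


def dequantize_per_channel(q: List[List[int]], scales: List[float]) -> Matrix:
    # Symmetric per-output-channel dequantization (zero point assumed to be 0):
    # W[o][i] = scales[o] * q[o][i]
    return [[scales[o] * v for v in row] for o, row in enumerate(q)]


def linear(x: Matrix, w: Matrix) -> Matrix:
    # Same semantics as linear without bias: y = x @ w^T
    return [[sum(xi * wi for xi, wi in zip(x_row, w_row)) for w_row in w] for x_row in x]


def fused_int8_linear(x: Matrix, q: List[List[int]], scales: List[float]) -> Matrix:
    # Hypothetical fused kernel: accumulate against the int8 weight directly,
    # then apply each output channel's scale once.
    return [
        [scales[o] * sum(xi * qi for xi, qi in zip(x_row, q_row)) for o, q_row in enumerate(q)]
        for x_row in x
    ]


x = [[1.0, 2.0, 3.0], [-1.0, 0.5, 4.0]]
q = [[12, -7, 3], [-128, 127, 0]]  # int8 weight, shape [out=2, in=3]
scales = [0.5, 0.25]  # one scale per output channel

unfused = linear(x, dequantize_per_channel(q, scales))
fused = fused_int8_linear(x, q, scales)
print(fused)  # [[3.5, 31.5], [-1.75, 47.875]]
```

Because the scale factors out of the inner sum, a fusion pass can drop the dequantize node and hand the int8 weight and scales straight to the fused operator.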

Currently, op fusion is done by the fuse_dequant_linear.py pass; however, this pass handles only one specific fusion pattern, which generates a weight_int8pack_mm operator. As more quantized operators are supported in ET-VK via the PT2E quantization flow, a more generic fusion pass is needed that can handle a variety of fusion patterns.
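One way to make a fusion pass generic is to keep a table of pattern matchers and try each one at every node, so that supporting a new quantized operator means adding a table entry rather than writing a new pass. The sketch below illustrates that idea on a toy graph. Node, weight_dq_pattern, fuse, PATTERNS, and the op names (linear, conv2d, dequantize, int8_linear, int8_conv2d) are hypothetical stand-ins, not the torch.fx or ExecuTorch API, and this is not necessarily how FuseQuantizedOpsTransform is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    # Toy stand-in for a graph node: an op name plus its input nodes.
    op: str
    args: List["Node"] = field(default_factory=list)
    meta: Dict[str, object] = field(default_factory=dict)


# A matcher looks at an anchor node and returns a fused replacement,
# or None if its pattern does not apply.
Matcher = Callable[[Node], Optional[Node]]


def weight_dq_pattern(anchor_op: str, fused_op: str) -> Matcher:
    # Matches anchor_op(x, dequantize(w_q), *rest) -> fused_op(x, w_q, *rest).
    def matcher(node: Node) -> Optional[Node]:
        if node.op != anchor_op or len(node.args) < 2:
            return None
        x, dq, rest = node.args[0], node.args[1], node.args[2:]
        if dq.op != "dequantize" or not dq.args:
            return None
        return Node(fused_op, [x, dq.args[0], *rest], dict(node.meta))

    return matcher


# Supporting another quantized operator means adding an entry here.
PATTERNS: List[Matcher] = [
    weight_dq_pattern("linear", "int8_linear"),
    weight_dq_pattern("conv2d", "int8_conv2d"),
]


def fuse(node: Node, patterns: List[Matcher], memo: Optional[Dict[int, Node]] = None) -> Node:
    # Rewrites the graph rooted at `node`, applying the first matching pattern at each node.
    memo = {} if memo is None else memo
    if id(node) in memo:  # shared subgraphs are rewritten only once
        return memo[id(node)]
    # Rewrite inputs first so patterns see producers that are already fused.
    rewritten = Node(node.op, [fuse(a, patterns, memo) for a in node.args], dict(node.meta))
    for pattern in patterns:
        replacement = pattern(rewritten)
        if replacement is not None:
            rewritten = replacement
            break
    memo[id(node)] = rewritten
    return rewritten


x = Node("input")
w_q = Node("weight_int8")
out = fuse(Node("relu", [Node("linear", [x, Node("dequantize", [w_q])])]), PATTERNS)
print(out.op, out.args[0].op)  # relu int8_linear
```

Rewriting a node's inputs before the node itself means each matcher always sees producers that have already been fused, and the memo keeps shared subgraphs from being duplicated during the rewrite.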

Changes

Introduce the FuseQuantizedOpsTransform() pass. I elected to add a new pass under the backends/vulkan/_passes directory, rather than modify the existing pass, because I anticipate that most of the fusion patterns will be specific to ET-VK.

Remove the existing FuseDequantLinearPass().

Switch to the FuseQuantizedOpsTransform pass in place of the old FuseDequantLinear pass.

Add a test_vulkan_passes Python test for export passes.

Make some small refactors to the test_vulkan_delegate Python test to improve code organization.

Differential Revision: D73794042

Differential Revision URL: https://our.internmc.facebook.com/intern/diff/D73794042/ ghstack-source-id: 280746102. Pull Request resolved: #10525

pytorch-bot · 2025-04-28T19:41:21Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10525

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit c55ef19 with merge base df75088:

NEW FAILURES - The following jobs have failed:

Check Labels / Check labels (gh)
RuntimeError: Error checking labels: PR does not have required labels
Lint / lintrunner / linux-job (gh)
>>> Lint for backends/cadence/hifi/operators/op_mm.cpp:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-04-28T19:41:29Z

This pull request was exported from Phabricator. Differential Revision: D73794042

github-actions · 2025-04-28T19:42:08Z

This PR needs a `release notes:` label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

kimishpatel · 2025-04-28T23:30:33Z

Why are you relying on weight_int8pack_mm at all? That is not a public API op, as it is prefixed with _. If it is removed, your passes here will fail. What you really want is just fused pattern recognition. Can you not recognize that directly? Or do you need to serialize some "fake" op that you have a lowering for at runtime?

SS-JIA requested a review from kimishpatel as a code owner April 28, 2025 19:41

facebook-github-bot added the CLA Signed label Apr 28, 2025

facebook-github-bot added the fb-exported label Apr 28, 2025
