
#16888: Fix Conv2D when output is in Row Major #16937

Open · wants to merge 7 commits into main from smanoj/fix_conv2d_RM

Conversation

sankarmanoj-tt
Contributor

@sankarmanoj-tt commented Jan 21, 2025

Ticket

#16888

Problem description

Many Conv2D test cases fail when the output layout is Row Major.

What's changed

matmul_partials_cb is the input CB to the untilize block. It may be mapped directly onto the output buffer, or it may have its own memory allocated.

If it is mapped onto the output buffer (use_partials_for_out == True), the read and write pointers have to be incremented after every in0 height block.

If it has its own memory allocated (use_partials_for_out == False), the read and write pointers have to be reset to their original positions after every in0 height block.
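Below is a minimal sketch of the pointer handling described above, written in the style of the compute-kernel code touched by this PR. It assumes use_partials_for_out is a compile-time constant (a reviewer asks for exactly that below), and partials_cb_write_ptr is an assumed name mirroring the partials_cb_read_ptr that appears in the diff; it is an illustration, not the exact change.

// Per in0 height block: either let the partials CB pointers keep advancing into
// the output buffer, or rewind them so the next block reuses the same scratch CB.
if constexpr (!use_partials_for_out) {
    // The CB has its own allocation: reset the read/write pointers to the values
    // captured before the height-block loop.
    UNPACK(get_local_cb_interface(matmul_partials_cb).fifo_rd_ptr = partials_cb_read_ptr);
    PACK(get_local_cb_interface(matmul_partials_cb).fifo_wr_ptr = partials_cb_write_ptr);
} else {
    // The CB aliases the output buffer: leave fifo_rd_ptr / fifo_wr_ptr advanced so
    // each height block lands in the next region of the output.
}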

Checklist

  • Post commit CI passes
  • Blackhole Post commit (if applicable) passes
  • Model regression CI testing passes
  • Device performance regression CI testing passes (if applicable)
  • Nightly L2 passes
  • (For models and ops writers) Full new models tests pass
  • New/Existing tests provide coverage for changes

@sankarmanoj-tt marked this pull request as ready for review February 3, 2025 06:42
@sankarmanoj-tt requested a review from a team as a code owner February 3, 2025 06:42
@@ -42,7 +42,7 @@ def run_conv(
padded_input_channels=None,
fp32_accum=False,
packer_l1_acc=False,
- output_layout=ttnn.TILE_LAYOUT,
+ output_layout=ttnn.ROW_MAJOR_LAYOUT,
Contributor

Changing the default? We also need to test the TILE output layout more extensively.

Contributor Author

Most tests use the default value, so I changed the default to ensure that untilize_out is exercised.

Contributor

I would keep TILE as the default, since that is what models are using, and add a row-major variant in tests where needed.

if (
shard_layout == ttnn.TensorMemoryLayout.HEIGHT_SHARDED
and output_channels > 256
and output_layout == ttnn.ROW_MAJOR_LAYOUT
Contributor

Does this case work with TILE layout?

Contributor Author

Yes.

Contributor

This needs to be added to the validate function as a TT_FATAL as well.
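For illustration, a guard of the kind being requested might look like the following in the op's validation path; the member names and surrounding context here are assumptions mirroring the pytest skip condition above, not the actual validate code.

// Sketch: reject untilized (ROW_MAJOR) output for height-sharded convs with
// more than 256 output channels (i.e. out_block_w > 8), matching the skip above.
TT_FATAL(
    !(untilize_out &&
      this->memory_config.memory_layout == TensorMemoryLayout::HEIGHT_SHARDED &&
      output_channels > 256),
    "Conv2D with ROW_MAJOR (untilized) output is not supported for height-sharded outputs with more than 256 output channels");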

Contributor

bump

and output_channels > 256
and output_layout == ttnn.ROW_MAJOR_LAYOUT
):
pytest.skip("Skipping when out_block_w > 8 for untilize_out == True")
Contributor

also create a ticket to add support for this case, and mention the ticket number here.

if fp32_accum and packer_l1_acc and output_layout == ttnn.ROW_MAJOR_LAYOUT:
conv_config.output_layout = ttnn.TILE_LAYOUT
logger.warning(
"Forcing output_layout to TILE when act_block_h_override, fp32_accum and packer_l1_acc are enabled"
Contributor

why TILE only for fp32_accum and packer_l1_acc?

Contributor Author

It fails when untilize_out, packer_l1_acc, and fp32_accum are all enabled.

Contributor

Can we add TT_FATALs to the validate function for the combinations we know are not working?

Contributor

bump

@@ -169,7 +182,8 @@ def run_conv(
return_weights_and_bias=True,
return_output_dim=True,
)

# import numpy as np
# np.save("ref.npy",weights_device.cpu().to_torch().numpy())
Contributor

delete

@@ -1213,7 +1229,7 @@ def test_resnet50_conv_wh_fp32(
)
@pytest.mark.parametrize(
"activations_dtype",
- [ttnn.bfloat8_b],
+ [ttnn.bfloat16],
Contributor

why change?

Contributor Author

To test row major output.

@@ -86,7 +86,7 @@ bool check_non_tile_mul_width(
auto elem_size = conv_config.weights_dtype == DataType::BFLOAT8_B ? 1 : 2;
bool is_non_tile_mul_width =
(conv_config.shard_layout.has_value() && conv_config.shard_layout == TensorMemoryLayout::BLOCK_SHARDED) &&
- conv_config.act_block_h_override == 0 &&
+ // conv_config.act_block_h_override == 0 &&
Contributor

delete

@@ -24,6 +24,9 @@
// SliceRange srr = SliceRange{.h0 = 0, .h1 = 1, .hs = 8, .w0 = 0, .w1 = 32, .ws = 1};
// SliceRange srr1 = SliceRange{.h0 = 1, .h1 = 2, .hs = 8, .w0 = 0, .w1 = 32, .ws = 1};
// SliceRange src = SliceRange{.h0 = 0, .h1 = 32, .hs = 1, .w0 = 0, .w1 = 1, .ws = 1};
// SliceRange row_range = SliceRange{.h0 = 0, .h1 = 1, .hs = 1, .w0 = 0, .w1 = 16, .ws = 1};
// SliceRange col_range = SliceRange{.h0 = 0, .h1 = 16, .hs = 1, .w0 = 0, .w1 = 1, .ws = 1};
// SliceRange sq_range = SliceRange{.h0 = 0, .h1 = 4, .hs = 1, .w0 = 0, .w1 = 4, .ws = 1};
Contributor

Let's delete the whole commented-out block.

Contributor Author

Sure. I will remove once I've got all the tests passing.

Contributor Author

Removed.

Contributor

@pavlejosipovic left a comment

Can you describe in the PR what the issue in the compute kernel was and what was done to fix it? I can't follow the code.

@sankarmanoj-tt force-pushed the smanoj/fix_conv2d_RM branch 4 times, most recently from 4894625 to 3afb00d on February 21, 2025 19:11
Contributor

@pavlejosipovic left a comment

What is the impact of this change on test runtime?


if (curr_matmul_out_cb == matmul_partials_cb) {
UNPACK(
if (!use_partials_for_out) get_local_cb_interface(matmul_partials_cb).fifo_rd_ptr =
Contributor

make constexpr

Contributor Author

Done.

if (!use_partials_for_out) get_local_cb_interface(matmul_partials_cb).fifo_rd_ptr =
partials_cb_read_ptr);
PACK(
if (!use_partials_for_out) get_local_cb_interface(matmul_partials_cb).fifo_wr_ptr =
Contributor

make constexpr

Contributor Author

Done.

@@ -366,7 +375,9 @@ void MAIN {
cb_pop_front(in1_cb_id, in1_block_num_tiles);
} // for in0_num_blocks_w
if constexpr (matmul_partials_cb == mm_out_cb_id) {
UNPACK(get_local_cb_interface(matmul_partials_cb).fifo_rd_ptr = partials_cb_read_ptr);
UNPACK(
if (use_partials_for_out) get_local_cb_interface(matmul_partials_cb).fifo_rd_ptr =
Contributor

make constexpr

Contributor Author

Done.

@@ -416,6 +427,10 @@ void MAIN {
in1_index_subblock_offset += out_subblock_w;
} // for in1_num_subblocks
} // in0_num_subblocks
if (untilize_out) {
Contributor

make constexpr

Contributor Author

Done.

@@ -226,17 +226,26 @@ std::vector<TensorSpec> OptimizedConvNew::compute_output_specs(const std::vector
dtype, PageConfig(output_layout), mem_config, output_shape, padded_output_shape))};
} else if (this->memory_config.memory_layout == TensorMemoryLayout::WIDTH_SHARDED) {
uint32_t total_height_tiles = padded_output_shape.volume() / padded_output_shape[-1] / TILE_HEIGHT;
std::array<uint32_t, 2> shard_shape = {

Contributor

I think it's tricky to use logical sharding for the conv output only for width sharding.
In general, most ops (like TMs) are not ready to consume logical sharding, and a change like this could cause models to fail.
Now people will have to keep in mind that width sharding outputs logical shards while HS and BS output physical shards.

What was the reason to go for logical sharding here?

Contributor Author

With width sharding and the output in Row Major layout, this condition was failing when the output height * width was not a multiple of 32.

When the output was in Tile layout, the shard would be padded correctly. However, in Row Major layout, the physical size of the shard is always the logical size.
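A small numeric illustration of that mismatch (the shape below is made up for the example):

// 625 output rows (N * H * W = 1 * 25 * 25), width sharded.
constexpr uint32_t TILE_HEIGHT = 32;
uint32_t total_height = 1 * 25 * 25;  // 625 rows, not a multiple of 32
uint32_t tile_layout_shard_height =
    ((total_height + TILE_HEIGHT - 1) / TILE_HEIGHT) * TILE_HEIGHT;  // padded to 640
uint32_t row_major_shard_height = total_height;  // stays 625, so the old size check failed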

Contributor Author

Should HS and BS also be changed to Logical Sharding?

Contributor

Can you trigger the same problem with HS if you use a shallow conv (input_channel_alignment of 8 or 16) and untilize the output?
This problem doesn't sound specific to WS.
If you can make all models work with conv output as logical sharding, that would be great.

@sankarmanoj-tt
Contributor Author

There is around a 5% increase in kernel execution time when the output is in Row Major.
ops_perf_results_rm_comp_2025_02_27_07_57_07.csv

@pavlejosipovic
Contributor

There is around a 5% increase in kernel execution time when the output is in Row Major. ops_perf_results_rm_comp_2025_02_27_07_57_07.csv

Sorry, I wasn't clear enough. I was asking whether the runtime of test_new_conv2d.py has increased, and if so by how much (we were trying to reduce its runtime to make post-commit faster).
