
[Bug Report] Transposed conv2d PCC failures #17647

Closed
pavlepopovic opened this issue Feb 6, 2025 · 4 comments
Labels: bug (Something isn't working), P0

@pavlepopovic (Contributor)

Describe the bug
Transposed Conv2D has PCC failures for shapes we need in a customer model.
The PCC failures are odd: they stay the same across repeated runs, but change after a tt-smi reset (indicative of the op consuming some uninitialized memory). For example:

FAILED tests/ttnn/unit_tests/operations/eltwise/test_conv2d_transposed.py::test_transposed_conv2d[device_params={'l1_small_size': 16384}-batch=1-groups=1-input_channels=64-output_channels=64-input_height=32-input_width=4-weights_dtype=DataType.BFLOAT16-activations_dtype=DataType.BFLOAT16-kernel=(4, 4)-stride=(2, 2)-padding=(1, 1)-input_channels_alignment=32-act_block_h_override=0-act_block_w_div=1-deallocate_activation=True-output_layout=Layout.ROW_MAJOR-math_fidelity=MathFidelity.LoFi-fp32_accum=True-packer_l1_acc=False-math_approx_mode=True-dst_full_sync_en=False] - AssertionError: 0.0017708264138304882
FAILED tests/ttnn/unit_tests/operations/eltwise/test_conv2d_transposed.py::test_transposed_conv2d[device_params={'l1_small_size': 16384}-batch=1-groups=1-input_channels=128-output_channels=64-input_height=64-input_width=8-weights_dtype=DataType.BFLOAT16-activations_dtype=DataType.BFLOAT16-kernel=(4, 4)-stride=(2, 2)-padding=(1, 1)-input_channels_alignment=32-act_block_h_override=0-act_block_w_div=1-deallocate_activation=True-output_layout=Layout.ROW_MAJOR-math_fidelity=MathFidelity.LoFi-fp32_accum=True-packer_l1_acc=False-math_approx_mode=True-dst_full_sync_en=False] - AssertionError: 0.00474984900976193
FAILED tests/ttnn/unit_tests/operations/eltwise/test_conv2d_transposed.py::test_transposed_conv2d[device_params={'l1_small_size': 16384}-batch=1-groups=1-input_channels=64-output_channels=64-input_height=64-input_width=8-weights_dtype=DataType.BFLOAT16-activations_dtype=DataType.BFLOAT16-kernel=(4, 4)-stride=(2, 2)-padding=(1, 1)-input_channels_alignment=32-act_block_h_override=0-act_block_w_div=1-deallocate_activation=True-output_layout=Layout.ROW_MAJOR-math_fidelity=MathFidelity.LoFi-fp32_accum=True-packer_l1_acc=False-math_approx_mode=True-dst_full_sync_en=False] - AssertionError: 0.004483437985441515

Following a tt-smi reset, those change to:

FAILED tests/ttnn/unit_tests/operations/eltwise/test_conv2d_transposed.py::test_transposed_conv2d[device_params={'l1_small_size': 16384}-batch=1-groups=1-input_channels=64-output_channels=64-input_height=32-input_width=4-weights_dtype=DataType.BFLOAT16-activations_dtype=DataType.BFLOAT16-kernel=(4, 4)-stride=(2, 2)-padding=(1, 1)-input_channels_alignment=32-act_block_h_override=0-act_block_w_div=1-deallocate_activation=True-output_layout=Layout.ROW_MAJOR-math_fidelity=MathFidelity.LoFi-fp32_accum=True-packer_l1_acc=False-math_approx_mode=True-dst_full_sync_en=False] - AssertionError: -0.006266862572069966
FAILED tests/ttnn/unit_tests/operations/eltwise/test_conv2d_transposed.py::test_transposed_conv2d[device_params={'l1_small_size': 16384}-batch=1-groups=1-input_channels=128-output_channels=64-input_height=64-input_width=8-weights_dtype=DataType.BFLOAT16-activations_dtype=DataType.BFLOAT16-kernel=(4, 4)-stride=(2, 2)-padding=(1, 1)-input_channels_alignment=32-act_block_h_override=0-act_block_w_div=1-deallocate_activation=True-output_layout=Layout.ROW_MAJOR-math_fidelity=MathFidelity.LoFi-fp32_accum=True-packer_l1_acc=False-math_approx_mode=True-dst_full_sync_en=False] - AssertionError: 0.005928684620063588
FAILED tests/ttnn/unit_tests/operations/eltwise/test_conv2d_transposed.py::test_transposed_conv2d[device_params={'l1_small_size': 16384}-batch=1-groups=1-input_channels=64-output_channels=64-input_height=64-input_width=8-weights_dtype=DataType.BFLOAT16-activations_dtype=DataType.BFLOAT16-kernel=(4, 4)-stride=(2, 2)-padding=(1, 1)-input_channels_alignment=32-act_block_h_override=0-act_block_w_div=1-deallocate_activation=True-output_layout=Layout.ROW_MAJOR-math_fidelity=MathFidelity.LoFi-fp32_accum=True-packer_l1_acc=False-math_approx_mode=True-dst_full_sync_en=False] - AssertionError: 0.015021027894757167

To Reproduce
Attaching the unit test here:

import pytest
import torch
import ttnn

from loguru import logger

from tests.ttnn.utils_for_testing import check_with_pcc_without_tensor_printout

@pytest.mark.parametrize(
    "batch, groups, input_channels, output_channels, input_height, input_width, weights_dtype, activations_dtype, kernel, stride, padding, input_channels_alignment, act_block_h_override, act_block_w_div, deallocate_activation, output_layout, math_fidelity, fp32_accum, packer_l1_acc, math_approx_mode, dst_full_sync_en",
    [
        (1, 1, 64, 64, 32, 4, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 0, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),         # does not work
        (1, 1, 128, 64, 64, 8, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 0, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),        # does not work
        (1, 1, 64, 64, 64, 8, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 0, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),         # does not work

        # (1, 1, 128, 64, 128, 16, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 32, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),        #      works
        # (1, 1, 128, 2, 256, 32, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 32, 1, True, ttnn.TILE_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),              #      works
        # (1, 1, 128, 64, 128, 16, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 0, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),         #      works
        # (1, 1, 128, 64, 256, 32, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 32, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),        #      works
        # (1, 1, 128, 2, 512, 64, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 32, 1, True, ttnn.TILE_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),              #      works
        # (1, 1, 64, 64, 128, 16, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 0, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),          #      works
        # (1, 1, 64, 64, 256, 32, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 0, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),          #      works
        # (1, 1, 128, 64, 256,   64, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 32, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),      #      works
        # (1, 1, 128, 64, 128, 128, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 32, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),       #      works
        # (1, 1, 128, 2, 128, 256, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 32, 1, True, ttnn.TILE_LAYOUT     , ttnn.MathFidelity.LoFi, True, False, True, False,),        #      works
    ],
)
@pytest.mark.parametrize("device_params", [{"l1_small_size": 16384}], indirect=True)
def test_transposed_conv2d(
    device,
    use_program_cache,
    batch,
    groups,
    input_channels,
    output_channels,
    input_height,
    input_width,
    weights_dtype,
    activations_dtype,
    kernel,
    stride,
    padding,
    input_channels_alignment,
    act_block_h_override,
    act_block_w_div,
    deallocate_activation,
    output_layout,
    math_fidelity,
    fp32_accum,
    packer_l1_acc,
    math_approx_mode,
    dst_full_sync_en,
):
    torch.manual_seed(11234)

    conv_input_shape = [batch, input_channels, input_height, input_width]
    conv_weight_shape = [input_channels, output_channels // groups, kernel[0], kernel[1]]
    conv_bias_shape = [1, 1, 1, output_channels]

    torch_input_tensor_nchw = torch.randn(conv_input_shape, dtype=torch.bfloat16).float()

    torch_weight_tensor = torch.randn(conv_weight_shape, dtype=torch.bfloat16).float()
    torch_bias_tensor = torch.randn(conv_bias_shape, dtype=torch.bfloat16).float()

    torch_out_golden_tensor = torch.nn.functional.conv_transpose2d(
        torch_input_tensor_nchw,
        torch_weight_tensor,
        bias=torch_bias_tensor.reshape(-1),
        stride=stride,
        padding=padding,
        output_padding=(0, 0),
        dilation=(1, 1),
        groups=groups,
    )
    torch_out_golden_tensor = torch.nn.functional.relu(torch_out_golden_tensor)

    tt_weight_tensor = ttnn.from_torch(
        torch_weight_tensor, weights_dtype if weights_dtype != ttnn.bfloat8_b else ttnn.float32
    )
    tt_bias_tensor = ttnn.from_torch(
        torch_bias_tensor, weights_dtype if weights_dtype != ttnn.bfloat8_b else ttnn.float32
    )

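    # NCHW to NHWC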
    torch_input_tensor = torch.permute(torch_input_tensor_nchw, (0, 2, 3, 1))

    tt_input_tensor = ttnn.from_torch(torch_input_tensor, ttnn.bfloat16, mesh_mapper=None)
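    # Flatten the NHWC input to [1, 1, N*H*W, C] before passing it to conv_transpose2d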
    tt_input_tensor = tt_input_tensor.reshape(
        1,
        1,
        tt_input_tensor.shape[0] * tt_input_tensor.shape[1] * tt_input_tensor.shape[2],
        tt_input_tensor.shape[3],
    )

    conv_config = ttnn.Conv2dConfig(
        dtype=activations_dtype,
        weights_dtype=weights_dtype,
        shard_layout=ttnn.TensorMemoryLayout.HEIGHT_SHARDED,
        input_channels_alignment=input_channels_alignment,
        activation="relu",
        deallocate_activation=deallocate_activation,
        enable_act_double_buffer=False,
        enable_split_reader=False,
        enable_subblock_padding=False,
        output_layout=output_layout,
        act_block_h_override=act_block_h_override,
        act_block_w_div=act_block_w_div,
        override_sharding_config=False,
    )
    compute_config = ttnn.init_device_compute_kernel_config(
        device.arch(),
        math_fidelity=math_fidelity,
        fp32_dest_acc_en=fp32_accum,
        packer_l1_acc=packer_l1_acc,
        math_approx_mode=math_approx_mode,
        dst_full_sync_en=dst_full_sync_en,
    )

    [tt_output_tensor, [out_height, out_width], [weight, bias]] = ttnn.conv_transpose2d(
        input_tensor=tt_input_tensor,
        weight_tensor=tt_weight_tensor,
        bias_tensor=tt_bias_tensor,
        device=device,
        in_channels=input_channels,
        out_channels=output_channels,
        input_height=input_height,
        input_width=input_width,
        batch_size=batch,
        kernel_size=kernel,
        stride=stride,
        padding=padding,
        output_padding=(0, 0),
        dilation=(1, 1),
        conv_config=conv_config,
        compute_config=compute_config,
        groups=groups,
        return_output_dim=True,
        return_weights_and_bias=True,
    )
    tt_output_tensor = ttnn.from_device(tt_output_tensor)
    torch_output_tensor = ttnn.to_torch(tt_output_tensor, mesh_composer=None)

    # NHWC to NCHW
    torch_output_tensor = torch_output_tensor.reshape(
        batch, out_height, out_width, torch_output_tensor.shape[-1]
    )
    torch_output_tensor = torch_output_tensor[:, :, :, :output_channels]
    torch_output_tensor = torch.permute(torch_output_tensor, (0, 3, 1, 2))

    target_pcc = 0.995
    passing, pcc_msg = check_with_pcc_without_tensor_printout(torch_output_tensor, torch_out_golden_tensor, pcc=target_pcc)
    logger.info(f"PCC = {pcc_msg}. Threshold = {target_pcc}")
    assert passing, pcc_msg
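
To reproduce, the failing cases can be run directly with, for example (assuming the test is saved at the path shown in the failure logs above):

pytest tests/ttnn/unit_tests/operations/eltwise/test_conv2d_transposed.py -k test_transposed_conv2d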

@pavlejosipovic (Contributor) commented Feb 6, 2025

Root cause of this is #16888.
The PR is here: #16937.

If you comment out output_layout=output_layout, the tests will pass.
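
For reference, a minimal sketch of that workaround against the test above (same variable names; the only change is leaving output_layout at its default instead of forcing it):

    conv_config = ttnn.Conv2dConfig(
        dtype=activations_dtype,
        weights_dtype=weights_dtype,
        shard_layout=ttnn.TensorMemoryLayout.HEIGHT_SHARDED,
        input_channels_alignment=input_channels_alignment,
        activation="relu",
        deallocate_activation=deallocate_activation,
        # output_layout=output_layout,  # commented out per the note above; the config keeps its default layout
        act_block_h_override=act_block_h_override,
        act_block_w_div=act_block_w_div,
    )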

@pavlejosipovic (Contributor)

Unfortunately, PR #16937 doesn't resolve this issue, but choosing tiled layout does resolve it.
@pavlepopovic, please work with @sankarmanoj-tt to resolve this one.

@pavlejosipovic (Contributor)

@pavlepopovic, can you use tile layout for the output in the model until this is fixed, and call a separate op to untilize?
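
A minimal sketch of that interim workaround, reusing the variable names from the test above (using ttnn.to_layout for the separate untilize step is an assumption here; ttnn.untilize could serve the same purpose):

    # Keep the conv_transpose2d output tiled until the fix lands:
    conv_config.output_layout = ttnn.TILE_LAYOUT
    # ... run ttnn.conv_transpose2d exactly as in the test above ...
    # Then untilize with a separate op before reading the result back:
    tt_output_tensor = ttnn.to_layout(tt_output_tensor, ttnn.ROW_MAJOR_LAYOUT)
    tt_output_tensor = ttnn.from_device(tt_output_tensor)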

sankarmanoj-tt added a commit that referenced this issue Feb 12, 2025
@sankarmanoj-tt (Contributor)

This is an issue with non_tile_height. Forcing check_non_tile_height to return False fixes this issue.

pavlejosipovic pushed a commit that referenced this issue Feb 18, 2025
These two features are non-critical for conv2d: they don't
contribute to enabling any model, improve perf on any model,
or improve the pass rate on any sweep.

The problem with these features is that they kick in under
very unpredictable conditions for both users and developers,
as they have many limits/conditions.

They add to the conv2d test matrix, but they are hard to test
for, since deriving tests that trigger them on multiple hw
platforms is not easy.

Moreover, they are a source of bugs like #17647, and it is
often not obvious that a bug originates from these features;
when faced with a conv2d bug, the first step is to go to the
code and manually disable them to check.

For the reasons above these features will be removed, and
removing them will fix #17647.