
[Bug Report] Transposed conv2d PCC failures #17647

Closed
pavlepopovic opened this issue Feb 6, 2025 · 4 comments
Labels: bug (Something isn't working), P0

@pavlepopovic (Contributor)

Describe the bug
Transposed Conv2D has PCC failures for shapes we need in a customer model.
The PCC failures are odd: they stay the same across repeated runs, but change after a tt-smi reset (indicative of the op consuming some uninitialized memory). For example:

FAILED tests/ttnn/unit_tests/operations/eltwise/test_conv2d_transposed.py::test_transposed_conv2d[device_params={'l1_small_size': 16384}-batch=1-groups=1-input_channels=64-output_channels=64-input_height=32-input_width=4-weights_dtype=DataType.BFLOAT16-activations_dtype=DataType.BFLOAT16-kernel=(4, 4)-stride=(2, 2)-padding=(1, 1)-input_channels_alignment=32-act_block_h_override=0-act_block_w_div=1-deallocate_activation=True-output_layout=Layout.ROW_MAJOR-math_fidelity=MathFidelity.LoFi-fp32_accum=True-packer_l1_acc=False-math_approx_mode=True-dst_full_sync_en=False] - AssertionError: 0.0017708264138304882
FAILED tests/ttnn/unit_tests/operations/eltwise/test_conv2d_transposed.py::test_transposed_conv2d[device_params={'l1_small_size': 16384}-batch=1-groups=1-input_channels=128-output_channels=64-input_height=64-input_width=8-weights_dtype=DataType.BFLOAT16-activations_dtype=DataType.BFLOAT16-kernel=(4, 4)-stride=(2, 2)-padding=(1, 1)-input_channels_alignment=32-act_block_h_override=0-act_block_w_div=1-deallocate_activation=True-output_layout=Layout.ROW_MAJOR-math_fidelity=MathFidelity.LoFi-fp32_accum=True-packer_l1_acc=False-math_approx_mode=True-dst_full_sync_en=False] - AssertionError: 0.00474984900976193
FAILED tests/ttnn/unit_tests/operations/eltwise/test_conv2d_transposed.py::test_transposed_conv2d[device_params={'l1_small_size': 16384}-batch=1-groups=1-input_channels=64-output_channels=64-input_height=64-input_width=8-weights_dtype=DataType.BFLOAT16-activations_dtype=DataType.BFLOAT16-kernel=(4, 4)-stride=(2, 2)-padding=(1, 1)-input_channels_alignment=32-act_block_h_override=0-act_block_w_div=1-deallocate_activation=True-output_layout=Layout.ROW_MAJOR-math_fidelity=MathFidelity.LoFi-fp32_accum=True-packer_l1_acc=False-math_approx_mode=True-dst_full_sync_en=False] - AssertionError: 0.004483437985441515

Following a tt-smi reset, those change to:

FAILED tests/ttnn/unit_tests/operations/eltwise/test_conv2d_transposed.py::test_transposed_conv2d[device_params={'l1_small_size': 16384}-batch=1-groups=1-input_channels=64-output_channels=64-input_height=32-input_width=4-weights_dtype=DataType.BFLOAT16-activations_dtype=DataType.BFLOAT16-kernel=(4, 4)-stride=(2, 2)-padding=(1, 1)-input_channels_alignment=32-act_block_h_override=0-act_block_w_div=1-deallocate_activation=True-output_layout=Layout.ROW_MAJOR-math_fidelity=MathFidelity.LoFi-fp32_accum=True-packer_l1_acc=False-math_approx_mode=True-dst_full_sync_en=False] - AssertionError: -0.006266862572069966
FAILED tests/ttnn/unit_tests/operations/eltwise/test_conv2d_transposed.py::test_transposed_conv2d[device_params={'l1_small_size': 16384}-batch=1-groups=1-input_channels=128-output_channels=64-input_height=64-input_width=8-weights_dtype=DataType.BFLOAT16-activations_dtype=DataType.BFLOAT16-kernel=(4, 4)-stride=(2, 2)-padding=(1, 1)-input_channels_alignment=32-act_block_h_override=0-act_block_w_div=1-deallocate_activation=True-output_layout=Layout.ROW_MAJOR-math_fidelity=MathFidelity.LoFi-fp32_accum=True-packer_l1_acc=False-math_approx_mode=True-dst_full_sync_en=False] - AssertionError: 0.005928684620063588
FAILED tests/ttnn/unit_tests/operations/eltwise/test_conv2d_transposed.py::test_transposed_conv2d[device_params={'l1_small_size': 16384}-batch=1-groups=1-input_channels=64-output_channels=64-input_height=64-input_width=8-weights_dtype=DataType.BFLOAT16-activations_dtype=DataType.BFLOAT16-kernel=(4, 4)-stride=(2, 2)-padding=(1, 1)-input_channels_alignment=32-act_block_h_override=0-act_block_w_div=1-deallocate_activation=True-output_layout=Layout.ROW_MAJOR-math_fidelity=MathFidelity.LoFi-fp32_accum=True-packer_l1_acc=False-math_approx_mode=True-dst_full_sync_en=False] - AssertionError: 0.015021027894757167

To Reproduce
Attaching the unit test here:

import pytest
import torch
import ttnn

from loguru import logger

from tests.ttnn.utils_for_testing import check_with_pcc_without_tensor_printout

@pytest.mark.parametrize(
    "batch, groups, input_channels, output_channels, input_height, input_width, weights_dtype, activations_dtype, kernel, stride, padding, input_channels_alignment, act_block_h_override, act_block_w_div, deallocate_activation, output_layout, math_fidelity, fp32_accum, packer_l1_acc, math_approx_mode, dst_full_sync_en",
    [
        (1, 1, 64, 64, 32, 4, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 0, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),         # does not work
        (1, 1, 128, 64, 64, 8, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 0, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),        # does not work
        (1, 1, 64, 64, 64, 8, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 0, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),         # does not work

        # (1, 1, 128, 64, 128, 16, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 32, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),        #      works
        # (1, 1, 128, 2, 256, 32, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 32, 1, True, ttnn.TILE_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),              #      works
        # (1, 1, 128, 64, 128, 16, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 0, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),         #      works
        # (1, 1, 128, 64, 256, 32, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 32, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),        #      works
        # (1, 1, 128, 2, 512, 64, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 32, 1, True, ttnn.TILE_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),              #      works
        # (1, 1, 64, 64, 128, 16, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 0, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),          #      works
        # (1, 1, 64, 64, 256, 32, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 0, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),          #      works
        # (1, 1, 128, 64, 256,   64, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 32, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),      #      works
        # (1, 1, 128, 64, 128, 128, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 32, 1, True, ttnn.ROW_MAJOR_LAYOUT, ttnn.MathFidelity.LoFi, True, False, True, False,),       #      works
        # (1, 1, 128, 2, 128, 256, ttnn.bfloat16, ttnn.bfloat16, (4, 4), (2, 2), (1, 1), 32, 32, 1, True, ttnn.TILE_LAYOUT     , ttnn.MathFidelity.LoFi, True, False, True, False,),        #      works
    ],
)
@pytest.mark.parametrize("device_params", [{"l1_small_size": 16384}], indirect=True)
def test_transposed_conv2d(
    device,
    use_program_cache,
    batch,
    groups,
    input_channels,
    output_channels,
    input_height,
    input_width,
    weights_dtype,
    activations_dtype,
    kernel,
    stride,
    padding,
    input_channels_alignment,
    act_block_h_override,
    act_block_w_div,
    deallocate_activation,
    output_layout,
    math_fidelity,
    fp32_accum,
    packer_l1_acc,
    math_approx_mode,
    dst_full_sync_en,
):
    torch.manual_seed(11234)

    conv_input_shape = [batch, input_channels, input_height, input_width]
    conv_weight_shape = [input_channels, output_channels // groups, kernel[0], kernel[1]]
    conv_bias_shape = [1, 1, 1, output_channels]

    torch_input_tensor_nchw = torch.randn(conv_input_shape, dtype=torch.bfloat16).float()

    torch_weight_tensor = torch.randn(conv_weight_shape, dtype=torch.bfloat16).float()
    torch_bias_tensor = torch.randn(conv_bias_shape, dtype=torch.bfloat16).float()

    torch_out_golden_tensor = torch.nn.functional.conv_transpose2d(
        torch_input_tensor_nchw,
        torch_weight_tensor,
        bias=torch_bias_tensor.reshape(-1),
        stride=stride,
        padding=padding,
        output_padding=(0, 0),
        dilation=(1, 1),
        groups=groups,
    )
    torch_out_golden_tensor = torch.nn.functional.relu(torch_out_golden_tensor)

    tt_weight_tensor = ttnn.from_torch(
        torch_weight_tensor, weights_dtype if weights_dtype != ttnn.bfloat8_b else ttnn.float32
    )
    tt_bias_tensor = ttnn.from_torch(
        torch_bias_tensor, weights_dtype if weights_dtype != ttnn.bfloat8_b else ttnn.float32
    )

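    # NCHW to NHWC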
    torch_input_tensor = torch.permute(torch_input_tensor_nchw, (0, 2, 3, 1))

    tt_input_tensor = ttnn.from_torch(torch_input_tensor, ttnn.bfloat16, mesh_mapper=None)
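    # Flatten the NHWC input to [1, 1, N*H*W, C] before passing it to conv_transpose2d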
    tt_input_tensor = tt_input_tensor.reshape(
        1,
        1,
        tt_input_tensor.shape[0] * tt_input_tensor.shape[1] * tt_input_tensor.shape[2],
        tt_input_tensor.shape[3],
    )

    conv_config = ttnn.Conv2dConfig(
        dtype=activations_dtype,
        weights_dtype=weights_dtype,
        shard_layout=ttnn.TensorMemoryLayout.HEIGHT_SHARDED,
        input_channels_alignment=input_channels_alignment,
        activation="relu",
        deallocate_activation=deallocate_activation,
        enable_act_double_buffer=False,
        enable_split_reader=False,
        enable_subblock_padding=False,
        output_layout=output_layout,
        act_block_h_override=act_block_h_override,
        act_block_w_div=act_block_w_div,
        override_sharding_config=False,
    )
    compute_config = ttnn.init_device_compute_kernel_config(
        device.arch(),
        math_fidelity=math_fidelity,
        fp32_dest_acc_en=fp32_accum,
        packer_l1_acc=packer_l1_acc,
        math_approx_mode=math_approx_mode,
        dst_full_sync_en=dst_full_sync_en,
    )

    [tt_output_tensor, [out_height, out_width], [weight, bias]] = ttnn.conv_transpose2d(
        input_tensor=tt_input_tensor,
        weight_tensor=tt_weight_tensor,
        bias_tensor=tt_bias_tensor,
        device=device,
        in_channels=input_channels,
        out_channels=output_channels,
        input_height=input_height,
        input_width=input_width,
        batch_size=batch,
        kernel_size=kernel,
        stride=stride,
        padding=padding,
        output_padding=(0, 0),
        dilation=(1, 1),
        conv_config=conv_config,
        compute_config=compute_config,
        groups=groups,
        return_output_dim=True,
        return_weights_and_bias=True,
    )
    tt_output_tensor = ttnn.from_device(tt_output_tensor)
    torch_output_tensor = ttnn.to_torch(tt_output_tensor, mesh_composer=None)

    # NHWC to NCHW
    torch_output_tensor = torch_output_tensor.reshape(
        batch, out_height, out_width, torch_output_tensor.shape[-1]
    )
    torch_output_tensor = torch_output_tensor[:, :, :, :output_channels]
    torch_output_tensor = torch.permute(torch_output_tensor, (0, 3, 1, 2))

    target_pcc = 0.995
    passing, pcc_msg = check_with_pcc_without_tensor_printout(torch_output_tensor, torch_out_golden_tensor, pcc=target_pcc)
    logger.info(f"PCC = {pcc_msg}. Threshold = {target_pcc}")
    assert passing, pcc_msg
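
To reproduce, the failing cases can be run directly with, for example (assuming the test is saved at the path shown in the failure logs above):

pytest tests/ttnn/unit_tests/operations/eltwise/test_conv2d_transposed.py -k test_transposed_conv2d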

@pavlejosipovic (Contributor) commented Feb 6, 2025

Root cause of this is #16888.
The PR is here: #16937.

If you comment out output_layout=output_layout, the tests will pass.
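
For reference, a minimal sketch of that workaround against the test above (same variable names; the only change is leaving output_layout at its default instead of forcing it):

    conv_config = ttnn.Conv2dConfig(
        dtype=activations_dtype,
        weights_dtype=weights_dtype,
        shard_layout=ttnn.TensorMemoryLayout.HEIGHT_SHARDED,
        input_channels_alignment=input_channels_alignment,
        activation="relu",
        deallocate_activation=deallocate_activation,
        # output_layout=output_layout,  # commented out per the note above; the config keeps its default layout
        act_block_h_override=act_block_h_override,
        act_block_w_div=act_block_w_div,
    )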

@pavlejosipovic (Contributor)

Unfortunately, PR #16937 doesn't resolve this issue, but choosing tiled layout does resolve it.
@pavlepopovic, please work with @sankarmanoj-tt to resolve this one.

@pavlejosipovic (Contributor)

@pavlepopovic, can you use tile layout for the output in the model until this is fixed, and call a separate op to untilize?
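
A minimal sketch of that interim workaround, reusing the variable names from the test above (using ttnn.to_layout for the separate untilize step is an assumption here; ttnn.untilize could serve the same purpose):

    # Keep the conv_transpose2d output tiled until the fix lands:
    conv_config.output_layout = ttnn.TILE_LAYOUT
    # ... run ttnn.conv_transpose2d exactly as in the test above ...
    # Then untilize with a separate op before reading the result back:
    tt_output_tensor = ttnn.to_layout(tt_output_tensor, ttnn.ROW_MAJOR_LAYOUT)
    tt_output_tensor = ttnn.from_device(tt_output_tensor)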

sankarmanoj-tt added a commit that referenced this issue Feb 12, 2025
@sankarmanoj-tt (Contributor)

This is an issue with non_tile_height. Forcing check_non_tile_height to return False fixes this issue.

pavlejosipovic pushed a commit that referenced this issue Feb 18, 2025
These two features are non-critical for conv2d: they don't
contribute to enabling any model, improve perf on any model,
or improve the pass rate on any sweep.

The problem with these features is that they kick in under
very unpredictable conditions for both users and developers,
as they have many limits/conditions.

They add to the conv2d test matrix, but they are hard to test
for, since deriving tests that trigger them on multiple hw
platforms is not easy.

Moreover, they are a source of bugs like #17647, and it is
often not obvious that a bug originates from these features;
when faced with a conv2d bug, the first step is to go to the
code and manually disable them to check.

For the reasons above these features will be removed, and
removing them will fix #17647.