[Bug Report] Invalid rounding with reduce_tile REDUCE_SCALAR #17713

Open
hschoi4448 opened this issue Feb 7, 2025 · 0 comments
Labels: bug, moreh, P1

hschoi4448 commented Feb 7, 2025

Describe the bug
While debugging recently, I noticed that reduce_scalar returned a suspicious result, and on closer inspection I found the behavior described below.

The result of the reduce operation varies depending on the position of the values.

    constexpr auto cb_in0 = tt::CBIndex::c_0;
    {
        cb_reserve_back(cb_in0, 1);
        float* ptr = reinterpret_cast<float*>(get_write_ptr(cb_in0));
        memset(ptr, 0, 1024 * sizeof(float));
        // Case 1: values at indices 0 and 16. The reduce result is 2.0.
        ptr[0] = 1.0019531;
        ptr[16] = 1.0f;

        // Case 2: values at indices 0 and 1. The reduce result is 2.0019531250.
        // ptr[0] = 1.0019531;
        // ptr[1] = 1.0f;

        cb_push_back(cb_in0, 1);
    }

    {
        constexpr auto cb_in2 = tt::CBIndex::c_2;
        cb_reserve_back(cb_in2, 1);
        float* ptr = reinterpret_cast<float*>(get_write_ptr(cb_in2));
        memset(ptr, 0, 1024 * sizeof(float));
        for (int i = 0; i < 1024; i++) {
            ptr[i] = 1;
        }
        cb_push_back(cb_in2, 1);
    }

I prepared both the input data and the scalar values for the reduce operation in the reader kernel. The scalar CB is filled entirely with 1.0f.
The test adds two values, 1.0019531 and 1.0; every other element of the input tile is zero.

If ptr[0] = 1.0019531 and ptr[16] = 1.0, then the reduce result is 2.0.
If ptr[0] = 1.0019531 and ptr[1] = 1.0, then the reduce result is 2.0019531250.
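
To make "position" concrete, here is a small host-side sketch that maps a linear index in the CB to a location inside the tile. It assumes the 1024 floats form one 32x32 tile stored as four 16x16 faces, each face row-major; that layout is an assumption here, not something confirmed for this CB. `TilePos` and `locate_in_tile` are hypothetical names, not tt-metal APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: where an element sits inside a tile.
struct TilePos {
    std::uint32_t face;  // which 16x16 face (0..3)
    std::uint32_t row;   // row inside the face (0..15)
    std::uint32_t col;   // column inside the face (0..15)
};

// ASSUMPTION: a 32x32 tile is stored as four contiguous 16x16 faces,
// each face in row-major order (256 elements per face).
inline TilePos locate_in_tile(std::uint32_t linear_index) {
    constexpr std::uint32_t kFaceDim = 16;
    constexpr std::uint32_t kFaceSize = kFaceDim * kFaceDim;
    assert(linear_index < 4 * kFaceSize);  // must lie inside one tile

    const std::uint32_t within_face = linear_index % kFaceSize;
    return TilePos{linear_index / kFaceSize,
                   within_face / kFaceDim,
                   within_face % kFaceDim};
}
```

Under that assumption, index 1 is in the same face row as index 0, while index 16 is the first element of the next row of the same face. So the two cases differ in whether the two nonzero values share a row.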

I believe it is a bug that the position of the values affects the result.

  • With many values, it would be understandable for the order of accumulation to affect the result. With only two nonzero values, however, it is odd that their placement changes the sum.
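
For reference, a host-side sketch of the arithmetic. 1.0019531 rounds to 1 + 2^-9 in fp32, and the exact sum 2 + 2^-9 is also representable in fp32, so a plain fp32 addition returns 2.001953125 in either operand order. The bfloat16 truncation below is only an assumed example of a precision loss that would yield 2.0; it is not confirmed to be what the hardware does. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: drop an fp32 value to bfloat16 precision by
// zeroing the low 16 bits of its bit pattern (truncation, no rounding).
inline float truncate_to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    bits &= 0xFFFF0000u;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}

// Plain fp32 sum of two values.
inline float sum_fp32(float a, float b) { return a + b; }

// ASSUMPTION for illustration only: each operand loses precision to
// bfloat16 before the add. This is one way to end up with 2.0.
inline float sum_via_bf16(float a, float b) {
    return truncate_to_bf16(a) + truncate_to_bf16(b);
}

// Checks the two values from this report.
inline void check_reported_values() {
    // fp32 keeps the 2^-9 term, in either operand order.
    assert(sum_fp32(1.0019531f, 1.0f) == 2.001953125f);
    assert(sum_fp32(1.0f, 1.0019531f) == 2.001953125f);
    // bfloat16 cannot hold 1 + 2^-9, so the term is lost before the add.
    assert(sum_via_bf16(1.0019531f, 1.0f) == 2.0f);
}
```

The point of the sketch is that nothing in fp32 addition itself explains a result of 2.0 for these two inputs, so the loss has to come from somewhere else in the reduce path.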

related kernel code:

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://github.com/tenstorrent/tt-metal/tree/hyungsuk/reduce_bug_example_scalar
  2. Run cmd: pytest ./tests/ttnn/unit_tests/operations/test_moreh_softmax.py
  3. See DPRINT result

Expected behavior
The reduce result should not depend on where the values are placed in the tile. Both cases above should return 2.0019531250.

Please complete the following environment information:

  • OS: Ubuntu 22.04.4 LTS
  • Version of software (eg. commit f191d02)
  • Arch: wormhole_b0


hschoi4448 added the bug and moreh labels Feb 7, 2025
cmaryanTT assigned ttmtrajkovic and bbradelTT and unassigned bbradelTT Feb 10, 2025
razorback3 added the P1 label Feb 11, 2025