[Codegen][AMDGPU Backend] Correctness issue for conv_2d_ngchw_gfchw #18798
Labels: bug 🐞 (Something isn't working); codegen/llvm (LLVM code generation compiler backend); codegen/rocm (ROCm code generation compiler backend (HIP/HSA))
Problem Description
The following IR
With inputs generated using the following numpy commands
Produces correct results on gfx1100 and gfx942 using this compile + run command
and incorrect results when adding `--iree-codegen-llvmgpu-test-tile-and-fuse-vectorize=true` on this branch: #18474

Changing the LLVM optimization level to `None` or `Less` produces correct results when using the above flag: iree/compiler/plugins/target/ROCM/ROCMTarget.cpp, line 466 in c6056d1
Investigation
The IR generated immediately before lowering scf to control flow looks like the following:
(The workgroup count is `[1, 1, 1]`, i.e. a single workgroup.)

It simply loops over the reduction dims of the `conv_2d` and accumulates. `%8` and `%9` are the loads for the image and filter respectively. In the above sample inputs, `%8` is always `1` (`np.ones`), while `%9` is `[1, 2, 1]` broadcast along the innermost dim, so the only index that affects the loaded value is `%arg3`.

Note that switching the input to be `[2, 1, 1]` broadcast along the innermost dim changes the output to `104` from `88`, and using `[1, 1, 2]` gives correct results, indicating that the load for `%arg3 = 1` somehow got replaced with a duplicate load of the first value. Additionally, this only reproduces incorrect results if the input channel dimension (`8` in this example) is >= 7. For smaller input channel dims this produces correct values.

Additionally, changing the input values for the image (`%8`) to be `[1, 2, 1]` broadcast and making the filter (`%9`) uniform gives correct values, indicating that it is specifically the second load in this example that is getting mangled.
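
The duplicate-load hypothesis can be sanity-checked with a small model. The sketch below is not the generated kernel: it only models one pass over the innermost filter dim with the image fixed at `1`, and the fault model (the load at index 1 returning the value at index 0) and the helper names `reduce_innermost` and `deviation` are assumptions introduced here for illustration.

```python
def reduce_innermost(pattern, duplicate_first_load=False):
    """Accumulate image * filter over one pass of the innermost filter dim.

    The image value is always 1, matching the np.ones input in the report.
    """
    total = 0
    for idx in range(len(pattern)):
        # Hypothetical fault model: the load at index 1 is replaced by a
        # duplicate of the load at index 0.
        src = 0 if (duplicate_first_load and idx == 1) else idx
        total += 1 * pattern[src]
    return total


def deviation(pattern):
    """Faulty result minus correct result for one pass."""
    return reduce_innermost(pattern, True) - reduce_innermost(pattern)


deviations = {
    tuple(p): deviation(p) for p in ([1, 2, 1], [2, 1, 1], [1, 1, 2])
}
print(deviations)
```

Under this model `[1, 2, 1]` undercounts by 1 per affected pass, `[2, 1, 1]` overcounts by 1, and `[1, 1, 2]` is unaffected, which matches the direction of the observed results (`88` below `104`, and `[1, 1, 2]` correct). If the model holds, the gap of 16 between `104` and `88` would correspond to 8 affected passes; that count is an inference from the model, not something measured.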