[Backend][AIE] Support global indexing, tensor access and kernel reindex for AIE kernel mapping #300
Conversation
Mostly looks good to me. We need more comments in your implementation to make later development easier.
allo/backend/aie.py
Outdated
```diff
@@ -221,86 +237,153 @@ def codegen_aie_mlir(mod, orig_input_args, mapping):
     code += format_str("%tile_shim = aie.tile(0, 0)")
     for mid in range(mem_tile_size):
         code += format_str(f"%tile_mem{mid} = aie.tile({mid}, 1)")
-    assert len(mapping) == 1, "Only support 1D mapping for now"
-    pe_size = mapping[0]
+    # assert len(mapping) == 1, "Only support 1D mapping for now"
```
Can it support 2D now?
Not yet. I think I will support it in the next PR.
Since you removed the mapping argument, you should also remove the assertion here
allo/backend/aie.py
Outdated
```diff
         code += format_str(
             f"aie.objectfifo @in_sh{i}(%tile_shim, {{%tile_mem{i}}}, 2 : i32) : !aie.objectfifo<{orig_in_type}>"
         )
+        linkings = [False] * len(input_args)
```
Add a comment for linkings.
Based on your description, a clearer name might be dist_alloc, which reflects that when set to True the memory is distributed among the compute tiles, and when False it is replicated to each tile.
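For illustration, here is a minimal, self-contained sketch of what such a dist_alloc flag could control during code generation. The helper format_str, the tile names, and the FIFO labels are assumptions modeled on the snippet above, not the actual implementation:

```python
# Hypothetical sketch: dist_alloc[i] == True  -> argument i is distributed,
# with one object FIFO per compute tile carrying only that tile's slice;
# dist_alloc[i] == False -> argument i is replicated via one broadcast FIFO.
def format_str(s, indent=8):
    # Stand-in for the indentation helper used in allo/backend/aie.py.
    return " " * indent + s + "\n"

def emit_input_fifos(input_args, dist_alloc, pe_size):
    code = ""
    for i, (full_type, slice_type) in enumerate(input_args):
        if dist_alloc[i]:
            # Distributed: each compute tile consumes its own slice.
            for pid in range(pe_size):
                code += format_str(
                    f"aie.objectfifo @in_p{i}_{pid}(%tile_mem{i}, "
                    f"{{%tile_comp{pid}}}, 2 : i32) : !aie.objectfifo<{slice_type}>"
                )
        else:
            # Replicated: every compute tile consumes the full tensor.
            tiles = ", ".join(f"%tile_comp{pid}" for pid in range(pe_size))
            code += format_str(
                f"aie.objectfifo @in_b{i}(%tile_mem{i}, {{{tiles}}}, "
                f"2 : i32) : !aie.objectfifo<{full_type}>"
            )
    return code

# Example: A is split across 4 compute tiles, B is broadcast to all of them.
print(emit_input_fifos(
    [("memref<32x32xi32>", "memref<8x32xi32>"),
     ("memref<32x32xi32>", "memref<32x32xi32>")],
    dist_alloc=[True, False],
    pe_size=4,
))
```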
allo/backend/aie.py
Outdated
```diff
+        code += format_str("%c1000 = arith.constant 0 : index")
+        code += format_str("%c1001 = arith.constant 1 : index")
```
Why use 1000 and 1001?
Because %c0 and %c1 would conflict with constants of the same names used in the actual computation, so I chose two large numbers.
Give a better name then. Probably %global_c0 and %global_c1.
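For illustration, a tiny sketch of the suggested rename; format_str here is a stand-in for the helper in allo/backend/aie.py:

```python
def format_str(s, indent=8):
    return " " * indent + s + "\n"

code = ""
# Reserved names cannot collide with %c0/%c1 constants emitted for the
# kernel's own computation.
code += format_str("%global_c0 = arith.constant 0 : index")
code += format_str("%global_c1 = arith.constant 1 : index")
print(code)
```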
Just some naming issues
LGTM. Thx!
Description
The original AIE backend with the dataflow interface only supported simple element-wise applications, such as vector add. It used local indexing, which is insufficient for more complex tensor access patterns such as matrix multiplication.
Problems
In the original AIE interface, the compiler lacks sufficient information to determine which dimension (M, K, or N) should be used for tiling in a GEMM implementation; see the sketch below.
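A hypothetical local-indexing kernel illustrating the ambiguity (the df decorator and allo.grid usage are assumptions modeled on Allo's dataflow interface, not an excerpt from the PR):

```python
import allo
import allo.dataflow as df
from allo.ir.types import int32

Mt, Nt, K = 8, 8, 32  # per-tile sizes only; the global M and N are invisible

@df.kernel(mapping=[4])
def gemm(A: int32[Mt, K], B: int32[K, Nt], C: int32[Mt, Nt]):
    # Every instance computes on local tiles, so nothing here tells the
    # compiler whether the 4 instances split M, N, or K of the global problem.
    for i, j, k in allo.grid(Mt, Nt, K):
        C[i, j] += A[i, k] * B[k, j]
```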
Proposed Solutions
In the new interface, we adopt global indexing for the AIE backend, similar to the HLS backend. Additionally, we leverage tensor slicing access, inspired by Triton, so the compiler can determine the precise access range of each tensor, which facilitates backend reindexing. Furthermore, we use primitives like allo.matmul to encapsulate fundamental operations, streamlining MLIR construction and lowering. A sketch of the resulting style is shown below.
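A hypothetical sketch of the resulting style under those three changes (the df helpers and the slicing syntax are assumptions based on this description, not an exact excerpt from the PR's tests):

```python
import allo
import allo.dataflow as df
from allo.ir.types import int32

M, N, K = 32, 32, 32
P = 4          # 1D mapping: 4 compute tiles
Mt = M // P    # rows of C owned by each tile

@df.kernel(mapping=[P])
def gemm(A: int32[M, K], B: int32[K, N], C: int32[M, N]):
    pi = df.get_pid()
    # Global indexing plus Triton-style slicing makes each instance's access
    # range explicit, so the backend can reindex every kernel to its own tile
    # and see that only the M dimension is partitioned.
    C[pi * Mt : (pi + 1) * Mt, :] = allo.matmul(
        A[pi * Mt : (pi + 1) * Mt, :], B
    )
```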
Checklist