Fixed an issue in the auto optimizer.
The problem was that the auto optimizer called the "set strides correctly" function after the "go to GPU" transformation.
The GPU transformation calls the `CopyToMap` transformation, which is needed so that we can set the GPU block sizes and their order.
However, the decision whether to call `CopyToMap` also depends on the strides, so the strides must be set before the GPU transformation runs.

There is still a problem, but it is a problem with the strides.
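
The ordering issue can be illustrated with a toy model. The sketch below is not the gt4py or DaCe code: `ToyArray`, `set_strides`, `needs_copy_to_map`, and the "layouts differ" criterion are hypothetical stand-ins, chosen only to show that a decision that reads the strides can come out differently depending on whether the strides were finalized before or after it.

```python
from dataclasses import dataclass


@dataclass
class ToyArray:
    """Hypothetical stand-in for an array; not a DaCe data descriptor."""

    shape: tuple
    strides: tuple
    transient: bool


def set_strides(arr: ToyArray, order: tuple) -> None:
    """Assign packed strides so that `order[0]` is the fastest-varying dimension."""
    strides = [0] * len(arr.shape)
    step = 1
    for dim in order:
        strides[dim] = step
        step *= arr.shape[dim]
    arr.strides = tuple(strides)


def needs_copy_to_map(src: ToyArray, dst: ToyArray) -> bool:
    """Assumed criterion: a plain copy is not enough if the layouts differ."""
    return src.strides != dst.strides


def run(strides_first: bool) -> bool:
    """Return the copy decision for the given phase order."""
    # A global array and a transient, both starting with C-order strides.
    glob = ToyArray(shape=(4, 8), strides=(8, 1), transient=False)
    tmp = ToyArray(shape=(4, 8), strides=(8, 1), transient=True)

    def change_transient_strides() -> None:
        # Only transients are touched; dimension 0 becomes the unit-stride one.
        for arr in (glob, tmp):
            if arr.transient:
                set_strides(arr, order=(0, 1))

    if strides_first:
        change_transient_strides()
        decision = needs_copy_to_map(glob, tmp)
    else:
        # Old order: the decision is taken on strides that change afterwards.
        decision = needs_copy_to_map(glob, tmp)
        change_transient_strides()
    return decision
```

In this toy setup the old order decides on stale strides and concludes that no map is needed, while the new order sees the final layout of the transient and reaches the opposite conclusion.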
philip-paul-mueller committed Feb 25, 2025
1 parent 1984691 commit 2d5ad5d
Showing 1 changed file with 11 additions and 7 deletions.
@@ -66,10 +66,11 @@ def gt_auto_optimize(
one with stride one.
5. If requested the function will now apply loop blocking, on the dimension
indicated by `leading_dim`.
-6. If requested the SDFG will be transformed to GPU. For this the
+6. The strides of temporaries are set to match the compute order.
+7. If requested the SDFG will be transformed to GPU. For this the
`gt_gpu_transformation()` function is used, that might apply several other
optimizations.
-7. Afterwards some general transformations to the SDFG are applied.
+8. Afterwards some general transformations to the SDFG are applied.
This includes:
- Use fast implementation for library nodes.
- Move small transients to stack.
@@ -235,7 +236,13 @@ def gt_auto_optimize(
validate_all=validate_all,
)

-# Phase 6: Going to GPU
+# Phase 6: Setting the strides of transients
+# It is important that we set the strides before the GPU transformation,
+# because this transformation will also apply `CopyToMap` to the Memlets
+# that the DaCe runtime cannot handle.
+gtx_transformations.gt_change_transient_strides(sdfg, gpu=gpu)
+
+# Phase 7: Going to GPU
if gpu:
# TODO(phimuell): The GPU function might modify the map iteration order.
# This is because how it is implemented (promotion and
@@ -251,7 +258,7 @@ def gt_auto_optimize(
try_removing_trivial_maps=True,
)

-# Phase 7: General Optimizations
+# Phase 8: General Optimizations
# The following operations apply regardless if we have a GPU or CPU.
# The DaCe auto optimizer also uses them. Note that the reuse transient
# is not done by DaCe.
@@ -267,9 +274,6 @@ def gt_auto_optimize(
# TODO(phimuell): Fix the bug, it uses the tile value and not the stack array value.
dace_aoptimize.move_small_arrays_to_stack(sdfg)

-# Now we modify the strides.
-gtx_transformations.gt_change_transient_strides(sdfg, gpu=gpu)
-
if make_persistent:
gtx_transformations.gt_make_transients_persistent(sdfg=sdfg, device=device)
