Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Stream] Enable batch affinity queries in SpecializeEncoding pass. #19975

Conversation

hanhanW
Copy link
Contributor

@hanhanW hanhanW commented Feb 12, 2025

The returned function (i.e., ResolveLayoutAttrFn) can be very inefficient because there could be other data-flow analysis in a run. The revision updates the ResolveLayoutAttrFn API. Now it accepts a list of query, and it stores the results to the map of SetVector<Attribute>.

In the encoding specialization pass, it introduces StreamTensorOpUpdater class. There are two phases in the updater. The class caches all the queries in init(), and updates all the encodings in run(). The init method is introduced because there could be a failure in the initialization. In this context, we do not put them to the constructor because we can not signal the error in constructors. See https://google.github.io/styleguide/cppguide.html#Doing_Work_in_Constructors

The pass gets 440x speed-up for one of SDXL compilation.

The lit test configuration change (i.e., --pass-pipeline='builtin.module(iree-stream-specialize-encodings)') is needed because we want to validate failures for unsupported encodings.

The returned function (i.e., ResolveLayoutAttrFn) can be very
inefficient because there could be other data-flow analysis in a run.
The revision updates the ResolveLayoutAttrFn API. Now it accepts a list
of query, and it stores the results to the map of SetVector<Attribute>.

In the encoding specialization pass, it introduces
`StreamTensorOpUpdater` class. There are two phases in the updater.  The
class caches all the queries in `init()`, and updates all the encodings
in `run()`. The `init` method is introduced because there could be a
failure in the initialization. In this context, we do not put them to
the constructor because we can not signal the error in constructors.
See https://google.github.io/styleguide/cppguide.html#Doing_Work_in_Constructors

The pass gets 440x speedup for one of SDXL compilation.

Signed-off-by: hanhanW <hanhan0912@gmail.com>
@hanhanW hanhanW requested a review from benvanik as a code owner February 12, 2025 17:26
Copy link
Collaborator

@benvanik benvanik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor style nits and otherwise lgtm!

using ResolveLayoutAttrFn = std::function<LogicalResult(
AffinityAttr, Operation *, SetVector<Attribute> &)>;
ArrayRef<AffinityAndOpPair>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: you can add argument names here to help readability and intellisense (e.g. std::function<void(int a, int b, int c)>)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks!

// encodings. See StreamInterfaces.h for more details.
IREE::Stream::ResolveLayoutAttrFn resolveLayoutAttr;
};
} // namespace
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: balanced whitespace
(but you may already be in an anonymous namespace from line 36 and not need this at all - it's hard to spot because there's no whitespace after it and it blends in with the following function :)

Suggested change
} // namespace
} // namespace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with anonymous namespaces is that they naturally want to encourage indentation of their body, and they reduce locality of reference: if you see a random function definition in a C++ file, it is easy to see if it is marked static, but seeing if it is in an anonymous namespace requires scanning a big chunk of the file.

Because of this, we have a simple guideline: make anonymous namespaces as small as possible, and only use them for class declarations.

I agree that it's hard to spot, so I really like the LLVM style guide idea. Let me update the namespace to only scope these classes. The local methods already have static keywords, and they do not need to be within an anonymous namespace.

https://llvm.org/docs/CodingStandards.html#anonymous-namespaces

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to mention that I fixed the namespace scope in the commit.

};
} // namespace

StreamTensorOpUpdater::StreamTensorOpUpdater(ModuleOp moduleOp)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: it's useful to keep functions inline unless it helps readability when the class is defined in the same file - it's fewer lines of code to have this up there and also harder to lose things (you declare StreamTensorOpUpdater(); as well but never define it, which is harder to see because it's out-of-line)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: hanhanW <hanhan0912@gmail.com>
@hanhanW hanhanW enabled auto-merge (squash) February 12, 2025 18:06
@hanhanW hanhanW merged commit d11b876 into iree-org:main Feb 12, 2025
38 of 39 checks passed
@hanhanW hanhanW deleted the improve-specialize-encoding-pass-compilation-time-2 branch February 12, 2025 18:30
hanhanW added a commit to hanhanW/iree that referenced this pull request Feb 13, 2025
…ree-org#19975)

The returned function (i.e., `ResolveLayoutAttrFn`) can be very
inefficient because there could be other data-flow analysis in a run.
The revision updates the `ResolveLayoutAttrFn` API. Now it accepts a
list of query, and it stores the results to the map of
`SetVector<Attribute>`.

In the encoding specialization pass, it introduces
`StreamTensorOpUpdater` class. There are two phases in the updater. The
class caches all the queries in `init()`, and updates all the encodings
in `run()`. The `init` method is introduced because there could be a
failure in the initialization. In this context, we do not put them to
the constructor because we can not signal the error in constructors. See
https://google.github.io/styleguide/cppguide.html#Doing_Work_in_Constructors

The pass gets 440x speed-up for one of SDXL compilation.

The lit test configuration change (i.e.,
`--pass-pipeline='builtin.module(iree-stream-specialize-encodings)'`) is
needed because we want to validate failures for unsupported encodings.

---------

Signed-off-by: hanhanW <hanhan0912@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants