-
Notifications
You must be signed in to change notification settings - Fork 658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Do not rewrite or approximate math functions on ROCm #19970
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
4ae6121
to
35e4441
Compare
If CI passes that's a good indication for e2e correctness for now I think |
Interesting. It has compilation failures |
// CHECK: math.exp2 | ||
// CHECK: math.expm1 | ||
// CHECK: math.cbrt | ||
// CHECK: math.erf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have we figured out the numerical issue with math.erf
library function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have (99%) figured that there was no issue with it, and the issues we ran into were caused by PolynomialApproximationPass being too coarse-grained and too convoluted, so that when we thought earlier that we were enabling/disabling math.erf approximation, we were also enabling/disabling a number of other things, unintentionally. This is what #19922 solved. Now in the present PR we are finally at a place where we have some fine-grained, well-defined levers to play with.
On ROCm, we want to use the device library for all math functions.
This expands on #19969, which only concerned
math.erf
.We only leave one category of rewrites enabled: the operand casts to f32. The ROCm device library internally performs the same for many math functions, but we leave that unchanged here for safety and because we seemed to have accuracy issues with an earlier attempt that was dropping that.
This PR needs careful accuracy testing on end-to-end workloads before merging.