How to run 02-fused-softmax.py on a CPU? #199
Comments
Hi Andrzej, Thank you!
Hi Renat, thanks for getting back to me so quickly and for the link. Yes, removing the "perf" part fixes the issue :) Let me close this and just follow what you did in #163. It would be nice to have it merged in :) -Andrzej
…#209) Since we cannot use the standard Triton benchmarks, as brought up in #199, because they are specific to GPU. Sample output:

```sh
$ python test_softmax.py
bench_softmax(1024, 'torch') {}, 20 times, all results in seconds
Wall: Avg=0.006537, min=0.005301, std=0.000326, max=0.006723
CPU: Avg=0.123649, min=0.010989, std=0.026653, max=0.140211
bench_softmax(1024, 'triton') {}, 20 times, all results in seconds
Wall: Avg=0.102619, min=0.014122, std=0.384826, max=1.780037
CPU: Avg=0.028643, min=0.014123, std=0.062372, max=0.300513
bench_softmax(2048, 'torch') {}, 20 times, all results in seconds
Wall: Avg=0.015215, min=0.013364, std=0.002282, max=0.022841
CPU: Avg=0.172217, min=0.043525, std=0.037402, max=0.231176
bench_softmax(2048, 'triton') {}, 20 times, all results in seconds
Wall: Avg=0.071460, min=0.055257, std=0.068684, max=0.370846
CPU: Avg=0.062689, min=0.055258, std=0.030449, max=0.195406
bench_softmax(4096, 'torch') {}, 20 times, all results in seconds
Wall: Avg=0.056267, min=0.056117, std=0.000134, max=0.056681
CPU: Avg=0.313888, min=0.220500, std=0.023960, max=0.338866
bench_softmax(4096, 'triton') {}, 20 times, all results in seconds
Wall: Avg=0.258867, min=0.244147, std=0.062352, max=0.530646
CPU: Avg=0.249397, min=0.244141, std=0.021087, max=0.341300
```

Co-authored-by: Renat Idrisov <parsifal-47@users.noreply.github.com>
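Wall-clock/CPU-time measurements of this shape can be collected with nothing but the standard library. Below is a minimal sketch of such a timing helper; the function name `bench` and the output format are illustrative, not the actual `test_softmax.py` code, and the lambda stands in for the real torch/triton softmax workload:

```python
import statistics
import time


def bench(fn, *args, repeat=20, **kwargs):
    """Run fn(*args, **kwargs) `repeat` times; report wall and CPU time in seconds."""
    wall, cpu = [], []
    for _ in range(repeat):
        w0, c0 = time.perf_counter(), time.process_time()
        fn(*args, **kwargs)
        wall.append(time.perf_counter() - w0)
        cpu.append(time.process_time() - c0)
    for name, times in (("Wall", wall), ("CPU", cpu)):
        print(f"{name}: Avg={statistics.mean(times):.6f}, "
              f"min={min(times):.6f}, std={statistics.stdev(times):.6f}, "
              f"max={max(times):.6f}")
    return wall, cpu


if __name__ == "__main__":
    # Illustrative workload only; the real benchmark would time the torch or
    # triton softmax on a 1024/2048/4096-row input instead.
    bench(lambda: sum(i * i for i in range(10_000)))
```

`time.perf_counter` captures elapsed wall time while `time.process_time` captures CPU time consumed by the process, which is why the two rows in the sample output above can diverge (e.g. when torch uses multiple threads, CPU time exceeds wall time).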
Hi folks,
First off, thank you for triton-shared—it's fantastic work!
I’ve successfully built the project and run the examples on my AArch64 machine. However, I’m running into an issue with 02-fused-softmax.py:
Clearly, something is trying to target CUDA, but I’m not sure what. I modified the example to explicitly select the CPU backend:
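(The original post's code snippet is not preserved here. For reference, a sketch of how the CPU backend is typically selected, assuming the `CPUDriver` import path shown in the triton-shared README; the exact module path may differ between versions:)

```python
import triton
# Assumed import path, per the triton-shared README; may vary by version.
from triton.backends.triton_shared.driver import CPUDriver

# Route subsequent kernel launches through the triton-shared CPU driver
# instead of the default CUDA backend.
triton.runtime.driver.set_active(CPUDriver())
```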
Unfortunately, I’m still getting the same error, so I suspect my changes might not be sufficient. I’m very new to this and probably missing something obvious - please bear with me 😅
Any guidance would be greatly appreciated. Let me know if you need additional logs or details from my setup!
Thanks,
Andrzej