Allow opt-out of implicit bounds-checking #563

Open
vchuravy wants to merge 2 commits into vc/pocl from 02-07-allow_opt-out_of_implicit_bounds-checking

vchuravy (Member) commented Feb 7, 2025

KernelAbstractions currently creates kernels that look like:

```
if __validindex(ctx)
   # Body
end
```

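This is problematic due to the convergence requirement on `@synchronize`: every work-item in a workgroup has to reach the barrier, but work-items that fail the implicit `__validindex` check skip the body, including any `@synchronize` inside it.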
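
To make the convergence concern concrete, here is a minimal sketch in plain Julia that models the guard rather than calling KernelAbstractions. `barrier_arrivals` is a hypothetical helper introduced only for illustration, and the comparison `global_id <= ndrange` is an assumed stand-in for `__validindex(ctx)` in the one-dimensional case. It counts how many work-items of the last workgroup would enter the guarded body and therefore reach a barrier placed inside it.

```julia
# Hypothetical model, not KernelAbstractions code: count how many work-items
# of the last workgroup would reach a barrier placed inside the guarded body.
function barrier_arrivals(ndrange::Int, groupsize::Int)
    ngroups = cld(ndrange, groupsize)             # workgroups needed to cover ndrange
    first_index = (ngroups - 1) * groupsize + 1   # first global index of the last group
    arrivals = 0
    for local_id in 0:(groupsize - 1)
        global_id = first_index + local_id
        valid = global_id <= ndrange              # assumed stand-in for __validindex(ctx)
        if valid
            arrivals += 1                         # enters the body and hits the barrier
        end
    end
    return arrivals
end

full    = barrier_arrivals(1024, 256)   # ndrange is a multiple of the group size
partial = barrier_arrivals(1000, 256)   # last group is only partially valid
println("full group: ", full, " of 256; partial group: ", partial, " of 256")
```

When `ndrange` is not a multiple of the workgroup size, only part of the last group arrives at the barrier in this model, which is the non-convergent situation the description refers to.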

vchuravy (Member, Author) commented Feb 7, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

@vchuravy vchuravy marked this pull request as ready for review February 7, 2025 11:31
codecov bot commented Feb 7, 2025

Codecov Report

Attention: Patch coverage is 0% with 10 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (f038d8c) to head (4dd0acc).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/macros.jl | 0.00% | 10 Missing ⚠️ |

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           vc/pocl    #563   +/-   ##
=======================================
  Coverage     0.00%   0.00%           
=======================================
  Files           21      21           
  Lines         1509    1513    +4     
=======================================
- Misses        1509    1513    +4     
```


github-actions bot (Contributor) commented Feb 7, 2025

Benchmark Results

| Benchmark | main | 4dd0acc... | main/4dd0accda3764a... |
|---|---|---|---|
| saxpy/default/Float16/1024 | 0.731 ± 0.006 μs | 0.0553 ± 0.026 ms | 0.0132 |
| saxpy/default/Float16/1048576 | 0.174 ± 0.0017 ms | 0.887 ± 0.023 ms | 0.196 |
| saxpy/default/Float16/16384 | 3.34 ± 0.048 μs | 0.0615 ± 0.027 ms | 0.0544 |
| saxpy/default/Float16/2048 | 0.903 ± 0.0094 μs | 0.0514 ± 0.023 ms | 0.0176 |
| saxpy/default/Float16/256 | 0.587 ± 0.0044 μs | 0.0579 ± 0.027 ms | 0.0101 |
| saxpy/default/Float16/262144 | 0.0442 ± 0.00058 ms | 0.27 ± 0.026 ms | 0.164 |
| saxpy/default/Float16/32768 | 6.02 ± 0.092 μs | 0.0741 ± 0.027 ms | 0.0812 |
| saxpy/default/Float16/4096 | 1.31 ± 0.023 μs | 0.0576 ± 0.025 ms | 0.0227 |
| saxpy/default/Float16/512 | 0.647 ± 0.0052 μs | 0.0559 ± 0.026 ms | 0.0116 |
| saxpy/default/Float16/64 | 0.556 ± 0.0041 μs | 0.0621 ± 0.026 ms | 0.00896 |
| saxpy/default/Float16/65536 | 11.7 ± 0.14 μs | 0.103 ± 0.027 ms | 0.114 |
| saxpy/default/Float32/1024 | 0.628 ± 0.008 μs | 0.0565 ± 0.026 ms | 0.0111 |
| saxpy/default/Float32/1048576 | 0.23 ± 0.0099 ms | 0.469 ± 0.038 ms | 0.491 |
| saxpy/default/Float32/16384 | 2.96 ± 0.46 μs | 0.0543 ± 0.024 ms | 0.0545 |
| saxpy/default/Float32/2048 | 0.752 ± 0.028 μs | 0.049 ± 0.023 ms | 0.0153 |
| saxpy/default/Float32/256 | 0.567 ± 0.0048 μs | 0.0561 ± 0.027 ms | 0.0101 |
| saxpy/default/Float32/262144 | 0.0577 ± 0.0025 ms | 0.158 ± 0.035 ms | 0.364 |
| saxpy/default/Float32/32768 | 5.64 ± 0.72 μs | 0.0598 ± 0.026 ms | 0.0944 |
| saxpy/default/Float32/4096 | 1.14 ± 0.12 μs | 0.0529 ± 0.025 ms | 0.0215 |
| saxpy/default/Float32/512 | 0.6 ± 0.0063 μs | 0.0573 ± 0.027 ms | 0.0105 |
| saxpy/default/Float32/64 | 0.554 ± 0.0049 μs | 0.0566 ± 0.027 ms | 0.00979 |
| saxpy/default/Float32/65536 | 12.7 ± 0.93 μs | 0.0744 ± 0.029 ms | 0.17 |
| saxpy/default/Float64/1024 | 0.747 ± 0.018 μs | 0.0559 ± 0.026 ms | 0.0134 |
| saxpy/default/Float64/1048576 | 0.484 ± 0.017 ms | 0.5 ± 0.047 ms | 0.969 |
| saxpy/default/Float64/16384 | 5.41 ± 0.51 μs | 0.0558 ± 0.025 ms | 0.0971 |
| saxpy/default/Float64/2048 | 1.16 ± 0.14 μs | 0.0497 ± 0.024 ms | 0.0234 |
| saxpy/default/Float64/256 | 0.586 ± 0.0058 μs | 0.0592 ± 0.026 ms | 0.00989 |
| saxpy/default/Float64/262144 | 0.115 ± 0.0065 ms | 0.171 ± 0.03 ms | 0.669 |
| saxpy/default/Float64/32768 | 12.5 ± 0.67 μs | 0.0633 ± 0.025 ms | 0.198 |
| saxpy/default/Float64/4096 | 1.78 ± 0.26 μs | 0.0546 ± 0.025 ms | 0.0326 |
| saxpy/default/Float64/512 | 0.635 ± 0.0079 μs | 0.0523 ± 0.027 ms | 0.0121 |
| saxpy/default/Float64/64 | 0.564 ± 0.0043 μs | 0.0596 ± 0.027 ms | 0.00947 |
| saxpy/default/Float64/65536 | 28.6 ± 1.4 μs | 0.0835 ± 0.026 ms | 0.342 |
| saxpy/static workgroup=(1024,)/Float16/1024 | 2.18 ± 0.028 μs | 0.0479 ± 0.026 ms | 0.0456 |
| saxpy/static workgroup=(1024,)/Float16/1048576 | 0.159 ± 0.0031 ms | 0.898 ± 0.028 ms | 0.177 |
| saxpy/static workgroup=(1024,)/Float16/16384 | 4.43 ± 0.081 μs | 0.0584 ± 0.025 ms | 0.0758 |
| saxpy/static workgroup=(1024,)/Float16/2048 | 2.36 ± 0.027 μs | 0.0567 ± 0.023 ms | 0.0416 |
| saxpy/static workgroup=(1024,)/Float16/256 | 2.81 ± 0.037 μs | 0.047 ± 0.026 ms | 0.0599 |
| saxpy/static workgroup=(1024,)/Float16/262144 | 0.0423 ± 0.00084 ms | 0.268 ± 0.028 ms | 0.158 |
| saxpy/static workgroup=(1024,)/Float16/32768 | 6.87 ± 0.19 μs | 0.0719 ± 0.025 ms | 0.0956 |
| saxpy/static workgroup=(1024,)/Float16/4096 | 2.68 ± 0.04 μs | 0.0499 ± 0.026 ms | 0.0538 |
| saxpy/static workgroup=(1024,)/Float16/512 | 3.25 ± 0.033 μs | 0.0489 ± 0.026 ms | 0.0666 |
| saxpy/static workgroup=(1024,)/Float16/64 | 2.51 ± 0.21 μs | 0.05 ± 0.027 ms | 0.0502 |
| saxpy/static workgroup=(1024,)/Float16/65536 | 12.7 ± 0.33 μs | 0.0994 ± 0.025 ms | 0.127 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 2.23 ± 0.028 μs | 0.0534 ± 0.026 ms | 0.0418 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.234 ± 0.02 ms | 0.457 ± 0.027 ms | 0.511 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 4.39 ± 0.26 μs | 0.0518 ± 0.024 ms | 0.0848 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 2.4 ± 0.06 μs | 0.0467 ± 0.023 ms | 0.0513 |
| saxpy/static workgroup=(1024,)/Float32/256 | 2.69 ± 0.043 μs | 0.0559 ± 0.026 ms | 0.0481 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.0606 ± 0.0031 ms | 0.153 ± 0.034 ms | 0.395 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 7.59 ± 0.4 μs | 0.0573 ± 0.025 ms | 0.133 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 2.68 ± 0.082 μs | 0.0486 ± 0.026 ms | 0.055 |
| saxpy/static workgroup=(1024,)/Float32/512 | 2.71 ± 0.028 μs | 0.0564 ± 0.026 ms | 0.048 |
| saxpy/static workgroup=(1024,)/Float32/64 | 2.72 ± 5.2 μs | 0.0557 ± 0.026 ms | 0.0487 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 15.5 ± 0.85 μs | 0.0709 ± 0.028 ms | 0.219 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 2.32 ± 0.07 μs | 0.0529 ± 0.026 ms | 0.0439 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.501 ± 0.021 ms | 0.496 ± 0.05 ms | 1.01 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 7.31 ± 0.32 μs | 0.0527 ± 0.025 ms | 0.139 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 2.61 ± 0.076 μs | 0.0437 ± 0.022 ms | 0.0596 |
| saxpy/static workgroup=(1024,)/Float64/256 | 2.64 ± 0.055 μs | 0.0557 ± 0.026 ms | 0.0473 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.117 ± 0.0054 ms | 0.166 ± 0.031 ms | 0.706 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 15.2 ± 0.86 μs | 0.0607 ± 0.025 ms | 0.251 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 3.16 ± 0.19 μs | 0.0504 ± 0.026 ms | 0.0628 |
| saxpy/static workgroup=(1024,)/Float64/512 | 2.64 ± 0.06 μs | 0.0418 ± 0.027 ms | 0.0632 |
| saxpy/static workgroup=(1024,)/Float64/64 | 2.6 ± 0.066 μs | 0.0434 ± 0.027 ms | 0.0598 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 31.3 ± 1.1 μs | 0.08 ± 0.027 ms | 0.391 |
| time_to_load | 0.315 ± 0.00096 s | 1.1 ± 0.0043 s | 0.286 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@vchuravy vchuravy force-pushed the 02-07-allow_opt-out_of_implicit_bounds-checking branch from e565304 to 4dd0acc Compare February 10, 2025 15:08