Remove LoopVectorization dependency, in light of deprecation #48

Merged (8 commits) on Jun 13, 2024

brenhinkeller (Owner) commented Jun 13, 2024

Using @simd ivdep on the for loops seems to allow for reasonable vectorization here.
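For readers unfamiliar with the pattern, here is a minimal sketch of a NaN-skipping mean written so that @simd ivdep can vectorize it. It is a hypothetical illustration (the name nanmean_sketch is made up), not the code merged in this PR. The loop body is kept branch-free with ifelse, and ivdep is safe because the loop only reads from A, so there are no loop-carried memory dependencies. Note that @simd also lets the compiler reorder the floating-point sum, so results can differ from a strictly sequential sum in the last bits.

```julia
# Hypothetical sketch (not the PR's code): a NaN-skipping mean whose loop body
# is branch-free, so that `@simd ivdep` can vectorize it.
function nanmean_sketch(A::AbstractArray{T}) where {T<:AbstractFloat}
    n = 0           # number of non-NaN elements
    s = zero(T)     # running sum of non-NaN elements
    @inbounds @simd ivdep for i in eachindex(A)
        x = A[i]
        notnan = x == x                  # false only when x is NaN
        n += notnan                      # Bool adds as 0 or 1
        s += ifelse(notnan, x, zero(T))  # NaNs contribute 0 to the sum
    end
    return s / n    # NaN if every element was NaN (0/0)
end

a = rand(1000); a[rand(1:1000, 100)] .= NaN
nanmean_sketch(a)
```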

brenhinkeller (Owner, Author) commented Jun 13, 2024

Some timing comparisons (Julia 1.10.4 on an Apple M1 Max).
1D, with LoopVectorization (LV):

julia> @benchmark nanmean(a) setup=(a=rand(1000); a[rand(1:1000, 100)].=NaN)
BenchmarkTools.Trial: 10000 samples with 417 evaluations.
 Range (min … max):  239.307 ns … 478.916 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     239.710 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   243.649 ns ±  10.212 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄▁▂▂▂▁▂▃▂   ▁▁▁▁▁▁                                           ▁
  ███████████████████▇█▇▆▆▅▅▅▅▅▆▅▅▅▆▅▆▅▅▆▆▆▆▅▆▆▆▅▆▅▅▃▅▄▅▅▃▃▃▅▃▄ █
  239 ns        Histogram: log(frequency) by time        291 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark nanmean(a) setup=(a=rand(100); a[rand(1:100, 10)].=NaN)
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min … max):  25.309 ns … 72.540 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     25.435 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.638 ns ±  1.307 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆█▇▅▅▂▁▁▂▁▁ ▁   ▂                                           ▂
  █████████████▇▇▇███▇▇█▆▅▅▇▆▇▆▇▆▇▇█▇▇▇█▇▇▇▇▆▆▅▅▅▆▄▁▄▄▃▃▆▅▅▄▅ █
  25.3 ns      Histogram: log(frequency) by time      28.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark nanmean(a) setup=(a=rand(10); a[rand(1:10, 1)].=NaN)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  5.166 ns … 25.000 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.291 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.310 ns ±  0.512 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                     ▅        ▁█
  ▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▃ ▂
  5.17 ns        Histogram: frequency by time        5.42 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

1D, after this PR:

julia> @benchmark nanmean(a) setup=(a=rand(1000); a[rand(1:1000, 100)].=NaN)
BenchmarkTools.Trial: 10000 samples with 438 evaluations.
 Range (min … max):  238.584 ns … 477.169 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     238.966 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   243.414 ns ±  11.017 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▃▂▂▁▂▁▃▃▁ ▁▁▁ ▁                                              ▁
  ████████████████████▇▇▆▆▅▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▇▆▆▅▅▆▅▆▅▅▅▄▅▅▄▄▄▆▃▅▅ █
  239 ns        Histogram: log(frequency) by time        293 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark nanmean(a) setup=(a=rand(100); a[rand(1:100, 10)].=NaN)
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  24.991 ns … 67.537 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     25.117 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.500 ns ±  1.768 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄▁▁▂▃▁  ▁▁▂                                                ▁
  █████████████▇▇▃▆▆▁▁▄▄▄▄▄▅▃▄▅▅▅▄▅▅▃▅▅▄▄▄▄▅▁▃▃▄▅▄▄▅▄▄▄▅▅▄▅▄▆ █
  25 ns        Histogram: log(frequency) by time      35.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark nanmean(a) setup=(a=rand(10); a[rand(1:10, 1)].=NaN)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  4.250 ns … 15.042 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.334 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.365 ns ±  0.210 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

          ▃       █       ▇       ▂▃       ▃       ▂         ▁
  ▅▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁██▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▇ █
  4.25 ns      Histogram: log(frequency) by time     4.54 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

brenhinkeller (Owner, Author)

nanmean, multidimensional, with LV:

julia> @benchmark nanmean(a, dims=1) setup=a=rand(1000,1000)
BenchmarkTools.Trial: 3450 samples with 1 evaluation.
 Range (min … max):  245.000 μs … 308.083 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     251.416 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   252.741 μs ±   5.494 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

        ▂▃▄▇██▆▅▃▂▂
  ▁▂▂▃▆▇████████████▆▆▅▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  245 μs           Histogram: frequency by time          276 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=2) setup=a=rand(1000,1000)
BenchmarkTools.Trial: 3121 samples with 1 evaluation.
 Range (min … max):  384.125 μs … 564.750 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     388.458 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   391.569 μs ±  12.831 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▆█▇▆▆▅▅▄▃▂▂▁▁ ▁▁                                             ▁
  ███████████████████▆▇█▇▆▇▅▆▆▆▆▃▄▃▅▃▃▅▃▅▅▁▆▃▃▃▃▃▁▃▁▄▁▄▁▃▁▃▄▃▄▅ █
  384 μs        Histogram: log(frequency) by time        450 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=2) setup=a=rand(100,100)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.500 μs …   6.690 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.662 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.721 μs ± 230.471 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▂▅▇█▆▅▆▆▄▄▃▃▂▁▁                                          ▂
  ▄▆▇██████████████████▇█████▇▆▆▆▆▅▅▃▄▅▅▅▆▄▅▄▅▄▄▅▄▄▅▅▅▁▅▃▄▁▃▅ █
  2.5 μs       Histogram: log(frequency) by time      3.88 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=1) setup=a=rand(100,100)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.463 μs …   5.949 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.486 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.534 μs ± 183.970 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ██▅▂ ▄▆▅▃▁ ▁                                                ▂
  ████▇████████▇▇▆▅▅▆▅▆▆▆▆▇█▆█▇▆▄▃▄▅▄▄▃▃▃▃▃▃▂▃▂▂▃▃▃▃▃▄▂▃▂▄▂▄▄ █
  2.46 μs      Histogram: log(frequency) by time      3.36 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=1) setup=a=rand(10,10)
BenchmarkTools.Trial: 10000 samples with 981 evaluations.
 Range (min … max):  61.119 ns …  95.335 μs  ┊ GC (min … max):  0.00% … 99.89%
 Time  (median):     76.919 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   88.931 ns ± 954.348 ns  ┊ GC (mean ± σ):  11.57% ±  1.95%

              ▁▂     ▁▄▅▄▄██▇▄▃▃▂▁▁
  ▂▂▁▁▁▁▂▂▂▂▃▇██▄▃▃▃▆███████████████▇▆▆▆▆▅▅▄▄▃▄▃▃▃▃▃▂▃▂▂▂▂▂▂▂▂ ▄
  61.1 ns         Histogram: frequency by time         97.1 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=2) setup=a=rand(10,10)
BenchmarkTools.Trial: 10000 samples with 978 evaluations.
 Range (min … max):  60.796 ns …  91.711 μs  ┊ GC (min … max):  0.00% … 99.89%
 Time  (median):     77.837 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   88.544 ns ± 917.416 ns  ┊ GC (mean ± σ):  11.21% ±  1.95%

                      ▃▅█▆▇▇▆▃▃▂
  ▂▃▂▂▂▂▂▂▂▂▁▂▆▇▅█▃▃▅████████████▇▇▆▅▄▄▄▄▃▃▃▃▃▃▃▃▂▂▃▂▂▂▂▂▂▂▂▂▂ ▄
  60.8 ns         Histogram: frequency by time          102 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

nanmean, multidimensional, after this PR:

julia> @benchmark nanmean(a, dims=1) setup=a=rand(1000,1000)
BenchmarkTools.Trial: 3576 samples with 1 evaluation.
 Range (min … max):  238.000 μs … 341.000 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     238.667 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   241.594 μs ±   6.500 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ██▂▁▁▃▅▃ ▄▂▁▁▁▁▁▁                                             ▁
  █████████████████▇▇█▇▇▆▇▆▆▆▅▄▅▅▅▅▆▆▆▂▆▆▅▆▅▇▅▆▆▄▆▃▅▂▄▄▅▅▅▂▄▃▅▄ █
  238 μs        Histogram: log(frequency) by time        268 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=2) setup=a=rand(1000,1000)
BenchmarkTools.Trial: 2345 samples with 1 evaluation.
 Range (min … max):  953.291 μs …  1.125 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     960.125 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   965.405 μs ± 16.487 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄█▆▇▇▆▇▅▄▄▂▂ ▂▁▂▁▁▁                                          ▁
  ███████████████████████▇▆▇▆▆▇▇▅▆▇▅▆▅▆▆▅▇▆▇▆▅▆▆▆▆▆▄▅▅▄▅▆▅▄▅▃▅ █
  953 μs        Histogram: log(frequency) by time      1.03 ms <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=1) setup=a=rand(100,100)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.431 μs …  4.491 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.463 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.473 μs ± 89.275 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▁ ▅▇ █ ▇▇ ▇ █▇ ▅ ▄▃ ▂ ▂▁ ▁ ▁▁ ▂ ▃▂ ▂ ▂▂ ▂ ▁▁              ▃
  ▇█▁██▁█▁██▁█▁██▁█▁██▁█▁██▁█▁██▁█▁██▁█▁██▁█▁██▁█▁█▆▁▅▁▆▅▁▅▄ █
  2.43 μs      Histogram: log(frequency) by time     2.59 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=2) setup=a=rand(100,100)
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):  6.392 μs …  12.442 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.608 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.693 μs ± 413.635 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▃██▅▃▃▄▃▂▂▁▁   ▁                                         ▂
  ▅▃▆█████████████▇███▆▆▄▅▆▆▅▅▅▅▅▄▄▄▃▁▄▄▆▅▅▆▅▆▅▄▄▁▅▄▆▅▅▄▄▅▄▄▄ █
  6.39 μs      Histogram: log(frequency) by time      8.81 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=1) setup=a=rand(10,10)
BenchmarkTools.Trial: 10000 samples with 985 evaluations.
 Range (min … max):  55.076 ns …  55.824 μs  ┊ GC (min … max): 0.00% … 99.84%
 Time  (median):     61.083 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   67.818 ns ± 557.710 ns  ┊ GC (mean ± σ):  8.60% ±  2.87%

    █▇
  ▂▅██▅▄▄▄▃▃▃▄▃▄▃▃▃▄▅▆▅▅▅▅▅▅▄▄▄▃▃▃▃▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂ ▃
  55.1 ns         Histogram: frequency by time         82.4 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=2) setup=a=rand(10,10)
BenchmarkTools.Trial: 10000 samples with 982 evaluations.
 Range (min … max):  63.305 ns …  52.189 μs  ┊ GC (min … max): 0.00% … 99.82%
 Time  (median):     69.374 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   75.238 ns ± 521.263 ns  ┊ GC (mean ± σ):  7.25% ±  2.75%

      █▅
  ▁▁▂▇██▄▂▂▃▃▂▂▂▂▄▄▆▃▃▃▃▃▄▅▄▃▃▂▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  63.3 ns         Histogram: frequency by time         86.3 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

brenhinkeller (Owner, Author)

nanstd, multidimensional, with LV:

julia> @benchmark nanstd(a, dims=1) setup=a=rand(1000,1000)
BenchmarkTools.Trial: 2835 samples with 1 evaluation.
 Range (min … max):  559.334 μs … 799.750 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     571.166 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   575.298 μs ±  14.169 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▁██▇▇▅▄▂▁
  ▁▁▄▆█████████▇▇▆▆▅▃▃▃▃▂▃▃▂▃▂▂▂▂▂▁▁▁▁▂▁▂▁▁▁▂▁▁▁▂▁▁▁▁▂▁▁▁▁▁▁▁▁▁ ▃
  559 μs           Histogram: frequency by time          632 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=2) setup=a=rand(1000,1000)
BenchmarkTools.Trial: 2539 samples with 1 evaluation.
 Range (min … max):  752.542 μs … 989.000 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     790.958 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   795.325 μs ±  24.537 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▁▂▅▆▆▅██▇▅▆▅▇▄▅▃ ▂▂▁
  ▁▂▂▄▄▆▇████████████████▇████▇▅▅▅▅▆▄▃▄▃▂▂▃▂▃▃▁▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁ ▄
  753 μs           Histogram: frequency by time          882 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=1) setup=a=rand(100,100)
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  5.653 μs … 206.597 μs  ┊ GC (min … max): 0.00% … 95.98%
 Time  (median):     5.688 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.742 μs ±   2.019 μs  ┊ GC (mean ± σ):  0.35% ±  0.96%

  ▁▇█▇▅▂ ▁▁▁▁   ▁▂▂▁▁                                         ▂
  ██████▇██████▇█████▅▇▇▇▇▅▅▄▃▄▁▅▃▄▁▃▄▆█▆▄▅▅▃▃▄▁▄▄▄▄▄▄▁▃▃▄▄▅▅ █
  5.65 μs      Histogram: log(frequency) by time      6.41 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=2) setup=a=rand(100,100)
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  5.667 μs … 211.153 μs  ┊ GC (min … max): 0.00% … 95.97%
 Time  (median):     5.986 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.035 μs ±   2.059 μs  ┊ GC (mean ± σ):  0.34% ±  0.96%

                    ▄▆█▇▄
  ▂▂▂▂▁▁▁▁▁▁▂▂▂▂▃▃▄▅█████▅▃▃▃▃▂▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  5.67 μs         Histogram: frequency by time        6.58 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=1) setup=a=rand(10,10)
BenchmarkTools.Trial: 10000 samples with 946 evaluations.
 Range (min … max):   98.396 ns …  52.387 μs  ┊ GC (min … max): 0.00% … 99.73%
 Time  (median):     104.387 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   110.113 ns ± 522.884 ns  ┊ GC (mean ± σ):  4.93% ±  2.30%

    █▇ ▁
  ▁▃████▂▂▂▂▃▂▂▅▃▇▅▅▃▄▄▄▅▄▄▃▃▃▃▃▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  98.4 ns          Histogram: frequency by time          123 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=2) setup=a=rand(10,10)
BenchmarkTools.Trial: 10000 samples with 952 evaluations.
 Range (min … max):   95.194 ns …  59.095 μs  ┊ GC (min … max): 0.00% … 99.78%
 Time  (median):     101.847 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   108.089 ns ± 590.040 ns  ┊ GC (mean ± σ):  5.65% ±  2.39%

    ▃█
  ▁▂██▂▂▂▂▂▁▂▂▁▁▁▂▂▂▂▂▂▃▃▄▃▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  95.2 ns          Histogram: frequency by time          118 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

nanstd, multidimensional, after this PR:

julia> @benchmark nanstd(a, dims=1) setup=a=rand(1000,1000)
BenchmarkTools.Trial: 2776 samples with 1 evaluation.
 Range (min … max):  554.250 μs … 843.750 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     561.875 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   565.629 μs ±  13.338 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▅▄█▅▂▃▄▁
  ▇▆████████▇█▆▄▅▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▂▁▁▂▁▂▁▁▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  554 μs           Histogram: frequency by time          611 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=2) setup=a=rand(1000,1000)
BenchmarkTools.Trial: 1608 samples with 1 evaluation.
 Range (min … max):  1.867 ms …  2.238 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.891 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.894 ms ± 19.374 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

        ▁▂▅▅▄█▄▅▇▅▆▁
  ▃▃▃▄▆▇██████████████▇▆▇▆▅▅▃▄▄▃▃▃▃▃▂▃▂▂▃▂▂▃▃▂▂▂▂▂▂▁▁▂▁▁▂▂▂▂ ▄
  1.87 ms        Histogram: frequency by time        1.96 ms <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=1) setup=a=rand(100,100)
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  5.702 μs … 318.250 μs  ┊ GC (min … max): 0.00% … 97.36%
 Time  (median):     5.743 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.811 μs ±   3.129 μs  ┊ GC (mean ± σ):  0.53% ±  0.97%

  ▁▆█▆▇▆▄▁▁ ▁ ▁▂▂ ▂▃▃▁▂▁▁                                     ▂
  ███████████▇███████████▇▇▆▆▄▄▄▅▁▃▅▅▃▅▄▅▆█▇▇▃▄▅▄▁▃▃▅▄▄▄▅▄▅▁▅ █
  5.7 μs       Histogram: log(frequency) by time      6.42 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=2) setup=a=rand(100,100)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  12.833 μs …  26.625 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.958 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.995 μs ± 290.298 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                ▂▆ █▇▄
  ▂▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▂▂▁▂▁▂▂▂▂▂▃▁▄▇██▁███▄▇▆▅▄▁▃▃▃▃▁▃▃▃▁▃▂▂▂▁▂▂▂▂ ▃
  12.8 μs         Histogram: frequency by time         14.8 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=1) setup=a=rand(10,10)
BenchmarkTools.Trial: 10000 samples with 934 evaluations.
 Range (min … max):  107.468 ns …  52.664 μs  ┊ GC (min … max): 0.00% … 99.73%
 Time  (median):     113.134 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   118.891 ns ± 525.570 ns  ┊ GC (mean ± σ):  4.59% ±  2.22%

   ▄█▆▂▂▂▂▂▃▃▂▃▄▄▄▄▄▄▄▄▄▄▃▃▂▂▂▂▂▁▁▁▁▁                           ▂
  ███████████████████████████████████████▇▇▇▆▇▇▅▅▅▆▆▅▅▄▅▅▅▅▄▄▄▅ █
  107 ns        Histogram: log(frequency) by time        134 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=2) setup=a=rand(10,10)
BenchmarkTools.Trial: 10000 samples with 903 evaluations.
 Range (min … max):  123.846 ns …  58.171 μs  ┊ GC (min … max): 0.00% … 99.73%
 Time  (median):     130.907 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   138.251 ns ± 580.488 ns  ┊ GC (mean ± σ):  4.37% ±  2.23%

   ▆█
  ▁██▅▃▃▃▄▄▆▅▆▆▅▆▅▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  124 ns           Histogram: frequency by time          164 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

brenhinkeller (Owner, Author)

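While in other cases I've seen much larger drops in performance when removing @turbo, these don't seem too bad. The main loss is LV's automatic loop reordering, which previously kept the gap between column-wise and row-wise reductions (that is, dims=1 vs dims=2) small. Without it, the dims=2 cases slow down noticeably: for a 1000×1000 matrix, the median time for nanmean(a, dims=2) goes from about 388 μs to 960 μs, and for nanstd(a, dims=2) from about 791 μs to 1.89 ms, while the dims=1 timings are roughly unchanged.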
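To see why loop order matters for the dims=2 case, here is a hypothetical illustration of a manual loop interchange on Julia's column-major arrays. The function names are made up, a NaN-skipping sum stands in for the mean for brevity, and this is not LV's algorithm or this PR's code. The first version walks along each row, so consecutive loads are size(A, 1) elements apart in memory; the second puts the row index in the inner loop, so loads are contiguous and each iteration updates a different element of out.

```julia
# Hypothetical illustration (not NaNStatistics or LV code): NaN-skipping sums
# over dims=2 of a column-major Matrix, with two different loop orders.

# Row-by-row order: the inner loop strides across columns,
# touching memory size(A, 1) elements apart on every iteration.
function nansum_dims2_rowwise(A::Matrix{T}) where {T<:AbstractFloat}
    m, n = size(A)
    out = zeros(T, m)
    @inbounds for i in 1:m
        s = zero(T)
        @simd for j in 1:n
            x = A[i, j]
            s += ifelse(x == x, x, zero(T))  # drop NaNs without a branch
        end
        out[i] = s
    end
    return out
end

# Interchanged order: columns outer, rows inner, so the inner loop reads
# contiguous memory and updates `out` elementwise. `ivdep` is valid because
# `out` is freshly allocated and never aliases `A`.
function nansum_dims2_colwise(A::Matrix{T}) where {T<:AbstractFloat}
    m, n = size(A)
    out = zeros(T, m)
    @inbounds for j in 1:n
        @simd ivdep for i in 1:m
            x = A[i, j]
            out[i] += ifelse(x == x, x, zero(T))
        end
    end
    return out
end
```

Benchmarking both on a large matrix (for example with @benchmark, as in the timings above) is one way to check the effect on a given machine; no timings are claimed for this sketch.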

Closes #47

brenhinkeller merged commit d9e60e1 into main on Jun 13, 2024
10 checks passed