Remove LoopVectorization dependency, in light of deprecation #48

brenhinkeller · 2024-06-13T01:15:27Z

Using `@simd ivdep for` loops seems to allow reasonable vectorization here.
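For readers unfamiliar with the annotation, here is a minimal sketch of a NaN-ignoring mean written with `@simd ivdep`. The function name `nanmean_simd` is hypothetical and this is not necessarily how the package implements `nanmean`; it only illustrates the branch-free loop style that the compiler can usually vectorize without LoopVectorization.

```julia
# Hypothetical helper for illustration; not the package's actual implementation.
function nanmean_simd(A::AbstractArray{T}) where {T<:AbstractFloat}
    s = zero(T)   # running sum of the non-NaN elements
    n = 0         # running count of the non-NaN elements
    # `@simd` permits reordering of the reduction; `ivdep` additionally
    # asserts that there are no loop-carried memory dependencies.
    @inbounds @simd ivdep for i in eachindex(A)
        x = A[i]
        notnan = x == x                      # false only when x is NaN
        s += ifelse(notnan, x, zero(T))      # branch-free: add 0 for NaNs
        n += notnan                          # Bool promotes to Int
    end
    return s / n                             # NaN if every element is NaN
end
```

Using `ifelse` rather than an `if` branch keeps the loop body straight-line, which tends to make vectorization easier for the compiler. Because `@simd` allows floating-point reassociation, results may differ from a strictly sequential sum in the last few bits.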

brenhinkeller · 2024-06-13T17:00:59Z

Some timing comparisons, Julia 1.10.4 on Apple M1 Max...
1d, with LV:

julia> @benchmark nanmean(a) setup=(a=rand(1000); a[rand(1:1000, 100)].=NaN)
BenchmarkTools.Trial: 10000 samples with 417 evaluations.
 Range (min … max):  239.307 ns … 478.916 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     239.710 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   243.649 ns ±  10.212 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄▁▂▂▂▁▂▃▂   ▁▁▁▁▁▁                                           ▁
  ███████████████████▇█▇▆▆▅▅▅▅▅▆▅▅▅▆▅▆▅▅▆▆▆▆▅▆▆▆▅▆▅▅▃▅▄▅▅▃▃▃▅▃▄ █
  239 ns        Histogram: log(frequency) by time        291 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark nanmean(a) setup=(a=rand(100); a[rand(1:100, 10)].=NaN)
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min … max):  25.309 ns … 72.540 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     25.435 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.638 ns ±  1.307 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆█▇▅▅▂▁▁▂▁▁ ▁   ▂                                           ▂
  █████████████▇▇▇███▇▇█▆▅▅▇▆▇▆▇▆▇▇█▇▇▇█▇▇▇▇▆▆▅▅▅▆▄▁▄▄▃▃▆▅▅▄▅ █
  25.3 ns      Histogram: log(frequency) by time      28.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark nanmean(a) setup=(a=rand(10); a[rand(1:10, 1)].=NaN)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  5.166 ns … 25.000 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.291 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.310 ns ±  0.512 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                     ▅        ▁█
  ▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▃ ▂
  5.17 ns        Histogram: frequency by time        5.42 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

1d, after this PR:

julia> @benchmark nanmean(a) setup=(a=rand(1000); a[rand(1:1000, 100)].=NaN)
BenchmarkTools.Trial: 10000 samples with 438 evaluations.
 Range (min … max):  238.584 ns … 477.169 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     238.966 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   243.414 ns ±  11.017 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▃▂▂▁▂▁▃▃▁ ▁▁▁ ▁                                              ▁
  ████████████████████▇▇▆▆▅▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▇▆▆▅▅▆▅▆▅▅▅▄▅▅▄▄▄▆▃▅▅ █
  239 ns        Histogram: log(frequency) by time        293 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark nanmean(a) setup=(a=rand(100); a[rand(1:100, 10)].=NaN)
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  24.991 ns … 67.537 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     25.117 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.500 ns ±  1.768 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄▁▁▂▃▁  ▁▁▂                                                ▁
  █████████████▇▇▃▆▆▁▁▄▄▄▄▄▅▃▄▅▅▅▄▅▅▃▅▅▄▄▄▄▅▁▃▃▄▅▄▄▅▄▄▄▅▅▄▅▄▆ █
  25 ns        Histogram: log(frequency) by time      35.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark nanmean(a) setup=(a=rand(10); a[rand(1:10, 1)].=NaN)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  4.250 ns … 15.042 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.334 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.365 ns ±  0.210 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

          ▃       █       ▇       ▂▃       ▃       ▂         ▁
  ▅▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁██▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▇ █
  4.25 ns      Histogram: log(frequency) by time     4.54 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

brenhinkeller · 2024-06-13T17:04:07Z

nanmean, multidimensional, with LV:

julia> @benchmark nanmean(a, dims=1) setup=a=rand(1000,1000)
BenchmarkTools.Trial: 3450 samples with 1 evaluation.
 Range (min … max):  245.000 μs … 308.083 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     251.416 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   252.741 μs ±   5.494 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

        ▂▃▄▇██▆▅▃▂▂
  ▁▂▂▃▆▇████████████▆▆▅▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  245 μs           Histogram: frequency by time          276 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=2) setup=a=rand(1000,1000)
BenchmarkTools.Trial: 3121 samples with 1 evaluation.
 Range (min … max):  384.125 μs … 564.750 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     388.458 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   391.569 μs ±  12.831 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▆█▇▆▆▅▅▄▃▂▂▁▁ ▁▁                                             ▁
  ███████████████████▆▇█▇▆▇▅▆▆▆▆▃▄▃▅▃▃▅▃▅▅▁▆▃▃▃▃▃▁▃▁▄▁▄▁▃▁▃▄▃▄▅ █
  384 μs        Histogram: log(frequency) by time        450 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=2) setup=a=rand(100,100)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.500 μs …   6.690 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.662 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.721 μs ± 230.471 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▂▅▇█▆▅▆▆▄▄▃▃▂▁▁                                          ▂
  ▄▆▇██████████████████▇█████▇▆▆▆▆▅▅▃▄▅▅▅▆▄▅▄▅▄▄▅▄▄▅▅▅▁▅▃▄▁▃▅ █
  2.5 μs       Histogram: log(frequency) by time      3.88 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=1) setup=a=rand(100,100)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.463 μs …   5.949 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.486 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.534 μs ± 183.970 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ██▅▂ ▄▆▅▃▁ ▁                                                ▂
  ████▇████████▇▇▆▅▅▆▅▆▆▆▆▇█▆█▇▆▄▃▄▅▄▄▃▃▃▃▃▃▂▃▂▂▃▃▃▃▃▄▂▃▂▄▂▄▄ █
  2.46 μs      Histogram: log(frequency) by time      3.36 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=1) setup=a=rand(10,10)
BenchmarkTools.Trial: 10000 samples with 981 evaluations.
 Range (min … max):  61.119 ns …  95.335 μs  ┊ GC (min … max):  0.00% … 99.89%
 Time  (median):     76.919 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   88.931 ns ± 954.348 ns  ┊ GC (mean ± σ):  11.57% ±  1.95%

              ▁▂     ▁▄▅▄▄██▇▄▃▃▂▁▁
  ▂▂▁▁▁▁▂▂▂▂▃▇██▄▃▃▃▆███████████████▇▆▆▆▆▅▅▄▄▃▄▃▃▃▃▃▂▃▂▂▂▂▂▂▂▂ ▄
  61.1 ns         Histogram: frequency by time         97.1 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=2) setup=a=rand(10,10)
BenchmarkTools.Trial: 10000 samples with 978 evaluations.
 Range (min … max):  60.796 ns …  91.711 μs  ┊ GC (min … max):  0.00% … 99.89%
 Time  (median):     77.837 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   88.544 ns ± 917.416 ns  ┊ GC (mean ± σ):  11.21% ±  1.95%

                      ▃▅█▆▇▇▆▃▃▂
  ▂▃▂▂▂▂▂▂▂▂▁▂▆▇▅█▃▃▅████████████▇▇▆▅▄▄▄▄▃▃▃▃▃▃▃▃▂▂▃▂▂▂▂▂▂▂▂▂▂ ▄
  60.8 ns         Histogram: frequency by time          102 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

nanmean, multidimensional, after this PR:

julia> @benchmark nanmean(a, dims=1) setup=a=rand(1000,1000)
BenchmarkTools.Trial: 3576 samples with 1 evaluation.
 Range (min … max):  238.000 μs … 341.000 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     238.667 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   241.594 μs ±   6.500 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ██▂▁▁▃▅▃ ▄▂▁▁▁▁▁▁                                             ▁
  █████████████████▇▇█▇▇▆▇▆▆▆▅▄▅▅▅▅▆▆▆▂▆▆▅▆▅▇▅▆▆▄▆▃▅▂▄▄▅▅▅▂▄▃▅▄ █
  238 μs        Histogram: log(frequency) by time        268 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=2) setup=a=rand(1000,1000)
BenchmarkTools.Trial: 2345 samples with 1 evaluation.
 Range (min … max):  953.291 μs …  1.125 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     960.125 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   965.405 μs ± 16.487 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄█▆▇▇▆▇▅▄▄▂▂ ▂▁▂▁▁▁                                          ▁
  ███████████████████████▇▆▇▆▆▇▇▅▆▇▅▆▅▆▆▅▇▆▇▆▅▆▆▆▆▆▄▅▅▄▅▆▅▄▅▃▅ █
  953 μs        Histogram: log(frequency) by time      1.03 ms <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=1) setup=a=rand(100,100)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.431 μs …  4.491 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.463 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.473 μs ± 89.275 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▁ ▅▇ █ ▇▇ ▇ █▇ ▅ ▄▃ ▂ ▂▁ ▁ ▁▁ ▂ ▃▂ ▂ ▂▂ ▂ ▁▁              ▃
  ▇█▁██▁█▁██▁█▁██▁█▁██▁█▁██▁█▁██▁█▁██▁█▁██▁█▁██▁█▁█▆▁▅▁▆▅▁▅▄ █
  2.43 μs      Histogram: log(frequency) by time     2.59 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=2) setup=a=rand(100,100)
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):  6.392 μs …  12.442 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.608 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.693 μs ± 413.635 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▃██▅▃▃▄▃▂▂▁▁   ▁                                         ▂
  ▅▃▆█████████████▇███▆▆▄▅▆▆▅▅▅▅▅▄▄▄▃▁▄▄▆▅▅▆▅▆▅▄▄▁▅▄▆▅▅▄▄▅▄▄▄ █
  6.39 μs      Histogram: log(frequency) by time      8.81 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=1) setup=a=rand(10,10)
BenchmarkTools.Trial: 10000 samples with 985 evaluations.
 Range (min … max):  55.076 ns …  55.824 μs  ┊ GC (min … max): 0.00% … 99.84%
 Time  (median):     61.083 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   67.818 ns ± 557.710 ns  ┊ GC (mean ± σ):  8.60% ±  2.87%

    █▇
  ▂▅██▅▄▄▄▃▃▃▄▃▄▃▃▃▄▅▆▅▅▅▅▅▅▄▄▄▃▃▃▃▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂ ▃
  55.1 ns         Histogram: frequency by time         82.4 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

julia> @benchmark nanmean(a, dims=2) setup=a=rand(10,10)
BenchmarkTools.Trial: 10000 samples with 982 evaluations.
 Range (min … max):  63.305 ns …  52.189 μs  ┊ GC (min … max): 0.00% … 99.82%
 Time  (median):     69.374 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   75.238 ns ± 521.263 ns  ┊ GC (mean ± σ):  7.25% ±  2.75%

      █▅
  ▁▁▂▇██▄▂▂▃▃▂▂▂▂▄▄▆▃▃▃▃▃▄▅▄▃▃▂▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  63.3 ns         Histogram: frequency by time         86.3 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

brenhinkeller · 2024-06-13T17:05:36Z

nanstd, multidimensional, with LV:

julia> @benchmark nanstd(a, dims=1) setup=a=rand(1000,1000)
BenchmarkTools.Trial: 2835 samples with 1 evaluation.
 Range (min … max):  559.334 μs … 799.750 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     571.166 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   575.298 μs ±  14.169 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▁██▇▇▅▄▂▁
  ▁▁▄▆█████████▇▇▆▆▅▃▃▃▃▂▃▃▂▃▂▂▂▂▂▁▁▁▁▂▁▂▁▁▁▂▁▁▁▂▁▁▁▁▂▁▁▁▁▁▁▁▁▁ ▃
  559 μs           Histogram: frequency by time          632 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=2) setup=a=rand(1000,1000)
BenchmarkTools.Trial: 2539 samples with 1 evaluation.
 Range (min … max):  752.542 μs … 989.000 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     790.958 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   795.325 μs ±  24.537 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▁▂▅▆▆▅██▇▅▆▅▇▄▅▃ ▂▂▁
  ▁▂▂▄▄▆▇████████████████▇████▇▅▅▅▅▆▄▃▄▃▂▂▃▂▃▃▁▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁ ▄
  753 μs           Histogram: frequency by time          882 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=1) setup=a=rand(100,100)
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  5.653 μs … 206.597 μs  ┊ GC (min … max): 0.00% … 95.98%
 Time  (median):     5.688 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.742 μs ±   2.019 μs  ┊ GC (mean ± σ):  0.35% ±  0.96%

  ▁▇█▇▅▂ ▁▁▁▁   ▁▂▂▁▁                                         ▂
  ██████▇██████▇█████▅▇▇▇▇▅▅▄▃▄▁▅▃▄▁▃▄▆█▆▄▅▅▃▃▄▁▄▄▄▄▄▄▁▃▃▄▄▅▅ █
  5.65 μs      Histogram: log(frequency) by time      6.41 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=2) setup=a=rand(100,100)
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  5.667 μs … 211.153 μs  ┊ GC (min … max): 0.00% … 95.97%
 Time  (median):     5.986 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.035 μs ±   2.059 μs  ┊ GC (mean ± σ):  0.34% ±  0.96%

                    ▄▆█▇▄
  ▂▂▂▂▁▁▁▁▁▁▂▂▂▂▃▃▄▅█████▅▃▃▃▃▂▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  5.67 μs         Histogram: frequency by time        6.58 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=1) setup=a=rand(10,10)
BenchmarkTools.Trial: 10000 samples with 946 evaluations.
 Range (min … max):   98.396 ns …  52.387 μs  ┊ GC (min … max): 0.00% … 99.73%
 Time  (median):     104.387 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   110.113 ns ± 522.884 ns  ┊ GC (mean ± σ):  4.93% ±  2.30%

    █▇ ▁
  ▁▃████▂▂▂▂▃▂▂▅▃▇▅▅▃▄▄▄▅▄▄▃▃▃▃▃▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  98.4 ns          Histogram: frequency by time          123 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=2) setup=a=rand(10,10)
BenchmarkTools.Trial: 10000 samples with 952 evaluations.
 Range (min … max):   95.194 ns …  59.095 μs  ┊ GC (min … max): 0.00% … 99.78%
 Time  (median):     101.847 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   108.089 ns ± 590.040 ns  ┊ GC (mean ± σ):  5.65% ±  2.39%

    ▃█
  ▁▂██▂▂▂▂▂▁▂▂▁▁▁▂▂▂▂▂▂▃▃▄▃▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  95.2 ns          Histogram: frequency by time          118 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

nanstd, multidimensional, after this PR:

julia> @benchmark nanstd(a, dims=1) setup=a=rand(1000,1000)
BenchmarkTools.Trial: 2776 samples with 1 evaluation.
 Range (min … max):  554.250 μs … 843.750 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     561.875 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   565.629 μs ±  13.338 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▅▄█▅▂▃▄▁
  ▇▆████████▇█▆▄▅▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▂▁▁▂▁▂▁▁▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  554 μs           Histogram: frequency by time          611 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=2) setup=a=rand(1000,1000)
BenchmarkTools.Trial: 1608 samples with 1 evaluation.
 Range (min … max):  1.867 ms …  2.238 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.891 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.894 ms ± 19.374 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

        ▁▂▅▅▄█▄▅▇▅▆▁
  ▃▃▃▄▆▇██████████████▇▆▇▆▅▅▃▄▄▃▃▃▃▃▂▃▂▂▃▂▂▃▃▂▂▂▂▂▂▁▁▂▁▁▂▂▂▂ ▄
  1.87 ms        Histogram: frequency by time        1.96 ms <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=1) setup=a=rand(100,100)
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  5.702 μs … 318.250 μs  ┊ GC (min … max): 0.00% … 97.36%
 Time  (median):     5.743 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.811 μs ±   3.129 μs  ┊ GC (mean ± σ):  0.53% ±  0.97%

  ▁▆█▆▇▆▄▁▁ ▁ ▁▂▂ ▂▃▃▁▂▁▁                                     ▂
  ███████████▇███████████▇▇▆▆▄▄▄▅▁▃▅▅▃▅▄▅▆█▇▇▃▄▅▄▁▃▃▅▄▄▄▅▄▅▁▅ █
  5.7 μs       Histogram: log(frequency) by time      6.42 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=2) setup=a=rand(100,100)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  12.833 μs …  26.625 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.958 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.995 μs ± 290.298 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                ▂▆ █▇▄
  ▂▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▂▂▁▂▁▂▂▂▂▂▃▁▄▇██▁███▄▇▆▅▄▁▃▃▃▃▁▃▃▃▁▃▂▂▂▁▂▂▂▂ ▃
  12.8 μs         Histogram: frequency by time         14.8 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=1) setup=a=rand(10,10)
BenchmarkTools.Trial: 10000 samples with 934 evaluations.
 Range (min … max):  107.468 ns …  52.664 μs  ┊ GC (min … max): 0.00% … 99.73%
 Time  (median):     113.134 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   118.891 ns ± 525.570 ns  ┊ GC (mean ± σ):  4.59% ±  2.22%

   ▄█▆▂▂▂▂▂▃▃▂▃▄▄▄▄▄▄▄▄▄▄▃▃▂▂▂▂▂▁▁▁▁▁                           ▂
  ███████████████████████████████████████▇▇▇▆▇▇▅▅▅▆▆▅▅▄▅▅▅▅▄▄▄▅ █
  107 ns        Histogram: log(frequency) by time        134 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

julia> @benchmark nanstd(a, dims=2) setup=a=rand(10,10)
BenchmarkTools.Trial: 10000 samples with 903 evaluations.
 Range (min … max):  123.846 ns …  58.171 μs  ┊ GC (min … max): 0.00% … 99.73%
 Time  (median):     130.907 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   138.251 ns ± 580.488 ns  ┊ GC (mean ± σ):  4.37% ±  2.23%

   ▆█
  ▁██▅▃▃▃▄▄▆▅▆▆▅▆▅▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  124 ns           Histogram: frequency by time          164 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

brenhinkeller · 2024-06-13T17:11:30Z

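While in other cases I've seen much larger drops in performance when removing @turbo, these don't seem too bad: the 1d and dims=1 timings change little. Mostly we see that we lose the benefit of LV's loop reordering, which previously limited the difference between column-wise and row-wise reductions (e.g. dims=1 vs dims=2). On a 1000×1000 array, the median for nanmean with dims=2 goes from about 388 μs to about 960 μs, and for nanstd with dims=2 from about 791 μs to about 1.89 ms.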
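To illustrate why loop order matters for dims=2, here is a rough sketch of a row-wise NaN-ignoring mean that keeps the inner loop running down columns, which are contiguous in Julia's column-major layout. The name `nanmean_rows` is hypothetical, and whether the package adopts this layout is not stated in this PR; the sketch only shows the kind of manual reordering that LV previously did automatically.

```julia
# Hypothetical sketch; not the package's actual implementation.
function nanmean_rows(A::Matrix{T}) where {T<:AbstractFloat}
    m, n = size(A)
    s = zeros(T, m)      # per-row sums of the non-NaN elements
    c = zeros(Int, m)    # per-row counts of the non-NaN elements
    @inbounds for j in 1:n           # outer loop over columns
        # Inner loop walks down one column, so memory access is contiguous.
        # Each iteration touches a distinct s[i] and c[i], so there is no
        # loop-carried memory dependency within this loop.
        @simd ivdep for i in 1:m
            x = A[i, j]
            notnan = x == x                      # false only when x is NaN
            s[i] += ifelse(notnan, x, zero(T))
            c[i] += notnan
        end
    end
    return s ./ c        # a Vector of length m, not an m×1 matrix
end
```

The trade-off is that the accumulators live in memory (`s` and `c`) rather than in a register, but the access pattern stays cache-friendly instead of striding across rows.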

Closes #47

brenhinkeller added 7 commits June 12, 2024 18:28

De-lv ArrayStats.jl

d0c962c

Improve dimensionality-checking in ArrayStats functions

ea59d26

De-lv other summary statistics

8ac5d8d

De-lv quicksort

359b367

Relax type constraints in nanvar

a815f87

Remove LoopVectorization dependency

29ed673

Don't assume that partialsort! partitions around the selected index

b1c2ca7

brenhinkeller force-pushed the de-lv branch from 1d61ddf to b1c2ca7 Compare June 13, 2024 04:39

Bump version to 0.6.36

0af934c

brenhinkeller force-pushed the de-lv branch from 28ebabc to 0af934c Compare June 13, 2024 17:07

brenhinkeller merged commit d9e60e1 into main Jun 13, 2024
10 checks passed

brenhinkeller mentioned this pull request Jun 13, 2024

Replacing LoopVectorization #47

Closed
