Read benchmarks #81

Merged
merged 1 commit into from
Dec 24, 2024

Mike-Dax
Contributor

@Mike-Dax Mike-Dax commented Dec 24, 2024

This PR adds some benchmarks to compare the following queries:

  • selecting a range of rows
  • performing a series of aggregations across a range of rows
  • performing a series of rolling window aggregations across a range of rows

and the following execution strategies:

Measuring 'Time to First Row'

  • runAndReadUntil(q, 1)
  • start runTask pending.readUntil(1)
  • start runTask single fetchChunk
  • streamAndReadUntil(q, 1)
  • startStream runTask pending.readUntil(1)
  • startStream runTask single fetchChunk
  • run single fetchChunk
  • stream single fetchChunk

Measuring 'Time to Full Result'

  • runAndReadAll()
  • start runTask pending.readAll()
  • start runTask fetchChunks loop
  • streamAndReadAll()
  • startStream runTask pending.readAll()
  • startStream runTask fetchChunks loop
  • run fetchChunks loop
  • stream fetchChunks loop
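
To make the two measurement shapes concrete, here is a minimal sketch of what "read until the first row" and "read everything" look like as chunk loops. It deliberately does not call the real client API: `mockChunks`, `readUntilFirstRow`, `readAllRows` and `timed` are hypothetical names invented for illustration, and a chunk is modelled as a plain array of numbers.

```typescript
// A chunk is modelled as a plain array of rows (an assumption of this sketch).
type Chunk = number[];

// Hypothetical stand-in for a result that hands back chunks on demand.
function* mockChunks(chunkCount: number, rowsPerChunk: number): Generator<Chunk> {
  for (let c = 0; c < chunkCount; c++) {
    yield Array.from({ length: rowsPerChunk }, (_, i) => c * rowsPerChunk + i);
  }
}

interface ReadStats {
  rows: number;
  chunksRead: number;
}

// 'Time to First Row' shape: stop at the first chunk that contains a row.
function readUntilFirstRow(chunks: Iterable<Chunk>): ReadStats {
  let chunksRead = 0;
  for (const chunk of chunks) {
    chunksRead++;
    if (chunk.length > 0) {
      return { rows: chunk.length, chunksRead };
    }
  }
  return { rows: 0, chunksRead };
}

// 'Time to Full Result' shape: keep fetching until the source is exhausted.
function readAllRows(chunks: Iterable<Chunk>): ReadStats {
  let rows = 0;
  let chunksRead = 0;
  for (const chunk of chunks) {
    chunksRead++;
    rows += chunk.length;
  }
  return { rows, chunksRead };
}

// Wall-clock wrapper; a benchmark runner would repeat this many times.
function timed<T>(fn: () => T): { value: T; ms: number } {
  const startedAt = Date.now();
  const value = fn();
  return { value, ms: Date.now() - startedAt };
}

const firstRowRun = timed(() => readUntilFirstRow(mockChunks(5, 4)));
const fullResultRun = timed(() => readAllRows(mockChunks(5, 4)));
console.log(firstRowRun.value, fullResultRun.value);
```

The first-row loop touches one chunk while the full-result loop touches all five, which is the difference the two benchmark groups are meant to isolate.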

By my reading, the statistically significant takeaways are:

  • Streaming gives you faster 'time to first result', as long as the query can be materialised lazily.

    • Overall aggregates don't benefit from streaming.
    • Rolling aggregates benefit greatly from streaming.
    • Simple row fetching benefits greatly from streaming.
  • Streaming can give you faster 'time to full result' in certain circumstances.

    • Overall aggregates again don't benefit from streaming.
    • Rolling aggregates don't benefit from streaming.
    • Simple row fetching benefits greatly from streaming.
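
One way to picture "can be materialised lazily" is to count how many input rows an operator has to consume before it can emit its first output row. The sketch below is a toy model built on generators, not a description of how the engine actually plans these queries; `countingSource`, `passThrough`, `overallSum`, `rollingSum` and `inputRowsBeforeFirstOutput` are all hypothetical names.

```typescript
// Wraps an array so we can count how many input rows have been pulled so far.
function countingSource(rows: number[]): { iter: Iterable<number>; pulled: () => number } {
  let pulledCount = 0;
  function* gen(): Generator<number> {
    for (const row of rows) {
      pulledCount++;
      yield row;
    }
  }
  return { iter: gen(), pulled: () => pulledCount };
}

// Row fetching: each input row can be handed on immediately.
function* passThrough(input: Iterable<number>): Generator<number> {
  for (const row of input) yield row;
}

// Overall aggregate: nothing can be emitted until every input row is seen.
function* overallSum(input: Iterable<number>): Generator<number> {
  let sum = 0;
  for (const row of input) sum += row;
  yield sum;
}

// Rolling aggregate: sum of the current row and up to windowSize - 1 preceding rows.
function* rollingSum(input: Iterable<number>, windowSize: number): Generator<number> {
  const windowRows: number[] = [];
  let sum = 0;
  for (const row of input) {
    windowRows.push(row);
    sum += row;
    if (windowRows.length > windowSize) sum -= windowRows.shift() as number;
    yield sum;
  }
}

// Pulls exactly one output row and reports how much input that required.
function inputRowsBeforeFirstOutput(
  build: (input: Iterable<number>) => Generator<number>
): number {
  const source = countingSource([1, 2, 3, 4, 5, 6, 7, 8]);
  build(source.iter).next();
  return source.pulled();
}

const rowFetchCost = inputRowsBeforeFirstOutput((input) => passThrough(input));
const overallCost = inputRowsBeforeFirstOutput((input) => overallSum(input));
const rollingCost = inputRowsBeforeFirstOutput((input) => rollingSum(input, 3));
console.log({ rowFetchCost, overallCost, rollingCost });
```

In this model the pass-through and rolling operators need one input row before producing output, while the overall aggregate needs all eight. That lines up with the 'time to first result' takeaways above, though the real cause inside the engine may well differ.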

Here are the full results on my M1 MacBook Pro:

 ✓ test/bench/read.bench.ts (48) 230150ms
   ✓ Row Fetching - Time to First Row (8) 5518ms
     name                                                           hz     min      max    mean     p75     p99    p995     p999      rme  samples
   · Row Fetching - runAndReadUntil(q, 1)                     1,376.76  0.4753  16.3925  0.7263  0.7714  0.9663  1.0138  16.3925   ±6.24%      689
   · Row Fetching - start runTask pending.readUntil(1)        1,079.19  0.4728  11.6620  0.9266  0.7817  7.6780  9.6702  11.6620  ±11.28%      542   slowest
   · Row Fetching - start runTask single fetchChunk           1,105.70  0.5103   9.1201  0.9044  0.7776  7.3376  8.0964   9.1201   ±9.74%      553
   · Row Fetching - streamAndReadUntil(q, 1)                  2,107.37  0.2790  43.5780  0.4745  0.4726  0.6541  0.7103   1.3092  ±16.94%     1054
   · Row Fetching - startStream runTask pending.readUntil(1)  2,323.58  0.2688   9.7647  0.4304  0.4301  0.5947  5.6202   7.1385   ±6.67%     1164
   · Row Fetching - startStream runTask single fetchChunk     2,363.71  0.2729   9.1197  0.4231  0.4259  0.5968  4.8921   8.6742   ±6.72%     1182   fastest
   · Row Fetching - run single fetchChunk                     1,331.20  0.4783  59.6678  0.7512  0.7247  0.8967  0.9563  59.6678  ±23.13%      666
   · Row Fetching - stream single fetchChunk                  2,176.48  0.2682  94.8838  0.4595  0.4277  0.5614  0.5826   0.6777  ±31.38%     1285
   ✓ Overall Aggregates - Time to First Row (8) 5440ms
     name                                                               hz     min      max    mean     p75     p99    p995     p999     rme  samples
   · Overall Aggregates - runAndReadUntil(q, 1)                     877.93  0.7878   1.7058  1.1390  1.2682  1.4572  1.5715   1.7058  ±1.38%      440   fastest
   · Overall Aggregates - start runTask pending.readUntil(1)        828.47  0.8028   3.3645  1.2070  1.3137  2.7739  2.9337   3.3645  ±2.58%      415   slowest
   · Overall Aggregates - start runTask single fetchChunk           831.70  0.7918   3.6258  1.2024  1.3221  2.3136  2.6003   3.6258  ±2.22%      416
   · Overall Aggregates - streamAndReadUntil(q, 1)                  860.32  0.8093  10.5932  1.1624  1.2869  1.4233  1.5662  10.5932  ±3.93%      431
   · Overall Aggregates - startStream runTask pending.readUntil(1)  851.64  0.7908   1.6785  1.1742  1.3078  1.5665  1.6066   1.6785  ±1.40%      426
   · Overall Aggregates - startStream runTask single fetchChunk     863.23  0.7758   1.4962  1.1584  1.2987  1.4348  1.4740   1.4962  ±1.37%      432
   · Overall Aggregates - run single fetchChunk                     863.06  0.8056   8.9131  1.1587  1.2681  1.4543  1.5107   8.9131  ±3.32%      432
   · Overall Aggregates - stream single fetchChunk                  859.57  0.7852   5.0499  1.1634  1.2870  1.5952  2.1741   5.0499  ±2.14%      430
   ✓ Rolling Aggregates - Time to First Row (8) 85961ms
     name                                                                hz      min      max     mean      p75      p99     p995     p999     rme  samples
   · Rolling Aggregates - runAndReadUntil(q, 1)                     13.7236  44.5004   123.83  72.8670  78.2095  94.0872   100.97   123.83  ±2.02%      200
   · Rolling Aggregates - start runTask pending.readUntil(1)        13.8090  44.1295   121.97  72.4166  78.4516  97.5966   100.35   121.97  ±2.20%      200
   · Rolling Aggregates - start runTask single fetchChunk           13.4964  45.5832   156.89  74.0938  79.1863  96.2242  96.3086   156.89  ±2.18%      200   slowest
   · Rolling Aggregates - streamAndReadUntil(q, 1)                  32.4901  23.7333  42.0643  30.7786  32.1036  38.5903  39.3412  42.0643  ±1.70%      200   fastest
   · Rolling Aggregates - startStream runTask pending.readUntil(1)  31.6999  23.7718  44.6693  31.5458  33.0128  40.5613  41.1911  44.6693  ±1.70%      200
   · Rolling Aggregates - startStream runTask single fetchChunk     31.6347  23.8681  52.4038  31.6108  33.0380  47.6970  47.8277  52.4038  ±1.88%      200
   · Rolling Aggregates - run single fetchChunk                     13.9924  44.0817   121.27  71.4675  77.8605  79.8555   107.91   121.27  ±2.19%      200
   · Rolling Aggregates - stream single fetchChunk                  32.2393  23.5144  70.5517  31.0181  32.3448  39.6358  39.7199  70.5517  ±2.13%      200
   ✓ Row Fetching - Full Result (8) 5416ms
     name                                                        hz     min      max    mean     p75     p99     p995     p999     rme  samples
   · Row Fetching - runAndReadAll()                          718.63  0.8328   6.7487  1.3915  1.2545  6.3228   6.7328   6.7487  ±7.17%      360
   · Row Fetching - start runTask pending.readAll()          577.95  0.9156   7.1313  1.7302  1.6273  5.3958   5.8823   7.1313  ±6.66%      289   slowest
   · Row Fetching - start runTask fetchChunks loop           760.48  0.8213   5.4165  1.3150  1.3235  3.4509   3.9240   5.4165  ±4.16%      381
   · Row Fetching - streamAndReadAll()                       998.97  0.5869  17.6965  1.0010  0.9653  2.0408   2.4146  17.6965  ±7.01%      500
   · Row Fetching - startStream runTask pending.readAll()    991.83  0.5981   3.4852  1.0082  1.0139  2.6441   2.6847   3.4852  ±2.92%      498
   · Row Fetching - startStream runTask fetchChunks loop   1,000.72  0.6345   3.4698  0.9993  0.9927  2.7818   2.9266   3.4698  ±3.13%      501   fastest
   · Row Fetching - run fetchChunks loop                     701.55  0.8667  10.6289  1.4254  1.2878  9.3998  10.2593  10.6289  ±9.51%      351
   · Row Fetching - stream fetchChunks loop                  933.68  0.6153  17.5829  1.0710  1.0151  4.1196   4.2596  17.5829  ±7.85%      467
   ✓ Overall Aggregates - Full Result (8) 5432ms
     name                                                            hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · Overall Aggregates - runAndReadAll()                        844.43  0.7865  5.8209  1.1842  1.3093  1.5038  1.5552  5.8209  ±2.27%      423
   · Overall Aggregates - start runTask pending.readAll()        845.97  0.7873  1.6718  1.1821  1.3326  1.5857  1.5935  1.6718  ±1.48%      423
   · Overall Aggregates - start runTask fetchChunks loop         830.36  0.8041  1.6605  1.2043  1.3396  1.5843  1.6173  1.6605  ±1.36%      416
   · Overall Aggregates - streamAndReadAll()                     856.70  0.7828  5.5033  1.1673  1.2991  1.4692  1.7650  5.5033  ±2.20%      429   fastest
   · Overall Aggregates - startStream runTask pending.readAll()  842.75  0.8340  1.7361  1.1866  1.3355  1.5678  1.5780  1.7361  ±1.46%      422
   · Overall Aggregates - startStream runTask fetchChunks loop   818.82  0.8265  1.7138  1.2213  1.3707  1.6475  1.6993  1.7138  ±1.58%      410   slowest
   · Overall Aggregates - run fetchChunks loop                   829.73  0.8407  7.8850  1.2052  1.3224  1.5317  1.6899  7.8850  ±2.95%      415
   · Overall Aggregates - stream fetchChunks loop                851.84  0.7923  7.5480  1.1739  1.2875  1.5922  1.8078  7.5480  ±3.36%      426
   ✓ Rolling Aggregates - Full Result (8) 122381ms
     name                                                             hz      min      max     mean      p75      p99     p995     p999     rme  samples
   · Rolling Aggregates - runAndReadAll()                        13.7514  45.1392   120.32  72.7201  78.8937  87.8110   111.02   120.32  ±2.12%      200
   · Rolling Aggregates - start runTask pending.readAll()        13.6204  46.5226   152.29  73.4192  78.9196  82.9205   105.00   152.29  ±2.14%      200
   · Rolling Aggregates - start runTask fetchChunks loop         13.7576  45.2252  95.2338  72.6872  78.6150  81.7152  82.0683  95.2338  ±1.85%      200   fastest
   · Rolling Aggregates - streamAndReadAll()                     13.2885  63.2723   105.59  75.2528  78.8987  94.2671  99.6013   105.59  ±1.20%      200
   · Rolling Aggregates - startStream runTask pending.readAll()  13.3006  63.4977   102.19  75.1848  79.3148  94.0547   100.32   102.19  ±1.21%      200
   · Rolling Aggregates - startStream runTask fetchChunks loop   13.3335  63.4264   121.22  74.9989  78.6301  82.0112   106.03   121.22  ±1.25%      200
   · Rolling Aggregates - run fetchChunks loop                   13.4990  44.8317   157.66  74.0796  78.4570  99.0749   103.98   157.66  ±2.04%      200
   · Rolling Aggregates - stream fetchChunks loop                13.2082  63.0522   167.50  75.7107  78.7524  91.6548   117.71   167.50  ±1.69%      200   slowest

 BENCH  Summary

  Row Fetching - startStream runTask single fetchChunk - test/bench/read.bench.ts > Row Fetching - Time to First Row
    1.02x faster than Row Fetching - startStream runTask pending.readUntil(1)
    1.09x faster than Row Fetching - stream single fetchChunk
    1.12x faster than Row Fetching - streamAndReadUntil(q, 1)
    1.72x faster than Row Fetching - runAndReadUntil(q, 1)
    1.78x faster than Row Fetching - run single fetchChunk
    2.14x faster than Row Fetching - start runTask single fetchChunk
    2.19x faster than Row Fetching - start runTask pending.readUntil(1)

  Overall Aggregates - runAndReadUntil(q, 1) - test/bench/read.bench.ts > Overall Aggregates - Time to First Row
    1.02x faster than Overall Aggregates - startStream runTask single fetchChunk
    1.02x faster than Overall Aggregates - run single fetchChunk
    1.02x faster than Overall Aggregates - streamAndReadUntil(q, 1)
    1.02x faster than Overall Aggregates - stream single fetchChunk
    1.03x faster than Overall Aggregates - startStream runTask pending.readUntil(1)
    1.06x faster than Overall Aggregates - start runTask single fetchChunk
    1.06x faster than Overall Aggregates - start runTask pending.readUntil(1)

  Rolling Aggregates - streamAndReadUntil(q, 1) - test/bench/read.bench.ts > Rolling Aggregates - Time to First Row
    1.01x faster than Rolling Aggregates - stream single fetchChunk
    1.02x faster than Rolling Aggregates - startStream runTask pending.readUntil(1)
    1.03x faster than Rolling Aggregates - startStream runTask single fetchChunk
    2.32x faster than Rolling Aggregates - run single fetchChunk
    2.35x faster than Rolling Aggregates - start runTask pending.readUntil(1)
    2.37x faster than Rolling Aggregates - runAndReadUntil(q, 1)
    2.41x faster than Rolling Aggregates - start runTask single fetchChunk

  Row Fetching - startStream runTask fetchChunks loop - test/bench/read.bench.ts > Row Fetching - Full Result
    1.00x faster than Row Fetching - streamAndReadAll()
    1.01x faster than Row Fetching - startStream runTask pending.readAll()
    1.07x faster than Row Fetching - stream fetchChunks loop
    1.32x faster than Row Fetching - start runTask fetchChunks loop
    1.39x faster than Row Fetching - runAndReadAll()
    1.43x faster than Row Fetching - run fetchChunks loop
    1.73x faster than Row Fetching - start runTask pending.readAll()

  Overall Aggregates - streamAndReadAll() - test/bench/read.bench.ts > Overall Aggregates - Full Result
    1.01x faster than Overall Aggregates - stream fetchChunks loop
    1.01x faster than Overall Aggregates - start runTask pending.readAll()
    1.01x faster than Overall Aggregates - runAndReadAll()
    1.02x faster than Overall Aggregates - startStream runTask pending.readAll()
    1.03x faster than Overall Aggregates - start runTask fetchChunks loop
    1.03x faster than Overall Aggregates - run fetchChunks loop
    1.05x faster than Overall Aggregates - startStream runTask fetchChunks loop

  Rolling Aggregates - start runTask fetchChunks loop - test/bench/read.bench.ts > Rolling Aggregates - Full Result
    1.00x faster than Rolling Aggregates - runAndReadAll()
    1.01x faster than Rolling Aggregates - start runTask pending.readAll()
    1.02x faster than Rolling Aggregates - run fetchChunks loop
    1.03x faster than Rolling Aggregates - startStream runTask fetchChunks loop
    1.03x faster than Rolling Aggregates - startStream runTask pending.readAll()
    1.04x faster than Rolling Aggregates - streamAndReadAll()
    1.04x faster than Rolling Aggregates - stream fetchChunks loop
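
For anyone cross-checking the summary against the tables: the "x faster" figures appear to be the ratio of the two `hz` values, and the `mean` column (in milliseconds) is `1000 / hz`. A small sketch of that arithmetic, using numbers copied from the tables above; `speedup` and `meanMs` are hypothetical helper names, and because the printed `hz` values are already rounded, the last digit could differ in other cases.

```typescript
// Ratio of operations per second, formatted the way the summary prints it.
function speedup(fasterHz: number, slowerHz: number): string {
  return `${(fasterHz / slowerHz).toFixed(2)}x`;
}

// Mean time per iteration in milliseconds, derived from operations per second.
function meanMs(hz: number): number {
  return 1000 / hz;
}

// Row Fetching, Time to First Row: startStream single fetchChunk vs runAndReadUntil(q, 1)
const rowFetchSpeedup = speedup(2363.71, 1376.76);

// Rolling Aggregates, Time to First Row: streamAndReadUntil(q, 1) vs runAndReadUntil(q, 1)
const rollingSpeedup = speedup(32.4901, 13.7236);

// Mean for Row Fetching - runAndReadUntil(q, 1)
const runAndReadUntilMean = meanMs(1376.76).toFixed(4);

console.log(rowFetchSpeedup, rollingSpeedup, runAndReadUntilMean);
```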

For some additional numbers, here is how many tasks were run before the result was 'ready' for each query, where applicable. I don't think they tell much of a story, but they're interesting to know nonetheless.

// Full Result
Row Fetching - start runTask pending.readAll() - tasks 304
Row Fetching - start runTask fetchChunks loop - tasks 436
Row Fetching - startStream runTask pending.readAll() - tasks 25
Row Fetching - startStream runTask fetchChunks loop - tasks 114
Overall Aggregates - start runTask pending.readAll() - tasks 2018
Overall Aggregates - start runTask fetchChunks loop - tasks 1516
Overall Aggregates - startStream runTask pending.readAll() - tasks 6
Overall Aggregates - startStream runTask fetchChunks loop - tasks 3645
Rolling Aggregates - start runTask pending.readAll() - tasks 331597
Rolling Aggregates - start runTask fetchChunks loop - tasks 206513
Rolling Aggregates - startStream runTask pending.readAll() - tasks 109053
Rolling Aggregates - startStream runTask fetchChunks loop - tasks 121899

// Time to First Row
Row Fetching - start runTask pending.readUntil(1) - tasks 936
Row Fetching - start runTask single fetchChunk - tasks 211
Row Fetching - startStream runTask pending.readUntil(1) - tasks 363
Row Fetching - startStream runTask single fetchChunk - tasks 43
Overall Aggregates - start runTask pending.readUntil(1) - tasks 1490
Overall Aggregates - start runTask single fetchChunk - tasks 632
Overall Aggregates - startStream runTask pending.readUntil(1) - tasks 2639
Overall Aggregates - startStream runTask single fetchChunk - tasks 580
Rolling Aggregates - start runTask pending.readUntil(1) - tasks 337248
Rolling Aggregates - start runTask single fetchChunk - tasks 315596
Rolling Aggregates - startStream runTask pending.readUntil(1) - tasks 106010
Rolling Aggregates - startStream runTask single fetchChunk - tasks 8674
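
For context on what a "task count" means here, the counting loop can be sketched as below. This is a stand-in rather than the benchmark code: `MockPending`, `TaskState` and `countTasksUntilReady` are hypothetical, and the mock simply becomes ready after a fixed number of `runTask()` calls, whereas a real pending result decides for itself when it is ready.

```typescript
type TaskState = "not_ready" | "ready";

// Hypothetical stand-in for a pending result that needs a fixed number of tasks.
class MockPending {
  private remaining: number;

  constructor(tasksNeeded: number) {
    this.remaining = tasksNeeded;
  }

  // Runs one unit of work and reports whether the result is now ready.
  runTask(): TaskState {
    if (this.remaining > 0) this.remaining--;
    return this.remaining === 0 ? "ready" : "not_ready";
  }
}

// Calls runTask() until the result is ready and returns how many calls it took.
function countTasksUntilReady(
  pending: { runTask(): TaskState },
  maxTasks: number = 1000000
): number {
  let tasks = 0;
  while (tasks < maxTasks) {
    tasks++;
    if (pending.runTask() === "ready") return tasks;
  }
  throw new Error(`result not ready after ${maxTasks} tasks`);
}

const tasksRun = countTasksUntilReady(new MockPending(43));
console.log(`tasks ${tasksRun}`);
```

The `maxTasks` guard is only there so that the sketch cannot spin forever if the mock never reports ready.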

@jraymakers
Contributor

Thanks! Interesting data. Nice confirmation that streaming can be very useful.

@jraymakers jraymakers merged commit 439ef03 into duckdb:main Dec 24, 2024
5 checks passed
@Mike-Dax Mike-Dax deleted the read-benchmarks branch December 25, 2024 00:40