Parallel Distinct SimpleAggregate #4934

Open. Wants to merge 1 commit into master.

benjaminwinger (Collaborator) commented Feb 19, 2025

Extends the method already used to parallelize the hash aggregate and distinct hash aggregate to the simple (non-grouped) aggregates.

A fair amount of work is still done single-threaded. After the parallel build and finalize of the distinct hash table, we still have to scan it to compute the function result. That scan currently runs in the single-threaded finalize, since it can't start until the parallel finalization is complete.
It would be possible to add another operator that does this in parallel, which I think could yield up to another 2x improvement in runtime: skipping this step dropped both of the benchmarks below to ~0.9s, so the single-threaded part is clearly a significant share of the total runtime. But I think that's best left for a different PR.

(Edit: this can actually be done in the current operator fairly easily; we just need to store a state for each partition and combine them in finalize instead.)
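
To make the idea in the edit above concrete, here is a minimal, self-contained sketch of a partitioned `COUNT(DISTINCT ...)`. It is not the code from this PR: `kNumPartitions`, `partitionOf`, and `countDistinctParallel` are hypothetical names, `std::unordered_set` stands in for the aggregate hash table, and plain `std::thread` stands in for the task scheduler. It only illustrates the three phases: a parallel build into thread-local partitions, a parallel per-partition finalize that produces one partial state per partition, and a cheap single-threaded combine.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <unordered_set>
#include <vector>

// Hypothetical sketch; none of these names come from the PR.
constexpr std::size_t kNumPartitions = 16;

// Equal values always hash to the same partition, so partitions are disjoint.
inline std::size_t partitionOf(std::uint64_t value) {
    return std::hash<std::uint64_t>{}(value) % kNumPartitions;
}

using PartitionedSets = std::vector<std::unordered_set<std::uint64_t>>;

std::uint64_t countDistinctParallel(const std::vector<std::uint64_t>& input,
                                    std::size_t numThreads) {
    numThreads = std::max<std::size_t>(numThreads, 1);

    // Phase 1 (parallel build): each thread deduplicates its slice of the
    // input into its own set of partitions, so no locking is needed.
    std::vector<PartitionedSets> local(numThreads, PartitionedSets(kNumPartitions));
    {
        const std::size_t chunk = (input.size() + numThreads - 1) / numThreads;
        std::vector<std::thread> workers;
        for (std::size_t t = 0; t < numThreads; ++t) {
            workers.emplace_back([&, t] {
                const std::size_t begin = std::min(input.size(), t * chunk);
                const std::size_t end = std::min(input.size(), begin + chunk);
                for (std::size_t i = begin; i < end; ++i) {
                    local[t][partitionOf(input[i])].insert(input[i]);
                }
            });
        }
        for (auto& w : workers) w.join();
    }

    // Phase 2 (parallel finalize): each partition is merged across threads and
    // reduced to one partial aggregate state (here, a distinct count).
    std::vector<std::uint64_t> partialCounts(kNumPartitions, 0);
    {
        std::vector<std::thread> workers;
        for (std::size_t t = 0; t < numThreads; ++t) {
            workers.emplace_back([&, t] {
                for (std::size_t p = t; p < kNumPartitions; p += numThreads) {
                    std::unordered_set<std::uint64_t> merged;
                    for (std::size_t src = 0; src < numThreads; ++src) {
                        merged.insert(local[src][p].begin(), local[src][p].end());
                    }
                    partialCounts[p] = merged.size();
                }
            });
        }
        for (auto& w : workers) w.join();
    }

    // Phase 3 (single-threaded finalize): combining the per-partition states
    // costs O(kNumPartitions) instead of a scan over every distinct value.
    std::uint64_t total = 0;
    for (std::uint64_t c : partialCounts) total += c;
    return total;
}
```

For example, `countDistinctParallel({1, 2, 2, 3}, 4)` returns 3. Because a value can only land in one partition, summing the per-partition counts is safe for `COUNT`; other aggregate functions would need their own combine step.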

| Query | master | 1 thread | 128 threads |
|---|---|---|---|
| `MATCH (h:hits) RETURN COUNT(DISTINCT h.UserID);` | 20.5s | 18.6s | 0.77s |
| `MATCH (h:hits) RETURN COUNT(DISTINCT h.SearchPhrase);` | 16.6s | 15.9s | 0.85s |
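
As a rough reading of the numbers above, the speedup from 1 thread to 128 threads is about 24x for `UserID` and about 19x for `SearchPhrase`, which is well short of 128x and consistent with a remaining single-threaded portion. The helper names below (`speedup`, `efficiency`) are illustrative only and are not part of the PR; the sketch assumes the reported values are wall-clock times on the same machine.

```cpp
// Speedup of a run relative to a baseline: baseline time divided by new time.
constexpr double speedup(double baselineSeconds, double newSeconds) {
    return baselineSeconds / newSeconds;
}

// Parallel efficiency: achieved speedup divided by the number of threads used.
constexpr double efficiency(double oneThreadSeconds, double manyThreadSeconds, int threads) {
    return speedup(oneThreadSeconds, manyThreadSeconds) / threads;
}

// Values taken from the table above (1 thread vs. 128 threads).
constexpr double kUserIdSpeedup = speedup(18.6, 0.77);       // roughly 24x
constexpr double kSearchPhraseSpeedup = speedup(15.9, 0.85); // roughly 19x
```

With 128 threads, these correspond to a parallel efficiency of a little under 20% for `UserID` and under 15% for `SearchPhrase`.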

Probably needs a bit more cleanup, so I'm opening this as a draft for now. (Should be good now; I updated the benchmarks twice: once for the change mentioned above, and once for a small optimization that makes it use fixed-size hash tables prior to partitioning, as in the hash aggregate.)

codecov bot commented Feb 19, 2025

Codecov Report

Attention: Patch coverage is 95.06173% with 8 lines in your changes missing coverage. Please review.

Project coverage is 86.58%. Comparing base (7d2e7db) to head (8503584).
Report is 6 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../processor/operator/aggregate/simple_aggregate.cpp | 95.00% | 2 Missing and 2 partials ⚠️ |
| ...cessor/operator/aggregate/aggregate_hash_table.cpp | 92.85% | 2 Missing ⚠️ |
| ...rocessor/operator/aggregate/aggregate_hash_table.h | 75.00% | 1 Missing ⚠️ |
| ...de/processor/operator/aggregate/simple_aggregate.h | 92.85% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4934      +/-   ##
==========================================
+ Coverage   86.55%   86.58%   +0.02%     
==========================================
  Files        1409     1413       +4     
  Lines       60915    61192     +277     
  Branches     7492     7523      +31     
==========================================
+ Hits        52727    52984     +257     
- Misses       8019     8036      +17     
- Partials      169      172       +3     

Benchmark Result

Master commit hash: eb9df669212996b903521ea2fd0f96a74ff86274
Branch commit hash: 583a7356aac8d2e1d7bbc461efe708ba2117bea8

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 725.63 | 724.27 | 1.37 (0.19%) |
| aggregation | q28 | 6444.16 | 6380.35 | 63.81 (1.00%) |
| filter | q14 | 126.19 | 126.35 | -0.16 (-0.12%) |
| filter | q15 | 131.53 | 128.65 | 2.88 (2.24%) |
| filter | q16 | 299.88 | 302.97 | -3.09 (-1.02%) |
| filter | q17 | 448.26 | 445.48 | 2.78 (0.62%) |
| filter | q18 | 1909.07 | 1915.15 | -6.08 (-0.32%) |
| filter | zonemap-node | 89.65 | 88.85 | 0.80 (0.90%) |
| filter | zonemap-node-lhs-cast | 89.15 | 88.97 | 0.18 (0.20%) |
| filter | zonemap-node-null | 89.18 | 88.83 | 0.35 (0.40%) |
| filter | zonemap-rel | 5635.47 | 5444.80 | 190.68 (3.50%) |
| fixed_size_expr_evaluator | q07 | 572.12 | 572.70 | -0.58 (-0.10%) |
| fixed_size_expr_evaluator | q08 | 800.91 | 801.78 | -0.87 (-0.11%) |
| fixed_size_expr_evaluator | q09 | 802.76 | 801.17 | 1.58 (0.20%) |
| fixed_size_expr_evaluator | q10 | 238.02 | 236.47 | 1.55 (0.66%) |
| fixed_size_expr_evaluator | q11 | 229.92 | 229.38 | 0.54 (0.24%) |
| fixed_size_expr_evaluator | q12 | 227.78 | 227.03 | 0.76 (0.33%) |
| fixed_size_expr_evaluator | q13 | 1458.79 | 1458.71 | 0.08 (0.01%) |
| fixed_size_seq_scan | q23 | 112.06 | 109.07 | 2.99 (2.74%) |
| join | q29 | 727.52 | 712.91 | 14.62 (2.05%) |
| join | q30 | 9674.12 | 10983.88 | -1309.76 (-11.92%) |
| join | q31 | 6.76 | 5.64 | 1.12 (19.87%) |
| join | SelectiveTwoHopJoin | 53.52 | 54.98 | -1.46 (-2.65%) |
| ldbc_snb_ic | q35 | 2752.06 | 2574.37 | 177.69 (6.90%) |
| ldbc_snb_ic | q36 | 462.85 | 458.32 | 4.53 (0.99%) |
| ldbc_snb_is | q32 | 6.89 | 6.51 | 0.38 (5.83%) |
| ldbc_snb_is | q33 | 12.20 | 16.82 | -4.61 (-27.43%) |
| ldbc_snb_is | q34 | 1.22 | 1.34 | -0.11 (-8.54%) |
| multi-rel | multi-rel-large-scan | 1368.79 | 1335.25 | 33.53 (2.51%) |
| multi-rel | multi-rel-lookup | 21.33 | 33.21 | -11.88 (-35.78%) |
| multi-rel | multi-rel-small-scan | 76.54 | 87.45 | -10.91 (-12.47%) |
| order_by | q25 | 136.67 | 128.55 | 8.12 (6.31%) |
| order_by | q26 | 456.39 | 448.15 | 8.24 (1.84%) |
| order_by | q27 | 1407.21 | 1406.01 | 1.21 (0.09%) |
| recursive_join | recursive-join-bidirection | 307.39 | 276.37 | 31.02 (11.22%) |
| recursive_join | recursive-join-dense | 7012.21 | 7341.78 | -329.57 (-4.49%) |
| recursive_join | recursive-join-path | 23341.17 | 24172.86 | -831.69 (-3.44%) |
| recursive_join | recursive-join-sparse | 1069.89 | 1049.70 | 20.19 (1.92%) |
| recursive_join | recursive-join-trail | 7040.90 | 7355.05 | -314.15 (-4.27%) |
| scan_after_filter | q01 | 171.34 | 171.95 | -0.60 (-0.35%) |
| scan_after_filter | q02 | 157.15 | 156.81 | 0.35 (0.22%) |
| shortest_path_ldbc100 | q37 | 89.87 | 90.76 | -0.89 (-0.98%) |
| shortest_path_ldbc100 | q38 | 381.42 | 385.31 | -3.88 (-1.01%) |
| shortest_path_ldbc100 | q39 | 61.51 | 65.23 | -3.72 (-5.71%) |
| shortest_path_ldbc100 | q40 | 462.58 | 441.95 | 20.63 (4.67%) |
| var_size_expr_evaluator | q03 | 2078.67 | 2113.71 | -35.04 (-1.66%) |
| var_size_expr_evaluator | q04 | 2209.81 | 2287.17 | -77.36 (-3.38%) |
| var_size_expr_evaluator | q05 | 2626.55 | 2645.55 | -19.00 (-0.72%) |
| var_size_expr_evaluator | q06 | 1321.73 | 1319.93 | 1.80 (0.14%) |
| var_size_seq_scan | q19 | 1467.43 | 1480.49 | -13.05 (-0.88%) |
| var_size_seq_scan | q20 | 2537.31 | 2367.80 | 169.50 (7.16%) |
| var_size_seq_scan | q21 | 2366.08 | 2325.37 | 40.71 (1.75%) |
| var_size_seq_scan | q22 | 127.06 | 125.02 | 2.05 (1.64%) |

Benchmark Result

Master commit hash: c031db918dd8170fd8988b128ac43f11a2a12210
Branch commit hash: 890ef111a68d0fafb1e69c32dc9cce467fff5b16

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 738.30 | 724.23 | 14.07 (1.94%) |
| aggregation | q28 | 6442.41 | 6374.83 | 67.58 (1.06%) |
| filter | q14 | 141.99 | 128.41 | 13.58 (10.58%) |
| filter | q15 | 143.73 | 124.47 | 19.26 (15.47%) |
| filter | q16 | 320.87 | 306.31 | 14.56 (4.75%) |
| filter | q17 | 461.68 | 446.91 | 14.77 (3.31%) |
| filter | q18 | 1959.57 | 1888.35 | 71.21 (3.77%) |
| filter | zonemap-node | 105.35 | 91.94 | 13.41 (14.58%) |
| filter | zonemap-node-lhs-cast | 105.19 | 88.93 | 16.26 (18.29%) |
| filter | zonemap-node-null | 104.96 | 88.75 | 16.20 (18.26%) |
| filter | zonemap-rel | 5596.29 | 5561.73 | 34.56 (0.62%) |
| fixed_size_expr_evaluator | q07 | 587.11 | 578.45 | 8.66 (1.50%) |
| fixed_size_expr_evaluator | q08 | 816.46 | 808.77 | 7.69 (0.95%) |
| fixed_size_expr_evaluator | q09 | 820.71 | 812.37 | 8.34 (1.03%) |
| fixed_size_expr_evaluator | q10 | 252.86 | 244.38 | 8.48 (3.47%) |
| fixed_size_expr_evaluator | q11 | 246.23 | 236.56 | 9.67 (4.09%) |
| fixed_size_expr_evaluator | q12 | 243.94 | 233.78 | 10.16 (4.35%) |
| fixed_size_expr_evaluator | q13 | 1480.12 | 1462.56 | 17.56 (1.20%) |
| fixed_size_seq_scan | q23 | 126.58 | 117.00 | 9.58 (8.18%) |
| join | q29 | 756.42 | 728.12 | 28.30 (3.89%) |
| join | q30 | 10487.79 | 10483.93 | 3.86 (0.04%) |
| join | q31 | 6.66 | 7.67 | -1.01 (-13.15%) |
| join | SelectiveTwoHopJoin | 53.62 | 53.45 | 0.17 (0.32%) |
| ldbc_snb_ic | q35 | 2668.76 | 2667.03 | 1.73 (0.06%) |
| ldbc_snb_ic | q36 | 515.58 | 481.77 | 33.81 (7.02%) |
| ldbc_snb_is | q32 | 5.77 | 4.09 | 1.68 (40.99%) |
| ldbc_snb_is | q33 | 14.40 | 15.59 | -1.19 (-7.66%) |
| ldbc_snb_is | q34 | 1.30 | 1.22 | 0.08 (6.82%) |
| multi-rel | multi-rel-large-scan | 1320.95 | 1362.00 | -41.05 (-3.01%) |
| multi-rel | multi-rel-lookup | 29.44 | 33.72 | -4.28 (-12.68%) |
| multi-rel | multi-rel-small-scan | 76.22 | 91.50 | -15.28 (-16.70%) |
| order_by | q25 | 147.92 | 139.87 | 8.06 (5.76%) |
| order_by | q26 | 468.19 | 484.67 | -16.48 (-3.40%) |
| order_by | q27 | 1434.78 | 1512.73 | -77.95 (-5.15%) |
| recursive_join | recursive-join-bidirection | 330.23 | 304.33 | 25.90 (8.51%) |
| recursive_join | recursive-join-dense | 6046.04 | 7066.47 | -1020.43 (-14.44%) |
| recursive_join | recursive-join-path | 23808.79 | 23609.03 | 199.75 (0.85%) |
| recursive_join | recursive-join-sparse | 1053.28 | 1055.72 | -2.44 (-0.23%) |
| recursive_join | recursive-join-trail | 6320.70 | 7035.37 | -714.67 (-10.16%) |
| scan_after_filter | q01 | 190.35 | 174.30 | 16.05 (9.21%) |
| scan_after_filter | q02 | 172.79 | 159.07 | 13.72 (8.62%) |
| shortest_path_ldbc100 | q37 | 99.43 | 89.95 | 9.47 (10.53%) |
| shortest_path_ldbc100 | q38 | 388.52 | 416.51 | -27.99 (-6.72%) |
| shortest_path_ldbc100 | q39 | 63.50 | 68.27 | -4.77 (-6.99%) |
| shortest_path_ldbc100 | q40 | 423.58 | 449.78 | -26.20 (-5.83%) |
| var_size_expr_evaluator | q03 | 2103.50 | 2118.48 | -14.98 (-0.71%) |
| var_size_expr_evaluator | q04 | 2274.58 | 2221.02 | 53.56 (2.41%) |
| var_size_expr_evaluator | q05 | 2639.15 | 2672.80 | -33.64 (-1.26%) |
| var_size_expr_evaluator | q06 | 1345.36 | 1347.49 | -2.12 (-0.16%) |
| var_size_seq_scan | q19 | 1493.24 | 1490.66 | 2.58 (0.17%) |
| var_size_seq_scan | q20 | 2506.95 | 2416.97 | 89.99 (3.72%) |
| var_size_seq_scan | q21 | 2291.15 | 2311.01 | -19.86 (-0.86%) |
| var_size_seq_scan | q22 | 130.74 | 130.24 | 0.50 (0.38%) |

benjaminwinger force-pushed the parallel-simple-distinct branch 2 times, most recently from 83356c2 to b1e7640 (February 20, 2025 18:30)
benjaminwinger marked this pull request as ready for review (February 20, 2025 18:32)
benjaminwinger force-pushed the parallel-simple-distinct branch from b1e7640 to 8503584 (February 20, 2025 18:40)

Benchmark Result

Master commit hash: 0ddf62817943781a352edf04f572c38e05e90e23
Branch commit hash: 72895189f1133cff56772d01c53af1c92f46cc6c

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 739.02 | 717.48 | 21.54 (3.00%) |
| aggregation | q28 | 6412.13 | 6497.16 | -85.04 (-1.31%) |
| filter | q14 | 143.54 | 118.88 | 24.66 (20.75%) |
| filter | q15 | 142.96 | 115.16 | 27.79 (24.13%) |
| filter | q16 | 317.09 | 322.30 | -5.21 (-1.62%) |
| filter | q17 | 461.21 | 438.46 | 22.75 (5.19%) |
| filter | q18 | 1975.43 | 2234.03 | -258.60 (-11.58%) |
| filter | zonemap-node | 105.39 | 81.73 | 23.65 (28.94%) |
| filter | zonemap-node-lhs-cast | 105.33 | 82.37 | 22.97 (27.88%) |
| filter | zonemap-node-null | 105.00 | 82.45 | 22.54 (27.34%) |
| filter | zonemap-rel | 6728.21 | 6255.63 | 472.58 (7.55%) |
| fixed_size_expr_evaluator | q07 | 595.87 | 565.96 | 29.92 (5.29%) |
| fixed_size_expr_evaluator | q08 | 827.37 | 793.14 | 34.23 (4.32%) |
| fixed_size_expr_evaluator | q09 | 826.84 | 794.01 | 32.83 (4.14%) |
| fixed_size_expr_evaluator | q10 | 260.20 | 230.56 | 29.64 (12.85%) |
| fixed_size_expr_evaluator | q11 | 253.83 | 221.78 | 32.05 (14.45%) |
| fixed_size_expr_evaluator | q12 | 250.31 | 218.87 | 31.44 (14.37%) |
| fixed_size_expr_evaluator | q13 | 1467.08 | 1454.85 | 12.23 (0.84%) |
| fixed_size_seq_scan | q23 | 137.35 | 105.29 | 32.07 (30.46%) |
| join | q29 | 699.77 | 721.08 | -21.31 (-2.96%) |
| join | q30 | 10322.08 | 10453.73 | -131.66 (-1.26%) |
| join | q31 | 6.51 | 7.14 | -0.64 (-8.92%) |
| join | SelectiveTwoHopJoin | 54.77 | 55.51 | -0.74 (-1.33%) |
| ldbc_snb_ic | q35 | 2695.58 | 2663.83 | 31.75 (1.19%) |
| ldbc_snb_ic | q36 | 454.29 | 445.79 | 8.51 (1.91%) |
| ldbc_snb_is | q32 | 6.52 | 5.85 | 0.67 (11.50%) |
| ldbc_snb_is | q33 | 14.18 | 14.36 | -0.17 (-1.22%) |
| ldbc_snb_is | q34 | 1.32 | 1.25 | 0.07 (5.42%) |
| multi-rel | multi-rel-large-scan | 1744.17 | 1373.45 | 370.72 (26.99%) |
| multi-rel | multi-rel-lookup | 21.62 | 31.97 | -10.35 (-32.37%) |
| multi-rel | multi-rel-small-scan | 93.67 | 82.97 | 10.70 (12.89%) |
| order_by | q25 | 147.83 | 123.00 | 24.83 (20.19%) |
| order_by | q26 | 469.45 | 447.04 | 22.41 (5.01%) |
| order_by | q27 | 1464.08 | 1391.63 | 72.45 (5.21%) |
| recursive_join | recursive-join-bidirection | 302.77 | 317.45 | -14.68 (-4.63%) |
| recursive_join | recursive-join-dense | 7121.41 | 7069.19 | 52.23 (0.74%) |
| recursive_join | recursive-join-path | 23651.99 | 23741.93 | -89.95 (-0.38%) |
| recursive_join | recursive-join-sparse | 1058.92 | 1061.26 | -2.34 (-0.22%) |
| recursive_join | recursive-join-trail | 7049.82 | 7030.64 | 19.18 (0.27%) |
| scan_after_filter | q01 | 188.30 | 167.72 | 20.58 (12.27%) |
| scan_after_filter | q02 | 173.27 | 151.79 | 21.48 (14.15%) |
| shortest_path_ldbc100 | q37 | 90.29 | 95.46 | -5.17 (-5.42%) |
| shortest_path_ldbc100 | q38 | 387.89 | 458.18 | -70.28 (-15.34%) |
| shortest_path_ldbc100 | q39 | 70.75 | 63.63 | 7.12 (11.19%) |
| shortest_path_ldbc100 | q40 | 458.52 | 434.12 | 24.41 (5.62%) |
| var_size_expr_evaluator | q03 | 2096.21 | 2080.40 | 15.82 (0.76%) |
| var_size_expr_evaluator | q04 | 2265.96 | 2241.81 | 24.15 (1.08%) |
| var_size_expr_evaluator | q05 | 2656.44 | 2635.92 | 20.52 (0.78%) |
| var_size_expr_evaluator | q06 | 1361.48 | 1322.69 | 38.79 (2.93%) |
| var_size_seq_scan | q19 | 1483.68 | 1460.97 | 22.71 (1.55%) |
| var_size_seq_scan | q20 | 2489.07 | 3028.56 | -539.49 (-17.81%) |
| var_size_seq_scan | q21 | 2303.94 | 2428.87 | -124.92 (-5.14%) |
| var_size_seq_scan | q22 | 134.33 | 125.92 | 8.41 (6.68%) |
