Parallel Distinct SimpleAggregate #4934

benjaminwinger · 2025-02-19T22:09:34Z

Extends the method used to parallelize the hash aggregate and the distinct hash aggregate to simple (non-grouped) aggregates.

A fair amount of work is still single-threaded. After the parallel build and finalize of the distinct hash table, we still have to scan the table to compute the function result. That scan currently runs in the single-threaded finalize, since it can't start until the parallel finalization is complete.
It would be possible to add another operator that does the scan in parallel, which I think could yield up to another 2x improvement in runtime (skipping this step dropped both of the benchmarks below to ~0.9s, so the single-threaded part is clearly a significant share of the total). But I think that's best left for a different PR.
(Edit: this can actually be done fairly easily in the current operator; we just need to store a state for each partition and combine them in finalize instead.)
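
To illustrate the per-partition idea from the edit above, here is a minimal standalone sketch. It is not the code in this PR and uses none of the engine's classes: `parallelCountDistinct` is a hypothetical helper, `std::unordered_set` stands in for the distinct hash table, and the partitioning step is single-threaded for brevity. Each partition computes its own partial aggregate state in parallel, and the final step only combines those states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <thread>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not the engine's API): COUNT(DISTINCT x) computed with
// one partial state per hash partition, combined in a cheap final step.
inline uint64_t parallelCountDistinct(const std::vector<int64_t>& values,
    size_t numPartitions) {
    assert(numPartitions > 0);

    // Phase 1: route each value to a partition chosen by its hash, so equal
    // values always land in the same partition. Done on one thread here to
    // keep the sketch short.
    std::vector<std::vector<int64_t>> partitions(numPartitions);
    std::hash<int64_t> hasher;
    for (int64_t v : values) {
        partitions[hasher(v) % numPartitions].push_back(v);
    }

    // Phase 2: one worker per partition deduplicates its values and stores a
    // partial aggregate state (here, a count). Workers write to distinct
    // vector elements, so no locking is needed.
    std::vector<uint64_t> partialCounts(numPartitions, 0);
    std::vector<std::thread> workers;
    workers.reserve(numPartitions);
    for (size_t p = 0; p < numPartitions; ++p) {
        workers.emplace_back([&partitions, &partialCounts, p] {
            std::unordered_set<int64_t> distinct(partitions[p].begin(),
                partitions[p].end());
            partialCounts[p] = distinct.size();
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }

    // Phase 3 (single-threaded finalize): partitions are disjoint, so the
    // partial states can simply be summed.
    return std::accumulate(partialCounts.begin(), partialCounts.end(),
        uint64_t{0});
}
```

Because no value can appear in two partitions, the finalize step does work proportional to the number of partitions rather than the number of distinct values. For aggregates other than COUNT, the combine step would need to be whatever merge operation the aggregate function defines.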

| Query | master | 1 thread | 128 threads |
| --- | --- | --- | --- |
| `MATCH (h:hits) RETURN COUNT(DISTINCT h.UserID);` | 20.5s | 18.6s | 0.77s |
| `MATCH (h:hits) RETURN COUNT(DISTINCT h.SearchPhrase);` | 16.6s | 15.9s | 0.85s |

~~Probably needs a bit more cleanup so I'm opening as a draft for now.~~ (Should be good now. I updated the benchmarks twice: once for the change mentioned in the edit above, and once for a small optimization that uses fixed-size hash tables prior to partitioning, as in the hash aggregate.)
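
The PR text does not show how the fixed-size tables work, so the following is only a sketch of the general technique under stated assumptions: a small thread-local table with linear probing removes duplicates cheaply, and when it fills up its contents are flushed into hash partitions. The class name `FixedSizeDistinctTable`, the capacity, the half-full flush threshold, and the use of `std::hash` are all hypothetical choices for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical thread-local pre-aggregation table (illustrative only).
template<size_t Capacity = 1024>
class FixedSizeDistinctTable {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
        "Capacity must be a power of two so slots can be found by masking");

public:
    explicit FixedSizeDistinctTable(size_t numPartitions)
        : partitions(numPartitions) {
        assert(numPartitions > 0);
    }

    // Inserts a value, dropping it if it is already in the table.
    void insert(int64_t value) {
        // Flush before the table gets too full; this also guarantees the
        // probe loop below always finds an empty slot.
        if (numOccupied == kMaxOccupied) {
            flush();
        }
        size_t slot = std::hash<int64_t>{}(value) & (Capacity - 1);
        while (slots[slot].has_value()) {
            if (*slots[slot] == value) {
                return; // duplicate seen since the last flush
            }
            slot = (slot + 1) & (Capacity - 1); // linear probing
        }
        slots[slot] = value;
        ++numOccupied;
    }

    // Moves every stored value into its hash partition and empties the table.
    // A value may reach a partition more than once across flushes, so the
    // partitions still need a final deduplication pass.
    void flush() {
        for (auto& slot : slots) {
            if (slot.has_value()) {
                size_t p = std::hash<int64_t>{}(*slot) % partitions.size();
                partitions[p].push_back(*slot);
                slot.reset();
            }
        }
        numOccupied = 0;
    }

    const std::vector<std::vector<int64_t>>& getPartitions() const {
        return partitions;
    }

private:
    static constexpr size_t kMaxOccupied = Capacity / 2;
    std::array<std::optional<int64_t>, Capacity> slots{};
    size_t numOccupied = 0;
    std::vector<std::vector<int64_t>> partitions;
};
```

A caller would insert every input value, call `flush()` once at the end, and then hand the partitions to the parallel build. The benefit of this kind of table is that runs of repeated values are collapsed before they are ever copied into a partition.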

codecov · 2025-02-19T22:48:43Z

Codecov Report

Attention: Patch coverage is 95.06173% with 8 lines in your changes missing coverage. Please review.

Project coverage is 86.58%. Comparing base (7d2e7db) to head (8503584).
Report is 6 commits behind head on master.

Files with missing lines	Patch %	Lines
.../processor/operator/aggregate/simple_aggregate.cpp	95.00%	2 Missing and 2 partials ⚠️
...cessor/operator/aggregate/aggregate_hash_table.cpp	92.85%	2 Missing ⚠️
...rocessor/operator/aggregate/aggregate_hash_table.h	75.00%	1 Missing ⚠️
...de/processor/operator/aggregate/simple_aggregate.h	92.85%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4934      +/-   ##
==========================================
+ Coverage   86.55%   86.58%   +0.02%     
==========================================
  Files        1409     1413       +4     
  Lines       60915    61192     +277     
  Branches     7492     7523      +31     
==========================================
+ Hits        52727    52984     +257     
- Misses       8019     8036      +17     
- Partials      169      172       +3


github-actions · 2025-02-19T22:55:14Z

Benchmark Result

Master commit hash: eb9df669212996b903521ea2fd0f96a74ff86274
Branch commit hash: 583a7356aac8d2e1d7bbc461efe708ba2117bea8

Query Group	Query Name	Mean Time - Commit (ms)	Mean Time - Master (ms)	Diff
aggregation	q24	725.63	724.27	1.37 (0.19%)
aggregation	q28	6444.16	6380.35	63.81 (1.00%)
filter	q14	126.19	126.35	-0.16 (-0.12%)
filter	q15	131.53	128.65	2.88 (2.24%)
filter	q16	299.88	302.97	-3.09 (-1.02%)
filter	q17	448.26	445.48	2.78 (0.62%)
filter	q18	1909.07	1915.15	-6.08 (-0.32%)
filter	zonemap-node	89.65	88.85	0.80 (0.90%)
filter	zonemap-node-lhs-cast	89.15	88.97	0.18 (0.20%)
filter	zonemap-node-null	89.18	88.83	0.35 (0.40%)
filter	zonemap-rel	5635.47	5444.80	190.68 (3.50%)
fixed_size_expr_evaluator	q07	572.12	572.70	-0.58 (-0.10%)
fixed_size_expr_evaluator	q08	800.91	801.78	-0.87 (-0.11%)
fixed_size_expr_evaluator	q09	802.76	801.17	1.58 (0.20%)
fixed_size_expr_evaluator	q10	238.02	236.47	1.55 (0.66%)
fixed_size_expr_evaluator	q11	229.92	229.38	0.54 (0.24%)
fixed_size_expr_evaluator	q12	227.78	227.03	0.76 (0.33%)
fixed_size_expr_evaluator	q13	1458.79	1458.71	0.08 (0.01%)
fixed_size_seq_scan	q23	112.06	109.07	2.99 (2.74%)
join	q29	727.52	712.91	14.62 (2.05%)
join	q30	9674.12	10983.88	-1309.76 (-11.92%)
join	q31	6.76	5.64	1.12 (19.87%)
join	SelectiveTwoHopJoin	53.52	54.98	-1.46 (-2.65%)
ldbc_snb_ic	q35	2752.06	2574.37	177.69 (6.90%)
ldbc_snb_ic	q36	462.85	458.32	4.53 (0.99%)
ldbc_snb_is	q32	6.89	6.51	0.38 (5.83%)
ldbc_snb_is	q33	12.20	16.82	-4.61 (-27.43%)
ldbc_snb_is	q34	1.22	1.34	-0.11 (-8.54%)
multi-rel	multi-rel-large-scan	1368.79	1335.25	33.53 (2.51%)
multi-rel	multi-rel-lookup	21.33	33.21	-11.88 (-35.78%)
multi-rel	multi-rel-small-scan	76.54	87.45	-10.91 (-12.47%)
order_by	q25	136.67	128.55	8.12 (6.31%)
order_by	q26	456.39	448.15	8.24 (1.84%)
order_by	q27	1407.21	1406.01	1.21 (0.09%)
recursive_join	recursive-join-bidirection	307.39	276.37	31.02 (11.22%)
recursive_join	recursive-join-dense	7012.21	7341.78	-329.57 (-4.49%)
recursive_join	recursive-join-path	23341.17	24172.86	-831.69 (-3.44%)
recursive_join	recursive-join-sparse	1069.89	1049.70	20.19 (1.92%)
recursive_join	recursive-join-trail	7040.90	7355.05	-314.15 (-4.27%)
scan_after_filter	q01	171.34	171.95	-0.60 (-0.35%)
scan_after_filter	q02	157.15	156.81	0.35 (0.22%)
shortest_path_ldbc100	q37	89.87	90.76	-0.89 (-0.98%)
shortest_path_ldbc100	q38	381.42	385.31	-3.88 (-1.01%)
shortest_path_ldbc100	q39	61.51	65.23	-3.72 (-5.71%)
shortest_path_ldbc100	q40	462.58	441.95	20.63 (4.67%)
var_size_expr_evaluator	q03	2078.67	2113.71	-35.04 (-1.66%)
var_size_expr_evaluator	q04	2209.81	2287.17	-77.36 (-3.38%)
var_size_expr_evaluator	q05	2626.55	2645.55	-19.00 (-0.72%)
var_size_expr_evaluator	q06	1321.73	1319.93	1.80 (0.14%)
var_size_seq_scan	q19	1467.43	1480.49	-13.05 (-0.88%)
var_size_seq_scan	q20	2537.31	2367.80	169.50 (7.16%)
var_size_seq_scan	q21	2366.08	2325.37	40.71 (1.75%)
var_size_seq_scan	q22	127.06	125.02	2.05 (1.64%)

github-actions · 2025-02-20T16:05:50Z

Benchmark Result

Master commit hash: c031db918dd8170fd8988b128ac43f11a2a12210
Branch commit hash: 890ef111a68d0fafb1e69c32dc9cce467fff5b16

Query Group	Query Name	Mean Time - Commit (ms)	Mean Time - Master (ms)	Diff
aggregation	q24	738.30	724.23	14.07 (1.94%)
aggregation	q28	6442.41	6374.83	67.58 (1.06%)
filter	q14	141.99	128.41	13.58 (10.58%)
filter	q15	143.73	124.47	19.26 (15.47%)
filter	q16	320.87	306.31	14.56 (4.75%)
filter	q17	461.68	446.91	14.77 (3.31%)
filter	q18	1959.57	1888.35	71.21 (3.77%)
filter	zonemap-node	105.35	91.94	13.41 (14.58%)
filter	zonemap-node-lhs-cast	105.19	88.93	16.26 (18.29%)
filter	zonemap-node-null	104.96	88.75	16.20 (18.26%)
filter	zonemap-rel	5596.29	5561.73	34.56 (0.62%)
fixed_size_expr_evaluator	q07	587.11	578.45	8.66 (1.50%)
fixed_size_expr_evaluator	q08	816.46	808.77	7.69 (0.95%)
fixed_size_expr_evaluator	q09	820.71	812.37	8.34 (1.03%)
fixed_size_expr_evaluator	q10	252.86	244.38	8.48 (3.47%)
fixed_size_expr_evaluator	q11	246.23	236.56	9.67 (4.09%)
fixed_size_expr_evaluator	q12	243.94	233.78	10.16 (4.35%)
fixed_size_expr_evaluator	q13	1480.12	1462.56	17.56 (1.20%)
fixed_size_seq_scan	q23	126.58	117.00	9.58 (8.18%)
join	q29	756.42	728.12	28.30 (3.89%)
join	q30	10487.79	10483.93	3.86 (0.04%)
join	q31	6.66	7.67	-1.01 (-13.15%)
join	SelectiveTwoHopJoin	53.62	53.45	0.17 (0.32%)
ldbc_snb_ic	q35	2668.76	2667.03	1.73 (0.06%)
ldbc_snb_ic	q36	515.58	481.77	33.81 (7.02%)
ldbc_snb_is	q32	5.77	4.09	1.68 (40.99%)
ldbc_snb_is	q33	14.40	15.59	-1.19 (-7.66%)
ldbc_snb_is	q34	1.30	1.22	0.08 (6.82%)
multi-rel	multi-rel-large-scan	1320.95	1362.00	-41.05 (-3.01%)
multi-rel	multi-rel-lookup	29.44	33.72	-4.28 (-12.68%)
multi-rel	multi-rel-small-scan	76.22	91.50	-15.28 (-16.70%)
order_by	q25	147.92	139.87	8.06 (5.76%)
order_by	q26	468.19	484.67	-16.48 (-3.40%)
order_by	q27	1434.78	1512.73	-77.95 (-5.15%)
recursive_join	recursive-join-bidirection	330.23	304.33	25.90 (8.51%)
recursive_join	recursive-join-dense	6046.04	7066.47	-1020.43 (-14.44%)
recursive_join	recursive-join-path	23808.79	23609.03	199.75 (0.85%)
recursive_join	recursive-join-sparse	1053.28	1055.72	-2.44 (-0.23%)
recursive_join	recursive-join-trail	6320.70	7035.37	-714.67 (-10.16%)
scan_after_filter	q01	190.35	174.30	16.05 (9.21%)
scan_after_filter	q02	172.79	159.07	13.72 (8.62%)
shortest_path_ldbc100	q37	99.43	89.95	9.47 (10.53%)
shortest_path_ldbc100	q38	388.52	416.51	-27.99 (-6.72%)
shortest_path_ldbc100	q39	63.50	68.27	-4.77 (-6.99%)
shortest_path_ldbc100	q40	423.58	449.78	-26.20 (-5.83%)
var_size_expr_evaluator	q03	2103.50	2118.48	-14.98 (-0.71%)
var_size_expr_evaluator	q04	2274.58	2221.02	53.56 (2.41%)
var_size_expr_evaluator	q05	2639.15	2672.80	-33.64 (-1.26%)
var_size_expr_evaluator	q06	1345.36	1347.49	-2.12 (-0.16%)
var_size_seq_scan	q19	1493.24	1490.66	2.58 (0.17%)
var_size_seq_scan	q20	2506.95	2416.97	89.99 (3.72%)
var_size_seq_scan	q21	2291.15	2311.01	-19.86 (-0.86%)
var_size_seq_scan	q22	130.74	130.24	0.50 (0.38%)

github-actions · 2025-02-20T19:16:54Z

Benchmark Result

Master commit hash: 0ddf62817943781a352edf04f572c38e05e90e23
Branch commit hash: 72895189f1133cff56772d01c53af1c92f46cc6c

Query Group	Query Name	Mean Time - Commit (ms)	Mean Time - Master (ms)	Diff
aggregation	q24	739.02	717.48	21.54 (3.00%)
aggregation	q28	6412.13	6497.16	-85.04 (-1.31%)
filter	q14	143.54	118.88	24.66 (20.75%)
filter	q15	142.96	115.16	27.79 (24.13%)
filter	q16	317.09	322.30	-5.21 (-1.62%)
filter	q17	461.21	438.46	22.75 (5.19%)
filter	q18	1975.43	2234.03	-258.60 (-11.58%)
filter	zonemap-node	105.39	81.73	23.65 (28.94%)
filter	zonemap-node-lhs-cast	105.33	82.37	22.97 (27.88%)
filter	zonemap-node-null	105.00	82.45	22.54 (27.34%)
filter	zonemap-rel	6728.21	6255.63	472.58 (7.55%)
fixed_size_expr_evaluator	q07	595.87	565.96	29.92 (5.29%)
fixed_size_expr_evaluator	q08	827.37	793.14	34.23 (4.32%)
fixed_size_expr_evaluator	q09	826.84	794.01	32.83 (4.14%)
fixed_size_expr_evaluator	q10	260.20	230.56	29.64 (12.85%)
fixed_size_expr_evaluator	q11	253.83	221.78	32.05 (14.45%)
fixed_size_expr_evaluator	q12	250.31	218.87	31.44 (14.37%)
fixed_size_expr_evaluator	q13	1467.08	1454.85	12.23 (0.84%)
fixed_size_seq_scan	q23	137.35	105.29	32.07 (30.46%)
join	q29	699.77	721.08	-21.31 (-2.96%)
join	q30	10322.08	10453.73	-131.66 (-1.26%)
join	q31	6.51	7.14	-0.64 (-8.92%)
join	SelectiveTwoHopJoin	54.77	55.51	-0.74 (-1.33%)
ldbc_snb_ic	q35	2695.58	2663.83	31.75 (1.19%)
ldbc_snb_ic	q36	454.29	445.79	8.51 (1.91%)
ldbc_snb_is	q32	6.52	5.85	0.67 (11.50%)
ldbc_snb_is	q33	14.18	14.36	-0.17 (-1.22%)
ldbc_snb_is	q34	1.32	1.25	0.07 (5.42%)
multi-rel	multi-rel-large-scan	1744.17	1373.45	370.72 (26.99%)
multi-rel	multi-rel-lookup	21.62	31.97	-10.35 (-32.37%)
multi-rel	multi-rel-small-scan	93.67	82.97	10.70 (12.89%)
order_by	q25	147.83	123.00	24.83 (20.19%)
order_by	q26	469.45	447.04	22.41 (5.01%)
order_by	q27	1464.08	1391.63	72.45 (5.21%)
recursive_join	recursive-join-bidirection	302.77	317.45	-14.68 (-4.63%)
recursive_join	recursive-join-dense	7121.41	7069.19	52.23 (0.74%)
recursive_join	recursive-join-path	23651.99	23741.93	-89.95 (-0.38%)
recursive_join	recursive-join-sparse	1058.92	1061.26	-2.34 (-0.22%)
recursive_join	recursive-join-trail	7049.82	7030.64	19.18 (0.27%)
scan_after_filter	q01	188.30	167.72	20.58 (12.27%)
scan_after_filter	q02	173.27	151.79	21.48 (14.15%)
shortest_path_ldbc100	q37	90.29	95.46	-5.17 (-5.42%)
shortest_path_ldbc100	q38	387.89	458.18	-70.28 (-15.34%)
shortest_path_ldbc100	q39	70.75	63.63	7.12 (11.19%)
shortest_path_ldbc100	q40	458.52	434.12	24.41 (5.62%)
var_size_expr_evaluator	q03	2096.21	2080.40	15.82 (0.76%)
var_size_expr_evaluator	q04	2265.96	2241.81	24.15 (1.08%)
var_size_expr_evaluator	q05	2656.44	2635.92	20.52 (0.78%)
var_size_expr_evaluator	q06	1361.48	1322.69	38.79 (2.93%)
var_size_seq_scan	q19	1483.68	1460.97	22.71 (1.55%)
var_size_seq_scan	q20	2489.07	3028.56	-539.49 (-17.81%)
var_size_seq_scan	q21	2303.94	2428.87	-124.92 (-5.14%)
var_size_seq_scan	q22	134.33	125.92	8.41 (6.68%)

benjaminwinger force-pushed the parallel-simple-distinct branch 2 times, most recently from 83356c2 to b1e7640 Compare February 20, 2025 18:30

benjaminwinger marked this pull request as ready for review February 20, 2025 18:32

Parallel DISTINCT for SimpleAggregate

8503584

benjaminwinger force-pushed the parallel-simple-distinct branch from b1e7640 to 8503584 Compare February 20, 2025 18:40
