Add GCS support #4892

Merged
merged 4 commits into from
Feb 18, 2025

royi-luo
Collaborator

@royi-luo royi-luo commented Feb 12, 2025

Description

GCS's interoperability mode accepts S3-style requests, so we reuse the S3 filesystem with a few changes to support scans (COPY/LOAD FROM, attaching a read-only DB) against GCS. Specifically, we create a separate S3 filesystem instance for GCS, initialized with a different configuration that marks it as communicating with GCS.

Any URLs with prefixes gs:// or gcs:// will have their requests directed to GCS. The path formats will be the same as S3: ${PREFIX}/${BUCKET_NAME}/${PATH_TO_FILE_IN_BUCKET}.
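
For illustration only, here is a minimal sketch of how a URL in this format could be split into its bucket and object path. It is not the code from this PR: `RemotePath`, `GCS_PREFIXES`, and `parseGcsUrl` are hypothetical names, and the sketch assumes the prefix includes the `//`, so a full URL looks like `gs://bucket/path/to/file`.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type for this sketch; not taken from the PR.
struct RemotePath {
    std::string bucket; // ${BUCKET_NAME}
    std::string key;    // ${PATH_TO_FILE_IN_BUCKET}
};

// The two prefixes that the description says are routed to GCS.
constexpr std::array<std::string_view, 2> GCS_PREFIXES = {"gs://", "gcs://"};

// Returns the bucket and object key if `url` starts with a GCS prefix and
// contains a bucket followed by '/', or std::nullopt otherwise.
inline std::optional<RemotePath> parseGcsUrl(std::string_view url) {
    for (std::string_view prefix : GCS_PREFIXES) {
        if (url.substr(0, prefix.size()) != prefix) {
            continue; // not this prefix, try the next one
        }
        std::string_view rest = url.substr(prefix.size());
        const auto slash = rest.find('/');
        if (slash == std::string_view::npos || slash == 0) {
            return std::nullopt; // missing object path or empty bucket name
        }
        return RemotePath{std::string(rest.substr(0, slash)),
            std::string(rest.substr(slash + 1))};
    }
    return std::nullopt; // not a GCS URL
}

// Small usage example for the helper above.
inline void gcsUrlExample() {
    const auto parsed = parseGcsUrl("gs://my-bucket/data/person.csv");
    assert(parsed.has_value());
    assert(parsed->bucket == "my-bucket");
    assert(parsed->key == "data/person.csv");
}
```

Returning `std::optional` keeps the "not a GCS URL" case explicit, so a caller can fall through to another filesystem rather than handle an exception.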

Additionally, I added separate settings for configuring authentication parameters for GCS (they refer to the same parameters as S3):

call gcs_access_key_id=...
call gcs_secret_access_key=...

Contributor agreement

Benchmark Result

Master commit hash: dfabf90eab17ec0dc0f87d18464152412e1fd8ee
Branch commit hash: 9b808239f13b6bdee723d303286671d6d8f32a2f

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 727.57 736.85 -9.28 (-1.26%)
aggregation q28 6334.96 6358.04 -23.08 (-0.36%)
filter q14 126.19 128.29 -2.10 (-1.64%)
filter q15 124.82 126.50 -1.69 (-1.33%)
filter q16 303.08 306.18 -3.11 (-1.01%)
filter q17 446.54 446.79 -0.25 (-0.06%)
filter q18 1939.25 1922.93 16.32 (0.85%)
filter zonemap-node 89.23 88.87 0.36 (0.41%)
filter zonemap-node-lhs-cast 88.78 90.75 -1.96 (-2.17%)
filter zonemap-node-null 88.91 90.66 -1.75 (-1.93%)
filter zonemap-rel 5386.84 5394.15 -7.31 (-0.14%)
fixed_size_expr_evaluator q07 578.91 581.95 -3.04 (-0.52%)
fixed_size_expr_evaluator q08 809.29 801.57 7.72 (0.96%)
fixed_size_expr_evaluator q09 810.53 803.44 7.09 (0.88%)
fixed_size_expr_evaluator q10 249.41 236.67 12.74 (5.38%)
fixed_size_expr_evaluator q11 236.94 229.64 7.30 (3.18%)
fixed_size_expr_evaluator q12 234.30 231.70 2.60 (1.12%)
fixed_size_expr_evaluator q13 1456.58 1465.25 -8.66 (-0.59%)
fixed_size_seq_scan q23 117.44 111.76 5.68 (5.08%)
join q29 717.77 703.37 14.40 (2.05%)
join q30 10295.69 11083.57 -787.89 (-7.11%)
join q31 7.84 9.98 -2.13 (-21.39%)
join SelectiveTwoHopJoin 54.23 59.99 -5.76 (-9.61%)
ldbc_snb_ic q35 2569.06 2607.02 -37.95 (-1.46%)
ldbc_snb_ic q36 464.04 485.56 -21.52 (-4.43%)
ldbc_snb_is q32 5.50 4.47 1.03 (22.99%)
ldbc_snb_is q33 15.56 14.83 0.72 (4.89%)
ldbc_snb_is q34 1.14 1.25 -0.10 (-8.36%)
multi-rel multi-rel-large-scan 1318.24 1392.59 -74.35 (-5.34%)
multi-rel multi-rel-lookup 20.11 32.54 -12.43 (-38.21%)
multi-rel multi-rel-small-scan 72.79 102.16 -29.37 (-28.75%)
order_by q25 129.00 131.92 -2.92 (-2.22%)
order_by q26 452.84 452.45 0.39 (0.09%)
order_by q27 1435.12 1420.37 14.75 (1.04%)
recursive_join recursive-join-bidirection 263.51 296.22 -32.71 (-11.04%)
recursive_join recursive-join-dense 7396.99 7444.01 -47.02 (-0.63%)
recursive_join recursive-join-path 24065.02 24117.33 -52.31 (-0.22%)
recursive_join recursive-join-sparse 1051.70 1057.45 -5.74 (-0.54%)
recursive_join recursive-join-trail 7378.84 7418.08 -39.23 (-0.53%)
scan_after_filter q01 171.71 175.01 -3.30 (-1.89%)
scan_after_filter q02 159.21 159.85 -0.64 (-0.40%)
shortest_path_ldbc100 q37 78.14 97.65 -19.51 (-19.97%)
shortest_path_ldbc100 q38 380.83 377.28 3.56 (0.94%)
shortest_path_ldbc100 q39 60.60 64.85 -4.25 (-6.55%)
shortest_path_ldbc100 q40 402.34 464.15 -61.81 (-13.32%)
var_size_expr_evaluator q03 2113.18 2149.45 -36.27 (-1.69%)
var_size_expr_evaluator q04 2250.11 2203.44 46.66 (2.12%)
var_size_expr_evaluator q05 2646.86 2620.11 26.75 (1.02%)
var_size_expr_evaluator q06 1339.69 1345.39 -5.70 (-0.42%)
var_size_seq_scan q19 1468.88 1459.82 9.06 (0.62%)
var_size_seq_scan q20 2348.73 2352.12 -3.39 (-0.14%)
var_size_seq_scan q21 2272.33 2311.06 -38.73 (-1.68%)
var_size_seq_scan q22 125.01 128.13 -3.12 (-2.43%)

@royi-luo royi-luo marked this pull request as ready for review February 12, 2025 19:44
codecov bot commented Feb 12, 2025

Codecov Report

Attention: Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.

Project coverage is 86.57%. Comparing base (ab5e13a) to head (83f733a).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
src/common/string_utils.cpp 0.00% 2 Missing ⚠️
src/include/common/file_system/file_system.h 0.00% 1 Missing ⚠️

❌ Your patch status has failed because the patch coverage (57.14%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4892      +/-   ##
==========================================
- Coverage   86.58%   86.57%   -0.01%     
==========================================
  Files        1409     1409              
  Lines       60887    60892       +5     
  Branches     7489     7489              
==========================================
+ Hits        52716    52719       +3     
- Misses       8001     8004       +3     
+ Partials      170      169       -1     


@royi-luo royi-luo requested a review from acquamarin February 12, 2025 19:56
Benchmark Result

Master commit hash: dfabf90eab17ec0dc0f87d18464152412e1fd8ee
Branch commit hash: fd3eff4f17efbd588f80391e2007c4482951aa29

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 728.47 736.85 -8.38 (-1.14%)
aggregation q28 6358.02 6358.04 -0.02 (-0.00%)
filter q14 126.44 128.29 -1.85 (-1.44%)
filter q15 123.74 126.50 -2.76 (-2.18%)
filter q16 304.99 306.18 -1.19 (-0.39%)
filter q17 446.81 446.79 0.02 (0.00%)
filter q18 1911.26 1922.93 -11.67 (-0.61%)
filter zonemap-node 89.22 88.87 0.35 (0.39%)
filter zonemap-node-lhs-cast 90.54 90.75 -0.21 (-0.23%)
filter zonemap-node-null 90.35 90.66 -0.30 (-0.33%)
filter zonemap-rel 5388.81 5394.15 -5.35 (-0.10%)
fixed_size_expr_evaluator q07 590.42 581.95 8.47 (1.46%)
fixed_size_expr_evaluator q08 813.54 801.57 11.97 (1.49%)
fixed_size_expr_evaluator q09 808.66 803.44 5.21 (0.65%)
fixed_size_expr_evaluator q10 244.11 236.67 7.44 (3.14%)
fixed_size_expr_evaluator q11 236.99 229.64 7.34 (3.20%)
fixed_size_expr_evaluator q12 235.06 231.70 3.36 (1.45%)
fixed_size_expr_evaluator q13 1455.74 1465.25 -9.51 (-0.65%)
fixed_size_seq_scan q23 115.01 111.76 3.25 (2.90%)
join q29 711.76 703.37 8.39 (1.19%)
join q30 10734.12 11083.57 -349.45 (-3.15%)
join q31 3.75 9.98 -6.23 (-62.46%)
join SelectiveTwoHopJoin 57.69 59.99 -2.30 (-3.83%)
ldbc_snb_ic q35 2551.23 2607.02 -55.78 (-2.14%)
ldbc_snb_ic q36 449.67 485.56 -35.89 (-7.39%)
ldbc_snb_is q32 6.99 4.47 2.52 (56.41%)
ldbc_snb_is q33 13.14 14.83 -1.69 (-11.37%)
ldbc_snb_is q34 1.27 1.25 0.02 (1.47%)
multi-rel multi-rel-large-scan 1349.63 1392.59 -42.95 (-3.08%)
multi-rel multi-rel-lookup 20.73 32.54 -11.81 (-36.30%)
multi-rel multi-rel-small-scan 89.65 102.16 -12.51 (-12.25%)
order_by q25 135.78 131.92 3.86 (2.92%)
order_by q26 460.24 452.45 7.79 (1.72%)
order_by q27 1424.99 1420.37 4.62 (0.33%)
recursive_join recursive-join-bidirection 266.42 296.22 -29.80 (-10.06%)
recursive_join recursive-join-dense 7399.12 7444.01 -44.89 (-0.60%)
recursive_join recursive-join-path 24211.94 24117.33 94.61 (0.39%)
recursive_join recursive-join-sparse 1064.86 1057.45 7.42 (0.70%)
recursive_join recursive-join-trail 7361.91 7418.08 -56.17 (-0.76%)
scan_after_filter q01 169.47 175.01 -5.54 (-3.17%)
scan_after_filter q02 156.23 159.85 -3.62 (-2.27%)
shortest_path_ldbc100 q37 88.17 97.65 -9.48 (-9.71%)
shortest_path_ldbc100 q38 384.09 377.28 6.82 (1.81%)
shortest_path_ldbc100 q39 61.41 64.85 -3.44 (-5.30%)
shortest_path_ldbc100 q40 463.07 464.15 -1.08 (-0.23%)
var_size_expr_evaluator q03 2107.42 2149.45 -42.03 (-1.96%)
var_size_expr_evaluator q04 2235.94 2203.44 32.49 (1.47%)
var_size_expr_evaluator q05 2684.52 2620.11 64.41 (2.46%)
var_size_expr_evaluator q06 1330.65 1345.39 -14.74 (-1.10%)
var_size_seq_scan q19 1469.42 1459.82 9.60 (0.66%)
var_size_seq_scan q20 2321.83 2352.12 -30.28 (-1.29%)
var_size_seq_scan q21 2313.38 2311.06 2.32 (0.10%)
var_size_seq_scan q22 126.92 128.13 -1.22 (-0.95%)

Collaborator

@acquamarin acquamarin left a comment

LGTM. Can you add some tests? I want to make sure everything is covered.

extension/delta/src/connector/delta_connector.cpp (outdated, resolved)
extension/delta/src/main/delta_extension.cpp (outdated, resolved)
extension/iceberg/src/iceberg_extension.cpp (outdated, resolved)
extension/httpfs/src/include/s3fs.h (outdated, resolved)

namespace httpfs {

static constexpr std::array AUTH_OPTIONS = {"ACCESS_KEY_ID", "SECRET_ACCESS_KEY", "ENDPOINT",
Collaborator

Can you double-check whether GCS supports the endpoint, url_style, and region parameters?

Collaborator Author

Removed those options for GCS (although they still exist internally).

@@ -24,14 +26,18 @@ static void registerExtensionOptions(main::Database* db) {

static void registerFileSystem(main::Database* db) {
db->registerFileSystem(std::make_unique<HTTPFileSystem>());
db->registerFileSystem(std::make_unique<S3FileSystem>());
for (auto& fsConfig : httpfs::S3FileSystemConfig::getAvailableConfigs()) {
db->registerFileSystem(std::make_unique<S3FileSystem>(fsConfig));
Collaborator

Can we use only one fileSystem instead of two?
When we open a file, we generate an S3AuthParams object based on the URL.
The S3AuthParams is saved in the fileinfo class.
So we don't need to register two fileSystems.

Collaborator Author

I'd prefer to keep two separate filesystems if possible. I think it makes this more understandable and flexible, unless we're concerned about performance or memory footprint here.
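
To make the design discussed in this thread concrete, here is a rough, self-contained sketch of the "one filesystem instance per configuration" approach. Every name in it (`FsConfig`, `RemoteFileSystem`, `Registry`, `makeRegistry`, the `optionPrefix` field) is a hypothetical stand-in; the real `S3FileSystemConfig` and `S3FileSystem` in the httpfs extension will differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical per-service configuration; not the real S3FileSystemConfig.
struct FsConfig {
    std::vector<std::string> prefixes; // URL prefixes this instance handles
    std::string optionPrefix;          // assumed prefix of its setting names
};

// Hypothetical stand-in for a filesystem built from one configuration.
class RemoteFileSystem {
public:
    explicit RemoteFileSystem(FsConfig config) : config_(std::move(config)) {}

    // True if `path` starts with one of this instance's prefixes.
    bool canHandle(std::string_view path) const {
        for (const auto& prefix : config_.prefixes) {
            if (path.substr(0, prefix.size()) == std::string_view(prefix)) {
                return true;
            }
        }
        return false;
    }

    const FsConfig& getConfig() const { return config_; }

private:
    FsConfig config_;
};

// Hypothetical registry that picks the first filesystem claiming a path.
class Registry {
public:
    void registerFileSystem(std::unique_ptr<RemoteFileSystem> fs) {
        fileSystems_.push_back(std::move(fs));
    }

    // Returns the matching filesystem, or nullptr if none handles the path.
    const RemoteFileSystem* find(std::string_view path) const {
        for (const auto& fs : fileSystems_) {
            if (fs->canHandle(path)) {
                return fs.get();
            }
        }
        return nullptr;
    }

private:
    std::vector<std::unique_ptr<RemoteFileSystem>> fileSystems_;
};

// Registers one instance per configuration: one for S3 and one for GCS.
inline Registry makeRegistry() {
    Registry registry;
    registry.registerFileSystem(std::make_unique<RemoteFileSystem>(
        FsConfig{std::vector<std::string>{"s3://"}, "s3_"}));
    registry.registerFileSystem(std::make_unique<RemoteFileSystem>(
        FsConfig{std::vector<std::string>{"gs://", "gcs://"}, "gcs_"}));
    return registry;
}
```

In this sketch each instance carries its own configuration, so the lookup never has to derive service-specific parameters from the URL. The single-filesystem alternative suggested above would instead keep one instance and choose the parameters per opened file.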

Benchmark Result

Master commit hash: dfabf90eab17ec0dc0f87d18464152412e1fd8ee
Branch commit hash: 65239dad298b653d47aba940a16ef39f0b474da1

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 726.41 736.85 -10.44 (-1.42%)
aggregation q28 6375.90 6358.04 17.86 (0.28%)
filter q14 126.09 128.29 -2.19 (-1.71%)
filter q15 125.06 126.50 -1.45 (-1.14%)
filter q16 302.69 306.18 -3.50 (-1.14%)
filter q17 444.33 446.79 -2.46 (-0.55%)
filter q18 1931.63 1922.93 8.69 (0.45%)
filter zonemap-node 89.13 88.87 0.26 (0.29%)
filter zonemap-node-lhs-cast 89.30 90.75 -1.45 (-1.59%)
filter zonemap-node-null 89.04 90.66 -1.62 (-1.78%)
filter zonemap-rel 5391.72 5394.15 -2.43 (-0.04%)
fixed_size_expr_evaluator q07 580.80 581.95 -1.15 (-0.20%)
fixed_size_expr_evaluator q08 808.62 801.57 7.05 (0.88%)
fixed_size_expr_evaluator q09 809.38 803.44 5.94 (0.74%)
fixed_size_expr_evaluator q10 244.86 236.67 8.19 (3.46%)
fixed_size_expr_evaluator q11 238.40 229.64 8.76 (3.81%)
fixed_size_expr_evaluator q12 236.16 231.70 4.46 (1.93%)
fixed_size_expr_evaluator q13 1452.48 1465.25 -12.77 (-0.87%)
fixed_size_seq_scan q23 115.05 111.76 3.29 (2.94%)
join q29 732.18 703.37 28.81 (4.10%)
join q30 9880.62 11083.57 -1202.95 (-10.85%)
join q31 8.46 9.98 -1.52 (-15.25%)
join SelectiveTwoHopJoin 55.87 59.99 -4.12 (-6.87%)
ldbc_snb_ic q35 2517.04 2607.02 -89.98 (-3.45%)
ldbc_snb_ic q36 469.65 485.56 -15.90 (-3.27%)
ldbc_snb_is q32 4.66 4.47 0.19 (4.17%)
ldbc_snb_is q33 12.85 14.83 -1.98 (-13.34%)
ldbc_snb_is q34 1.17 1.25 -0.08 (-6.60%)
multi-rel multi-rel-large-scan 1331.66 1392.59 -60.92 (-4.37%)
multi-rel multi-rel-lookup 20.16 32.54 -12.37 (-38.03%)
multi-rel multi-rel-small-scan 77.92 102.16 -24.24 (-23.73%)
order_by q25 130.72 131.92 -1.20 (-0.91%)
order_by q26 465.34 452.45 12.89 (2.85%)
order_by q27 1426.80 1420.37 6.43 (0.45%)
recursive_join recursive-join-bidirection 310.72 296.22 14.49 (4.89%)
recursive_join recursive-join-dense 7383.45 7444.01 -60.56 (-0.81%)
recursive_join recursive-join-path 24390.57 24117.33 273.24 (1.13%)
recursive_join recursive-join-sparse 1065.75 1057.45 8.31 (0.79%)
recursive_join recursive-join-trail 7383.41 7418.08 -34.66 (-0.47%)
scan_after_filter q01 173.51 175.01 -1.50 (-0.85%)
scan_after_filter q02 159.05 159.85 -0.80 (-0.50%)
shortest_path_ldbc100 q37 100.08 97.65 2.43 (2.49%)
shortest_path_ldbc100 q38 368.81 377.28 -8.47 (-2.24%)
shortest_path_ldbc100 q39 64.88 64.85 0.03 (0.05%)
shortest_path_ldbc100 q40 397.24 464.15 -66.91 (-14.41%)
var_size_expr_evaluator q03 2158.07 2149.45 8.62 (0.40%)
var_size_expr_evaluator q04 2273.09 2203.44 69.65 (3.16%)
var_size_expr_evaluator q05 2653.21 2620.11 33.10 (1.26%)
var_size_expr_evaluator q06 1326.35 1345.39 -19.04 (-1.42%)
var_size_seq_scan q19 1485.79 1459.82 25.97 (1.78%)
var_size_seq_scan q20 2371.81 2352.12 19.69 (0.84%)
var_size_seq_scan q21 2313.19 2311.06 2.13 (0.09%)
var_size_seq_scan q22 128.12 128.13 -0.01 (-0.01%)

@royi-luo royi-luo requested a review from acquamarin February 12, 2025 23:57
Benchmark Result

Master commit hash: dfabf90eab17ec0dc0f87d18464152412e1fd8ee
Branch commit hash: e561459206e4acfc91ae41f12862545ea71db9bc

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 725.57 736.85 -11.28 (-1.53%)
aggregation q28 6368.62 6358.04 10.58 (0.17%)
filter q14 125.15 128.29 -3.14 (-2.45%)
filter q15 124.84 126.50 -1.67 (-1.32%)
filter q16 302.95 306.18 -3.23 (-1.06%)
filter q17 445.48 446.79 -1.31 (-0.29%)
filter q18 2002.93 1922.93 80.00 (4.16%)
filter zonemap-node 88.77 88.87 -0.10 (-0.11%)
filter zonemap-node-lhs-cast 88.74 90.75 -2.00 (-2.21%)
filter zonemap-node-null 88.60 90.66 -2.06 (-2.27%)
filter zonemap-rel 5387.84 5394.15 -6.32 (-0.12%)
fixed_size_expr_evaluator q07 577.76 581.95 -4.19 (-0.72%)
fixed_size_expr_evaluator q08 809.39 801.57 7.82 (0.98%)
fixed_size_expr_evaluator q09 809.13 803.44 5.69 (0.71%)
fixed_size_expr_evaluator q10 245.88 236.67 9.20 (3.89%)
fixed_size_expr_evaluator q11 236.50 229.64 6.86 (2.99%)
fixed_size_expr_evaluator q12 233.65 231.70 1.95 (0.84%)
fixed_size_expr_evaluator q13 1450.57 1465.25 -14.67 (-1.00%)
fixed_size_seq_scan q23 115.44 111.76 3.68 (3.29%)
join q29 732.52 703.37 29.15 (4.14%)
join q30 10025.63 11083.57 -1057.95 (-9.55%)
join q31 5.90 9.98 -4.08 (-40.87%)
join SelectiveTwoHopJoin 56.06 59.99 -3.93 (-6.55%)
ldbc_snb_ic q35 2559.89 2607.02 -47.13 (-1.81%)
ldbc_snb_ic q36 446.61 485.56 -38.94 (-8.02%)
ldbc_snb_is q32 5.75 4.47 1.28 (28.71%)
ldbc_snb_is q33 15.62 14.83 0.79 (5.35%)
ldbc_snb_is q34 1.17 1.25 -0.08 (-6.19%)
multi-rel multi-rel-large-scan 1394.52 1392.59 1.93 (0.14%)
multi-rel multi-rel-lookup 33.06 32.54 0.52 (1.59%)
multi-rel multi-rel-small-scan 93.06 102.16 -9.10 (-8.90%)
order_by q25 131.69 131.92 -0.23 (-0.18%)
order_by q26 455.59 452.45 3.13 (0.69%)
order_by q27 1406.55 1420.37 -13.82 (-0.97%)
recursive_join recursive-join-bidirection 286.45 296.22 -9.77 (-3.30%)
recursive_join recursive-join-dense 7351.88 7444.01 -92.13 (-1.24%)
recursive_join recursive-join-path 24262.37 24117.33 145.04 (0.60%)
recursive_join recursive-join-sparse 1060.64 1057.45 3.20 (0.30%)
recursive_join recursive-join-trail 7342.60 7418.08 -75.48 (-1.02%)
scan_after_filter q01 173.39 175.01 -1.62 (-0.93%)
scan_after_filter q02 161.73 159.85 1.88 (1.18%)
shortest_path_ldbc100 q37 94.36 97.65 -3.29 (-3.36%)
shortest_path_ldbc100 q38 365.35 377.28 -11.93 (-3.16%)
shortest_path_ldbc100 q39 56.14 64.85 -8.70 (-13.42%)
shortest_path_ldbc100 q40 378.27 464.15 -85.88 (-18.50%)
var_size_expr_evaluator q03 2121.95 2149.45 -27.49 (-1.28%)
var_size_expr_evaluator q04 2262.36 2203.44 58.91 (2.67%)
var_size_expr_evaluator q05 2677.78 2620.11 57.67 (2.20%)
var_size_expr_evaluator q06 1349.73 1345.39 4.34 (0.32%)
var_size_seq_scan q19 1493.78 1459.82 33.96 (2.33%)
var_size_seq_scan q20 2340.61 2352.12 -11.50 (-0.49%)
var_size_seq_scan q21 2292.29 2311.06 -18.77 (-0.81%)
var_size_seq_scan q22 124.56 128.13 -3.57 (-2.79%)

Benchmark Result

Master commit hash: 5399489c9c70e09d78b45fb203bbf4b56404301e
Branch commit hash: bffb0a92ae0f1e68363c5ef78dbe97ec6e8e5fb3

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 723.55 729.62 -6.06 (-0.83%)
aggregation q28 6347.38 6362.61 -15.23 (-0.24%)
filter q14 126.81 129.79 -2.99 (-2.30%)
filter q15 124.58 124.99 -0.40 (-0.32%)
filter q16 299.90 309.40 -9.51 (-3.07%)
filter q17 446.06 445.21 0.85 (0.19%)
filter q18 1942.16 1924.43 17.72 (0.92%)
filter zonemap-node 89.52 91.79 -2.27 (-2.47%)
filter zonemap-node-lhs-cast 88.42 90.31 -1.89 (-2.10%)
filter zonemap-node-null 90.14 88.59 1.55 (1.75%)
filter zonemap-rel 5371.24 5434.17 -62.94 (-1.16%)
fixed_size_expr_evaluator q07 573.50 577.97 -4.48 (-0.77%)
fixed_size_expr_evaluator q08 800.63 812.23 -11.61 (-1.43%)
fixed_size_expr_evaluator q09 802.24 808.80 -6.56 (-0.81%)
fixed_size_expr_evaluator q10 238.16 245.35 -7.19 (-2.93%)
fixed_size_expr_evaluator q11 231.18 237.47 -6.29 (-2.65%)
fixed_size_expr_evaluator q12 228.57 238.64 -10.07 (-4.22%)
fixed_size_expr_evaluator q13 1464.66 1453.42 11.24 (0.77%)
fixed_size_seq_scan q23 113.63 119.23 -5.60 (-4.69%)
join q29 755.37 721.07 34.30 (4.76%)
join q30 10551.21 10508.05 43.16 (0.41%)
join q31 5.62 6.26 -0.65 (-10.30%)
join SelectiveTwoHopJoin 55.49 55.28 0.21 (0.39%)
ldbc_snb_ic q35 2609.21 2596.48 12.73 (0.49%)
ldbc_snb_ic q36 430.40 474.31 -43.91 (-9.26%)
ldbc_snb_is q32 5.46 5.20 0.26 (5.00%)
ldbc_snb_is q33 15.78 15.39 0.39 (2.52%)
ldbc_snb_is q34 1.22 1.26 -0.04 (-2.91%)
multi-rel multi-rel-large-scan 1468.27 1328.34 139.93 (10.53%)
multi-rel multi-rel-lookup 44.07 32.59 11.48 (35.23%)
multi-rel multi-rel-small-scan 59.76 58.12 1.64 (2.83%)
order_by q25 127.80 127.19 0.61 (0.48%)
order_by q26 449.11 462.67 -13.57 (-2.93%)
order_by q27 1405.78 1414.42 -8.64 (-0.61%)
recursive_join recursive-join-bidirection 284.87 308.41 -23.53 (-7.63%)
recursive_join recursive-join-dense 7381.50 7426.05 -44.55 (-0.60%)
recursive_join recursive-join-path 24230.84 24248.80 -17.96 (-0.07%)
recursive_join recursive-join-sparse 1058.44 1060.88 -2.44 (-0.23%)
recursive_join recursive-join-trail 7395.18 7383.34 11.84 (0.16%)
scan_after_filter q01 172.23 175.73 -3.50 (-1.99%)
scan_after_filter q02 157.96 159.94 -1.98 (-1.24%)
shortest_path_ldbc100 q37 95.72 96.52 -0.80 (-0.83%)
shortest_path_ldbc100 q38 370.20 406.01 -35.80 (-8.82%)
shortest_path_ldbc100 q39 64.94 65.20 -0.27 (-0.41%)
shortest_path_ldbc100 q40 456.60 415.60 41.00 (9.87%)
var_size_expr_evaluator q03 2079.87 2136.30 -56.43 (-2.64%)
var_size_expr_evaluator q04 2241.69 2259.95 -18.27 (-0.81%)
var_size_expr_evaluator q05 2685.53 2995.92 -310.39 (-10.36%)
var_size_expr_evaluator q06 1328.55 1330.58 -2.03 (-0.15%)
var_size_seq_scan q19 1465.46 1504.61 -39.14 (-2.60%)
var_size_seq_scan q20 2321.40 2373.59 -52.19 (-2.20%)
var_size_seq_scan q21 2268.05 2315.78 -47.73 (-2.06%)
var_size_seq_scan q22 125.65 126.69 -1.05 (-0.83%)

Benchmark Result

Master commit hash: 5399489c9c70e09d78b45fb203bbf4b56404301e
Branch commit hash: 82760a381e68135a2f1be91c0ede2417331d448a

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 723.80 729.62 -5.82 (-0.80%)
aggregation q28 6360.59 6362.61 -2.02 (-0.03%)
filter q14 126.90 129.79 -2.89 (-2.23%)
filter q15 123.01 124.99 -1.98 (-1.58%)
filter q16 299.83 309.40 -9.57 (-3.09%)
filter q17 446.82 445.21 1.61 (0.36%)
filter q18 1930.29 1924.43 5.86 (0.30%)
filter zonemap-node 88.76 91.79 -3.03 (-3.30%)
filter zonemap-node-lhs-cast 89.03 90.31 -1.28 (-1.42%)
filter zonemap-node-null 88.74 88.59 0.16 (0.18%)
filter zonemap-rel 5402.58 5434.17 -31.59 (-0.58%)
fixed_size_expr_evaluator q07 570.24 577.97 -7.73 (-1.34%)
fixed_size_expr_evaluator q08 801.42 812.23 -10.81 (-1.33%)
fixed_size_expr_evaluator q09 801.55 808.80 -7.26 (-0.90%)
fixed_size_expr_evaluator q10 236.34 245.35 -9.02 (-3.67%)
fixed_size_expr_evaluator q11 229.36 237.47 -8.11 (-3.41%)
fixed_size_expr_evaluator q12 226.20 238.64 -12.44 (-5.21%)
fixed_size_expr_evaluator q13 1462.01 1453.42 8.59 (0.59%)
fixed_size_seq_scan q23 117.45 119.23 -1.78 (-1.49%)
join q29 740.75 721.07 19.68 (2.73%)
join q30 10366.76 10508.05 -141.28 (-1.34%)
join q31 7.31 6.26 1.04 (16.68%)
join SelectiveTwoHopJoin 54.49 55.28 -0.78 (-1.41%)
ldbc_snb_ic q35 2616.75 2596.48 20.27 (0.78%)
ldbc_snb_ic q36 451.21 474.31 -23.11 (-4.87%)
ldbc_snb_is q32 3.96 5.20 -1.24 (-23.83%)
ldbc_snb_is q33 15.61 15.39 0.22 (1.45%)
ldbc_snb_is q34 1.27 1.26 0.01 (0.56%)
multi-rel multi-rel-large-scan 1410.52 1328.34 82.18 (6.19%)
multi-rel multi-rel-lookup 32.34 32.59 -0.25 (-0.76%)
multi-rel multi-rel-small-scan 78.32 58.12 20.20 (34.75%)
order_by q25 130.54 127.19 3.35 (2.63%)
order_by q26 451.16 462.67 -11.51 (-2.49%)
order_by q27 1412.49 1414.42 -1.93 (-0.14%)
recursive_join recursive-join-bidirection 291.18 308.41 -17.23 (-5.59%)
recursive_join recursive-join-dense 6216.42 7426.05 -1209.63 (-16.29%)
recursive_join recursive-join-path 24056.59 24248.80 -192.21 (-0.79%)
recursive_join recursive-join-sparse 1057.00 1060.88 -3.88 (-0.37%)
recursive_join recursive-join-trail 6951.63 7383.34 -431.71 (-5.85%)
scan_after_filter q01 170.66 175.73 -5.07 (-2.89%)
scan_after_filter q02 156.80 159.94 -3.14 (-1.97%)
shortest_path_ldbc100 q37 95.66 96.52 -0.86 (-0.89%)
shortest_path_ldbc100 q38 382.35 406.01 -23.66 (-5.83%)
shortest_path_ldbc100 q39 62.55 65.20 -2.65 (-4.07%)
shortest_path_ldbc100 q40 425.73 415.60 10.13 (2.44%)
var_size_expr_evaluator q03 2097.70 2136.30 -38.60 (-1.81%)
var_size_expr_evaluator q04 2224.19 2259.95 -35.77 (-1.58%)
var_size_expr_evaluator q05 2714.28 2995.92 -281.64 (-9.40%)
var_size_expr_evaluator q06 1327.30 1330.58 -3.28 (-0.25%)
var_size_seq_scan q19 1458.22 1504.61 -46.39 (-3.08%)
var_size_seq_scan q20 2321.45 2373.59 -52.14 (-2.20%)
var_size_seq_scan q21 2265.48 2315.78 -50.31 (-2.17%)
var_size_seq_scan q22 125.75 126.69 -0.95 (-0.75%)

@acquamarin acquamarin mentioned this pull request Feb 13, 2025
26 tasks
Benchmark Result

Master commit hash: 5399489c9c70e09d78b45fb203bbf4b56404301e
Branch commit hash: ddbaedfe80a0331761237d9d6da4a91243834a47

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 738.40 729.62 8.79 (1.20%)
aggregation q28 6342.60 6362.61 -20.01 (-0.31%)
filter q14 125.67 129.79 -4.12 (-3.18%)
filter q15 126.72 124.99 1.74 (1.39%)
filter q16 298.93 309.40 -10.48 (-3.39%)
filter q17 449.10 445.21 3.89 (0.87%)
filter q18 1924.32 1924.43 -0.11 (-0.01%)
filter zonemap-node 89.41 91.79 -2.38 (-2.59%)
filter zonemap-node-lhs-cast 89.25 90.31 -1.07 (-1.18%)
filter zonemap-node-null 88.59 88.59 0.01 (0.01%)
filter zonemap-rel 5422.89 5434.17 -11.28 (-0.21%)
fixed_size_expr_evaluator q07 570.53 577.97 -7.45 (-1.29%)
fixed_size_expr_evaluator q08 801.96 812.23 -10.27 (-1.26%)
fixed_size_expr_evaluator q09 801.44 808.80 -7.37 (-0.91%)
fixed_size_expr_evaluator q10 237.65 245.35 -7.71 (-3.14%)
fixed_size_expr_evaluator q11 229.70 237.47 -7.77 (-3.27%)
fixed_size_expr_evaluator q12 227.08 238.64 -11.56 (-4.84%)
fixed_size_expr_evaluator q13 1461.43 1453.42 8.01 (0.55%)
fixed_size_seq_scan q23 110.85 119.23 -8.38 (-7.03%)
join q29 773.71 721.07 52.64 (7.30%)
join q30 9737.99 10508.05 -770.06 (-7.33%)
join q31 7.13 6.26 0.86 (13.80%)
join SelectiveTwoHopJoin 58.71 55.28 3.44 (6.22%)
ldbc_snb_ic q35 2588.54 2596.48 -7.93 (-0.31%)
ldbc_snb_ic q36 432.83 474.31 -41.48 (-8.75%)
ldbc_snb_is q32 5.92 5.20 0.72 (13.90%)
ldbc_snb_is q33 15.35 15.39 -0.04 (-0.23%)
ldbc_snb_is q34 1.29 1.26 0.03 (2.41%)
multi-rel multi-rel-large-scan 1397.39 1328.34 69.05 (5.20%)
multi-rel multi-rel-lookup 28.04 32.59 -4.55 (-13.95%)
multi-rel multi-rel-small-scan 96.92 58.12 38.80 (66.76%)
order_by q25 130.43 127.19 3.23 (2.54%)
order_by q26 449.31 462.67 -13.37 (-2.89%)
order_by q27 1402.39 1414.42 -12.03 (-0.85%)
recursive_join recursive-join-bidirection 293.60 308.41 -14.80 (-4.80%)
recursive_join recursive-join-dense 5937.09 7426.05 -1488.95 (-20.05%)
recursive_join recursive-join-path 23489.03 24248.80 -759.77 (-3.13%)
recursive_join recursive-join-sparse 1056.59 1060.88 -4.29 (-0.40%)
recursive_join recursive-join-trail 6184.99 7383.34 -1198.35 (-16.23%)
scan_after_filter q01 172.10 175.73 -3.63 (-2.07%)
scan_after_filter q02 157.36 159.94 -2.58 (-1.61%)
shortest_path_ldbc100 q37 91.17 96.52 -5.35 (-5.54%)
shortest_path_ldbc100 q38 364.52 406.01 -41.49 (-10.22%)
shortest_path_ldbc100 q39 64.61 65.20 -0.59 (-0.91%)
shortest_path_ldbc100 q40 403.37 415.60 -12.23 (-2.94%)
var_size_expr_evaluator q03 2108.70 2136.30 -27.60 (-1.29%)
var_size_expr_evaluator q04 2263.80 2259.95 3.84 (0.17%)
var_size_expr_evaluator q05 2683.14 2995.92 -312.78 (-10.44%)
var_size_expr_evaluator q06 1329.52 1330.58 -1.06 (-0.08%)
var_size_seq_scan q19 1462.11 1504.61 -42.50 (-2.82%)
var_size_seq_scan q20 2314.26 2373.59 -59.34 (-2.50%)
var_size_seq_scan q21 2281.26 2315.78 -34.53 (-1.49%)
var_size_seq_scan q22 125.08 126.69 -1.61 (-1.27%)

--

-CASE UploadToGCS
-SKIP
Collaborator Author

Why was the UploadToS3 test disabled? Was it because it required writing to a new bucket for each CI run? I've added the test here with -SKIP so that we can run it locally (it works locally for me).

Collaborator

Because it is expensive to run, and we want to reduce the cost.

Make gcs use gcs auth params

Undo renames

Fix todo

Fix

Fix duplicate duckdb secret names

Fix msvc compile

Allow CREATE_AND_TRUNCATE_IF_EXISTS for http openFile()

Address review comments 1

Address review comments 2

Add KUZU_API declarations

Fix extension compile

Self-review

GCS tests (#4897)

* Add gcs scan tests

* Add attach to remote db tests

* Add writing test

* Add generate binary tinysnb
@royi-luo royi-luo force-pushed the royi/gcs branch 3 times, most recently from 088b25e to 75339f9 Compare February 18, 2025 15:13
@royi-luo royi-luo changed the title Add GCS support for scans Add GCS support Feb 18, 2025
Benchmark Result

Master commit hash: ab5e13acdf51bbdc1ed367ecada1e49bda20831a
Branch commit hash: 106709115ee8bab6ba81cf5cc46d15f0f8bcee13

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 739.34 724.21 15.13 (2.09%)
aggregation q28 6362.54 6502.36 -139.82 (-2.15%)
filter q14 143.97 128.45 15.52 (12.08%)
filter q15 143.26 123.74 19.52 (15.78%)
filter q16 321.82 307.65 14.17 (4.61%)
filter q17 463.40 450.69 12.72 (2.82%)
filter q18 1962.10 1923.04 39.06 (2.03%)
filter zonemap-node 105.38 89.56 15.83 (17.67%)
filter zonemap-node-lhs-cast 105.11 90.79 14.32 (15.77%)
filter zonemap-node-null 104.94 90.65 14.29 (15.76%)
filter zonemap-rel 5628.41 5375.49 252.91 (4.70%)
fixed_size_expr_evaluator q07 587.55 578.80 8.75 (1.51%)
fixed_size_expr_evaluator q08 817.52 806.55 10.97 (1.36%)
fixed_size_expr_evaluator q09 816.66 810.38 6.28 (0.78%)
fixed_size_expr_evaluator q10 253.53 244.36 9.18 (3.76%)
fixed_size_expr_evaluator q11 245.70 236.82 8.88 (3.75%)
fixed_size_expr_evaluator q12 243.08 234.18 8.90 (3.80%)
fixed_size_expr_evaluator q13 1475.02 1459.17 15.85 (1.09%)
fixed_size_seq_scan q23 127.73 114.22 13.51 (11.83%)
join q29 724.25 758.63 -34.38 (-4.53%)
join q30 9546.04 10732.22 -1186.18 (-11.05%)
join q31 6.31 6.53 -0.21 (-3.28%)
join SelectiveTwoHopJoin 56.89 57.44 -0.55 (-0.96%)
ldbc_snb_ic q35 2681.88 2620.21 61.67 (2.35%)
ldbc_snb_ic q36 466.16 478.63 -12.47 (-2.60%)
ldbc_snb_is q32 6.55 6.88 -0.32 (-4.67%)
ldbc_snb_is q33 14.34 16.42 -2.08 (-12.68%)
ldbc_snb_is q34 1.18 1.25 -0.07 (-5.28%)
multi-rel multi-rel-large-scan 1491.18 1323.69 167.49 (12.65%)
multi-rel multi-rel-lookup 32.19 10.76 21.43 (199.23%)
multi-rel multi-rel-small-scan 97.80 95.94 1.86 (1.94%)
order_by q25 147.69 129.18 18.51 (14.32%)
order_by q26 472.66 460.36 12.30 (2.67%)
order_by q27 1432.63 1408.31 24.32 (1.73%)
recursive_join recursive-join-bidirection 316.88 281.48 35.41 (12.58%)
recursive_join recursive-join-dense 7078.70 6479.78 598.92 (9.24%)
recursive_join recursive-join-path 23805.63 23541.43 264.20 (1.12%)
recursive_join recursive-join-sparse 1054.37 1058.54 -4.17 (-0.39%)
recursive_join recursive-join-trail 7051.91 6996.13 55.78 (0.80%)
scan_after_filter q01 187.96 171.26 16.70 (9.75%)
scan_after_filter q02 173.54 156.35 17.19 (11.00%)
shortest_path_ldbc100 q37 75.12 101.28 -26.16 (-25.83%)
shortest_path_ldbc100 q38 285.62 401.02 -115.40 (-28.78%)
shortest_path_ldbc100 q39 66.47 64.81 1.66 (2.56%)
shortest_path_ldbc100 q40 463.84 456.99 6.84 (1.50%)
var_size_expr_evaluator q03 2141.74 2078.60 63.14 (3.04%)
var_size_expr_evaluator q04 2290.68 2223.86 66.82 (3.00%)
var_size_expr_evaluator q05 2657.21 2616.98 40.23 (1.54%)
var_size_expr_evaluator q06 1363.95 1345.40 18.55 (1.38%)
var_size_seq_scan q19 1497.82 1452.45 45.37 (3.12%)
var_size_seq_scan q20 2505.96 2319.10 186.85 (8.06%)
var_size_seq_scan q21 2316.95 2276.58 40.37 (1.77%)
var_size_seq_scan q22 133.10 125.90 7.20 (5.72%)

@royi-luo royi-luo merged commit 113ba6b into master Feb 18, 2025
24 of 25 checks passed
@royi-luo royi-luo deleted the royi/gcs branch February 18, 2025 17:13