-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GCS support #4892
Add GCS support #4892
Conversation
Benchmark ResultMaster commit hash:
|
Codecov ReportAttention: Patch coverage is
❌ Your patch status has failed because the patch coverage (57.14%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## master #4892 +/- ##
==========================================
- Coverage 86.58% 86.57% -0.01%
==========================================
Files 1409 1409
Lines 60887 60892 +5
Branches 7489 7489
==========================================
+ Hits 52716 52719 +3
- Misses 8001 8004 +3
+ Partials 170 169 -1 ☔ View full report in Codecov by Sentry. |
Benchmark ResultMaster commit hash:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Can you add some tests? I want to make sure everything is covered.
|
||
namespace httpfs { | ||
|
||
static constexpr std::array AUTH_OPTIONS = {"ACCESS_KEY_ID", "SECRET_ACCESS_KEY", "ENDPOINT", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you double check whether GCS support endpoint, url_style and region parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed those options for GCS (although they still exist internally)
@@ -24,14 +26,18 @@ static void registerExtensionOptions(main::Database* db) { | |||
|
|||
static void registerFileSystem(main::Database* db) { | |||
db->registerFileSystem(std::make_unique<HTTPFileSystem>()); | |||
db->registerFileSystem(std::make_unique<S3FileSystem>()); | |||
for (auto& fsConfig : httpfs::S3FileSystemConfig::getAvailableConfigs()) { | |||
db->registerFileSystem(std::make_unique<S3FileSystem>(fsConfig)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we only use one fileSystem instead of two?
When we open a file, we generate a S3AuthParams class based on the URL.
The S3AuthParams is saved in the fileinfo class.
So we don't need to register two fileSystems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to keep two separate filesystems if possible since I feel like it makes this more understandable + flexible unless we're concerned about performance/memory footprint here
Benchmark ResultMaster commit hash:
|
Benchmark ResultMaster commit hash:
|
Benchmark ResultMaster commit hash:
|
Benchmark ResultMaster commit hash:
|
Benchmark ResultMaster commit hash:
|
-- | ||
|
||
-CASE UploadToGCS | ||
-SKIP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the reason for the UploadToS3
test being disabled, was it because it required writing to a new bucket for each CI run? I've added the test here with -SKIP
so that we can run it locally (it works locally for me)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it is expensive, we want to reduce the cost
Make gcs use gcs auth params Undo renames Fix todo Fix Fix duplicate duckdb secret names Fix msvc compile Allow CREATE_AND_TRUNCATE_IF_EXISTS for http openFile() Address review comments 1 Address review comments 2 Add KUZU_API declarations Fix extension compile Self-review GCS tests (#4897) * Add gcs scan tests * Add attach to remote db tests * Add writing test * Add generate binary tinysnb
088b25e
to
75339f9
Compare
Benchmark ResultMaster commit hash:
|
Description
GCS's interoperability mode supports using S3-style requests, so we reuse the S3 filesystem while making a few changes to support scans (COPY/LOAD FROM, attach read-only DB) with GCS. In this case, we create a new S3 filesystem instance for communicating with GCS, with some different init-time configurations to signal that it's communicating with GCS.
Any URLs with prefixes
gs://
orgcs://
will have their requests directed to GCS. The path formats will be the same as S3:${PREFIX}/${BUCKET_NAME}/${PATH_TO_FILE_IN_BUCKET}
.Additionally, I added separate settings for configuring authentication parameters for GCS (they refer to the same parameters as S3):
Contributor agreement