
uploader CF migration into provider #22

Merged: 6 commits merged into main from avichalp/uploader-migration on Jan 17, 2024
Conversation

@avichalp (Contributor) commented Dec 27, 2023

This PR adds the Web3 Storage (W3S) integration to the provider. It replaces the GCP Cloud Function (CF) currently used to upload files to W3S.

After the file is uploaded to GCP and its signature is validated, it is downloaded in a streaming fashion and then uploaded to W3S using a fork of the w3s-rs library. The original library cannot be used as is; hence, I have forked it and added the changes we need.

Uploading to W3S requires a multi-threaded scheduler. W3S only accepts CAR files smaller than 100MB; bigger files are uploaded as multiple CAR files that can later be combined, and the combined file can be retrieved using the root CID. The W3S uploader must block a thread to wait for all the individual CAR files to finish uploading, which is impossible when we run Tokio with a single thread.

After the W3S upload is finished, a record is added to the jobs table (as currently happens in the CF) with all the required metadata. Note that we no longer attach any metadata to the object stored in the bucket.

Adds a W3S token as a worker flag and an env var; it is required to call the upload API. A sketch of the overall flow follows the checklist below.

  • Add w3s token in secrets
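
A minimal sketch of the resulting flow, for reference. The helper functions (fetch_gcs_object, upload_to_w3s, insert_job) and the W3S_TOKEN env var name are hypothetical stand-ins for the provider's actual code, and the snippet assumes tokio's rt-multi-thread feature:

use std::io::Cursor;

async fn fetch_gcs_object(_key: &str) -> std::io::Result<Cursor<Vec<u8>>> {
    // Stand-in for the streaming download from GCS.
    Ok(Cursor::new(Vec::new()))
}

async fn upload_to_w3s(_data: Cursor<Vec<u8>>, _token: &str) -> std::io::Result<Vec<String>> {
    // Stand-in for the forked w3s-rs upload: payloads over 100MB are split
    // into multiple CAR files and uploaded concurrently, one CID per split.
    Ok(vec!["bafy...root".to_string()])
}

async fn insert_job(_root_cid: &str) -> std::io::Result<()> {
    // Stand-in for adding the record to the jobs table with the metadata.
    Ok(())
}

#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
async fn main() -> std::io::Result<()> {
    let token = std::env::var("W3S_TOKEN").expect("w3s token env var");
    let data = fetch_gcs_object("some/object").await?;
    let cids = upload_to_w3s(data, &token).await?;
    // The last CID in the list is the root CID of the combined CAR splits.
    let root_cid = cids.last().expect("at least one CAR split uploaded");
    insert_job(root_cid).await
}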

@avichalp avichalp changed the title Avichalp/uploader migration uploader CF migration into provider Dec 27, 2023
@avichalp avichalp force-pushed the avichalp/uploader-migration branch 2 times, most recently from 62684c4 to f5b1477 on December 28, 2023 10:40
@@ -87,7 +91,7 @@ enum EvmType {
Remote,
}

-#[tokio::main(flavor = "current_thread")]
+#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
avichalp (Contributor, Author):

The w3s-rs lib needs a multi-threaded runtime: when files are larger than 100MB, it splits them into multiple CAR files and uploads them concurrently, blocking the current thread to wait for all the concurrent jobs to finish their uploads.
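
A self-contained illustration of that constraint, using generic Tokio code rather than the actual w3s-rs internals (assumes tokio's time and rt-multi-thread features):

use std::time::Duration;

// block_in_place parks the current worker thread while the remaining
// workers drive the spawned "upload" tasks. On a current_thread runtime,
// block_in_place panics: there is no spare worker to keep tasks running.
#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
async fn main() {
    let uploads: Vec<_> = (0..3)
        .map(|i| {
            tokio::spawn(async move {
                // Stand-in for uploading one CAR split.
                tokio::time::sleep(Duration::from_millis(50)).await;
                format!("cid-{i}")
            })
        })
        .collect();

    // Block this thread until every split has finished uploading.
    let cids = tokio::task::block_in_place(|| {
        tokio::runtime::Handle::current().block_on(async {
            let mut out = Vec::with_capacity(uploads.len());
            for handle in uploads {
                out.push(handle.await.expect("upload task panicked"));
            }
            out
        })
    });
    println!("uploaded CAR splits: {cids:?}");
}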

@@ -433,23 +410,6 @@ pub async fn write_event(
}
};

-let mut metadata: HashMap<String, String> = HashMap::from([(
avichalp (Contributor, Author) commented Dec 28, 2023:

Stop adding metadata to the bucket object.

.map_err(|e| basin_common::errors::Error::Upload(e.to_string()))?;

let result_root_cid = result_cids
.last()
avichalp (Contributor, Author):

Extracts the root CID from all the uploaded CAR splits.
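
Illustratively, the pattern is (this function is a stand-in, not the diff's code):

// The uploader returns one CID per CAR split; the final entry is treated
// as the root CID under which the combined file can be retrieved.
fn root_cid(result_cids: &[String]) -> Option<&String> {
    result_cids.last()
}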

@@ -59,9 +68,9 @@ tokio = { version = "1.32.0", features = ["macros", "net", "rt"] }
tokio-cron-scheduler = { version = "0.9.4", features = ["signal"] }
tokio-util = { version = "0.7.8", features = ["compat"] }
url = "2.4.1"
+w3s = { git = "https://github.com/avichalp/w3s-rs", branch = "main" }
avichalp (Contributor, Author):

I have forked w3s-rs here to make it usable in this project.

@avichalp avichalp marked this pull request as ready for review December 28, 2023 15:51
@sanderpick (Member):

We can ditch the single-threaded setup... that was only there for capnproto.

brunocalza (Collaborator) left a comment:

Nice work! Left one comment, but not a big deal.

Another thing is that uploading to GCS is no longer necessary. We started using GCS as a way to "pass" the files to the CF so we could upload them to Web3 Storage, and then we leveraged that to emulate a cache. Moving forward, the nodes will store the files themselves while the file is "cached".

Next steps to have in mind for planning:

  • Get rid of GCS in favor of storing files locally
  • Transition to the new db model
  • Potentially get rid of Web3Storage and use only Filecoin with Filecoin Deal Making

.checked_add_signed(chrono::Duration::minutes(duration))
.unwrap()
.naive_utc(),
None => chrono::Utc::now().naive_utc(),
brunocalza (Collaborator):

I guess if cache_duration is None, expires_at should be None as well.
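
A sketch of that suggestion, assuming cache_duration is an Option<i64> of minutes (names here are illustrative):

use chrono::{Duration, NaiveDateTime, Utc};

// Propagate None instead of defaulting expires_at to "now" when no
// cache duration was given.
fn expires_at(cache_duration: Option<i64>) -> Option<NaiveDateTime> {
    cache_duration.map(|minutes| {
        Utc::now()
            .checked_add_signed(Duration::minutes(minutes))
            .expect("expiry timestamp within range")
            .naive_utc()
    })
}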

@avichalp avichalp force-pushed the avichalp/uploader-migration branch from 444cd23 to 9883d3a on January 17, 2024 16:11
avichalp and others added 6 commits January 18, 2024 00:19
Signed-off-by: avichalp <hi@avichalp.me>
Signed-off-by: avichalp <hi@avichalp.me>
Signed-off-by: avichalp <hi@avichalp.me>
Signed-off-by: avichalp <hi@avichalp.me>
Signed-off-by: avichalp <hi@avichalp.me>
Signed-off-by: Bruno Calza <brunoangelicalza@gmail.com>
Signed-off-by: avichalp <hi@avichalp.me>
@avichalp avichalp force-pushed the avichalp/uploader-migration branch from 9883d3a to f4b5e72 on January 17, 2024 16:19
@avichalp avichalp merged commit 7946e4a into main Jan 17, 2024
5 checks passed
@avichalp avichalp deleted the avichalp/uploader-migration branch January 17, 2024 16:32