uploader CF migration into provider #22
Conversation
Force-pushed from 62684c4 to f5b1477
@@ -87,7 +91,7 @@ enum EvmType {
     Remote,
 }

-#[tokio::main(flavor = "current_thread")]
+#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
The w3s-rs lib needs a multi-threaded runtime because when the files are larger than 100MB it splits them into multiple CAR files and uploads them concurrently. It blocks the current thread to wait for all the concurrent jobs to finish their uploads.
@@ -433,23 +410,6 @@ pub async fn write_event(
         }
     };

-    let mut metadata: HashMap<String, String> = HashMap::from([(
Stop adding metadata to the bucket object.
    .map_err(|e| basin_common::errors::Error::Upload(e.to_string()))?;

let result_root_cid = result_cids
    .last()
Extracts the root CID from all the uploaded CAR splits.
@@ -59,9 +68,9 @@ tokio = { version = "1.32.0", features = ["macros", "net", "rt"] }
 tokio-cron-scheduler = { version = "0.9.4", features = ["signal"] }
 tokio-util = { version = "0.7.8", features = ["compat"] }
 url = "2.4.1"
+w3s = { git = "https://github.com/avichalp/w3s-rs", branch = "main" }
I have forked w3s-rs here to make it usable in this project.
We can ditch the single-threaded setup; that was only there for capnproto.
Nice work! Left one comment, but not a big deal.
Another thing is that uploading to GCS is not necessary now. We started using GCS as a way "to pass" the files to the CF so we could upload them to Web3 Storage. Then we leveraged that use to emulate a cache. Moving forward, the nodes will be storing the files themselves while the file is "cached".
Next steps to have in mind for planning:
- Get rid of GCS in favor of storing files locally
- Transition to the new db model
- Potentially get rid of Web3Storage and use only Filecoin with Filecoin Deal Making
lib/worker/src/db/publications.rs (Outdated)
    .checked_add_signed(chrono::Duration::minutes(duration))
    .unwrap()
    .naive_utc(),
None => chrono::Utc::now().naive_utc(),
I guess if `cache_duration` is `None`, `expires_at` should be `None` as well.
Force-pushed from 444cd23 to 9883d3a
Signed-off-by: avichalp <hi@avichalp.me>
Signed-off-by: Bruno Calza <brunoangelicalza@gmail.com>
Force-pushed from 9883d3a to f4b5e72
This PR adds the Web3 Storage integration to the provider. It will replace the current GCP CF used to upload files to W3S.
After the file is uploaded to GCP and its signature is validated, it is downloaded in a streaming fashion and then uploaded to W3S using this library. The original library cannot be used as is; hence, I have forked it and added the changes we need.
Uploading to W3S requires using a multi-threaded scheduler. W3S only accepts CAR files smaller than 100MB, so when files are bigger they are uploaded as multiple CAR files that can later be combined. The combined file can be retrieved using the root CID. The W3S uploader must block a thread to wait for all individual CAR files to finish uploading, which is impossible when we use Tokio with a single thread.
After the W3S upload is finished, a record is added to the `jobs` table (as currently happens in the CF) with all the required metadata. Also, note that we no longer attach any metadata to the object stored in the bucket. This PR also adds the w3s token as a worker flag and an env var; it is required to call the upload API.