optimize object store cache operations #17025
Added a sync_cache parameter to the get_filename function that allows skipping the pull to cache when it is not needed
Co-authored-by: John Davis <jdavcs@gmail.com>
Thank you @SergeyYakubov, this looks great! I'll wait for the remaining tests to pass, after which we can merge this.
I had to configure the S3 object store to always update the cache on upload (d0eef6a). These tests were failing because Galaxy did not pull data correctly from the object store in the case of composite datasets. This problem is "hidden" when the cache is shared between the job and Galaxy: the pull still does not happen, but the data is there because set_metadata put it there. It would be good to create integration tests that explicitly reveal the problem (Galaxy/Pulsar/extended metadata/object store, with the Galaxy cache and the Pulsar cache on separate storage). That is unrelated to this PR, so I just made the tests work for now by using a shared cache as before.
Added this to our Testing Requests board
This PR was merged without a "kind/" label, please correct.
This is the second step after #16783. We introduce the `sync_cache` parameter to the `get_file_name()` function and use it to postpone pulling to the Galaxy cache until the data is really needed. Also added the `cache_updated_data` parameter to the object store config that allows saving local storage by sending data directly to an object store without storing it in the cache. This is useful by itself (e.g. when a job is running in Pulsar and we just want to send results to the object store) and is also used for integration tests to test the `sync_cache` functionality.
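The interaction between the two parameters can be illustrated with a toy model. This is only a sketch and not Galaxy's implementation: the class `ToyCachingObjectStore`, its in-memory `remote` dict, the `pull_count` counter, and the `demo()` helper are all hypothetical names made up for illustration. Only the parameter names `sync_cache` and `cache_updated_data` come from this PR, and the sketch assumes that `sync_cache=False` returns the cache path without downloading, and that `cache_updated_data=False` uploads without writing a cache copy.

```python
import os
import tempfile


class ToyCachingObjectStore:
    """Toy model of a local cache in front of a remote object store (not Galaxy code)."""

    def __init__(self, cache_dir, cache_updated_data=True):
        self.cache_dir = cache_dir
        # Assumed semantics: when False, uploads bypass the local cache.
        self.cache_updated_data = cache_updated_data
        self.remote = {}     # stands in for the remote bucket: key -> bytes
        self.pull_count = 0  # number of downloads from the remote store

    def _cache_path(self, key):
        return os.path.join(self.cache_dir, key)

    def _pull_into_cache(self, key):
        with open(self._cache_path(key), "wb") as fh:
            fh.write(self.remote[key])
        self.pull_count += 1

    def get_filename(self, key, sync_cache=True):
        """Return the cache path; download first only if sync_cache is set."""
        path = self._cache_path(key)
        if sync_cache and not os.path.exists(path):
            self._pull_into_cache(key)
        return path

    def update_from_file(self, key, source_path):
        """Send a local file to the remote store, optionally keeping a cache copy."""
        with open(source_path, "rb") as fh:
            data = fh.read()
        self.remote[key] = data
        if self.cache_updated_data:
            with open(self._cache_path(key), "wb") as fh:
                fh.write(data)


def demo():
    with tempfile.TemporaryDirectory() as work_dir:
        cache_dir = os.path.join(work_dir, "cache")
        os.mkdir(cache_dir)
        source = os.path.join(work_dir, "result.txt")
        with open(source, "wb") as fh:
            fh.write(b"job output")

        # Upload straight to the remote store, skipping the cache.
        store = ToyCachingObjectStore(cache_dir, cache_updated_data=False)
        store.update_from_file("dataset_1", source)

        # Asking only for the path does not trigger a download.
        path = store.get_filename("dataset_1", sync_cache=False)
        cached_before = os.path.exists(path)
        pulls_before = store.pull_count

        # The pull happens only once the data is really needed.
        path = store.get_filename("dataset_1")
        with open(path, "rb") as fh:
            content = fh.read()
        return cached_before, pulls_before, store.pull_count, content
```

In this model, disabling `cache_updated_data` leaves the cache empty after an upload, so a later read is served correctly only if the pull path works. That is presumably why the option is handy for exercising `sync_cache` in tests.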