Outline the best settings for Parsl configuration with CytoTable #176
From #163:
import cytotable
import parsl
from parsl.config import Config
from parsl.executors import ThreadPoolExecutor

cytotable.convert(
    ...
    parsl_config=parsl.load(
        Config(
            executors=[
                ThreadPoolExecutor(
                    # set maximum number of threads at any time, for example 3.
                    # if not set, the default is 2.
                    max_threads=3,
                )
            ]
        )
    ),
)
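As a rough starting point for choosing `max_threads`, here is a minimal sketch that derives a thread count from the CPUs on the machine (for example an EC2 instance). The helper names `suggest_max_threads` and `build_parsl_config` are hypothetical and not part of CytoTable or Parsl, and whether thread count should track CPU count for this workload is an assumption rather than a tested recommendation.

```python
import os


def suggest_max_threads(reserve=1, cap=None):
    """Hypothetical helper: leave `reserve` CPUs free and optionally cap the result."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    threads = max(1, available - reserve)
    return min(threads, cap) if cap is not None else threads


def build_parsl_config(max_threads):
    """Build a Parsl Config with a single ThreadPoolExecutor, as in the snippet above."""
    # Imported lazily so suggest_max_threads() can be used without Parsl installed.
    from parsl.config import Config
    from parsl.executors import ThreadPoolExecutor

    return Config(executors=[ThreadPoolExecutor(max_threads=max_threads)])


if __name__ == "__main__":
    print("suggested max_threads:", suggest_max_threads(cap=8))
```

The resulting object could then be passed the same way as above, e.g. `parsl_config=parsl.load(build_parsl_config(suggest_max_threads(cap=8)))`. The cap of 8 is only an illustrative value.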
Hi @shntnu - I wanted to follow up as you prepare for the work you originally mentioned in #163 (
I can also work towards providing more generalized guidance here, but figured it might be good to work this issue from your perspective.
Thank you for doing this!
This is a draft of the script we will use; we will likely need to iterate on the join (unrelated to CytoTable). We will eventually want to do it on all the 2378 SQLite files in But to get started, we will likely do it on a smaller batch of data from
We will do this on an EC2 instance, so we can configure it at will. Most likely, we will create something similar to https://github.com/DistributedScience/Distributed-Collate for this, although some of us are experimenting with skypilot for tasks like this.
Thanks @shntnu ! I took a look at those resources you provided - very neat to see the things happening within https://github.com/broadinstitute/cpg 🙂 . Sharing some thoughts and questions based on work through this Google Colab notebook (and GitHub Gist backup).
All this said, I'm still looking into testing Parsl configurations through CytoTable to ensure this works as best it can for your use-case. Could the following object paths for
$ aws s3 ls --recursive --human-readable --summarize --no-sign-request s3://cellpainting-gallery/cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12 | grep .sqlite
2022-10-02 06:14:43 22.0 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126113/BR00126113.sqlite
2022-10-02 07:47:03 3.6 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126114/BR00126114.sqlite
2022-10-02 06:14:43 10.3 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126115/BR00126115.sqlite
2022-10-02 06:14:42 17.5 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126116/BR00126116.sqlite
2022-10-02 07:21:28 27.9 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126117/BR00126117.sqlite
2022-10-02 06:57:52 18.7 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126706/BR00126706.sqlite
2022-10-02 06:14:43 25.9 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126707/BR00126707.sqlite
2022-10-02 06:14:43 17.6 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126708/BR00126708.sqlite
2022-10-02 07:29:05 18.2 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126709/BR00126709.sqlite
2022-10-02 07:32:34 18.8 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126710/BR00126710.sqlite
2022-10-02 06:14:42 18.0 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126711/BR00126711.sqlite
2022-10-02 06:14:42 17.4 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126712/BR00126712.sqlite
2022-10-02 07:27:15 18.7 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126714/BR00126714.sqlite
2022-10-02 08:01:59 19.9 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126715/BR00126715.sqlite
2022-10-02 07:24:01 18.4 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126716/BR00126716.sqlite
2022-10-02 06:14:43 17.8 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126717/BR00126717.sqlite
2022-10-02 06:14:41 18.4 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126718/BR00126718.sqlite
For now I might start by benchmarking how operations occur with
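One way to pick a starting file for benchmarking from a listing like the one above is to parse the `aws s3 ls --human-readable` output and sort by size. This is only a sketch: the function name `parse_s3_ls_line` is hypothetical, the sample holds just the first three lines of the listing, and the parser assumes the object key contains no leading whitespace.

```python
# Conversion factors to GiB for the units printed by `aws s3 ls --human-readable`.
UNIT_TO_GIB = {
    "Bytes": 1 / 1024**3,
    "KiB": 1 / 1024**2,
    "MiB": 1 / 1024,
    "GiB": 1.0,
    "TiB": 1024.0,
}


def parse_s3_ls_line(line):
    """Parse one listing line into (size_in_gib, object_key)."""
    _date, _time, size, unit, key = line.split(maxsplit=4)
    return float(size) * UNIT_TO_GIB[unit], key.strip()


# First three lines copied from the listing above.
listing = """\
2022-10-02 06:14:43 22.0 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126113/BR00126113.sqlite
2022-10-02 07:47:03 3.6 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126114/BR00126114.sqlite
2022-10-02 06:14:43 10.3 GiB cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126115/BR00126115.sqlite
"""

# Sort ascending by size so the smallest file comes first.
files = sorted(parse_s3_ls_line(line) for line in listing.splitlines() if line.strip())
total_gib = sum(size for size, _key in files)

smallest_size, smallest_key = files[0]
print(f"smallest: {smallest_key} ({smallest_size} GiB)")
print(f"total across sample: {total_gib:.1f} GiB")
```

Starting with the smallest file keeps a first benchmark run short before moving on to the larger plates.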
For
That was all @leoank 🎉, who built on @johnarevalo's https://github.com/jump-cellpainting/data-validation (private). @ErinWeisbart has been leading the broader effort of making the gallery useful for humanity: https://arxiv.org/abs/2402.02203
Thanks for diving in!
@leoank can decide what to do here (create an issue or resolve here)
Yes
That works. Thanks again for looking into this!
Thank you @ErinWeisbart and @shntnu for the replies here, helpful! I'm planning to follow up here this week with findings / thoughts on best practices.
Originally posted by @d33bs in discussion with @shntnu via #163 (comment)