Performance and Integration Questions for regvelo with Custom Data #17

Closed
JiehoonKwak opened this issue Dec 19, 2024 · 3 comments

Comments

@JiehoonKwak

Performance and Integration Questions for regvelo with Custom Data

Hi!
Thank you for creating such an amazing package! I’ve been exploring it with my own dataset, but I’m encountering significant delays during model training.

I tested it on an A100 GPU, as well as on H200 SXM and L40 GPUs, but the estimated training time is extremely long: around 67:39:40 (at 162.50 s/it on the L40).
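For context, the two numbers quoted above can be turned into an implied iteration count. The following is a minimal sketch, assuming the time is a progress-bar estimate in `H:MM:SS` form and that the rate stays constant; the helper name `hms_to_seconds` is made up for this illustration and is not part of regvelo.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' progress-bar string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values quoted in the report above.
eta_seconds = hms_to_seconds("67:39:40")
seconds_per_iter = 162.50

# Assumption: constant time per iteration over the whole estimate.
implied_iterations = eta_seconds / seconds_per_iter
print(f"{eta_seconds} s at {seconds_per_iter} s/it is about "
      f"{implied_iterations:.0f} iterations")
```

Under those assumptions the estimate corresponds to roughly 1,500 iterations, so the long total comes from the cost of each iteration rather than from an unusually large iteration count.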

I came across issue #15 and downgraded scvi-tools to version 1.2.0, but the problem persists.

  • I followed the preliminary tutorial, modifying only the necessary parts for my dataset.
  • My dataset (12 samples, scANVI-integrated) dimensions are as follows:
adata: (49499, 578)  
W: (578, 578)  
TF_list: 51  

I have a few additional questions:

  1. Should I process my data sample-by-sample instead of integrating multiple samples? Is regvelo robust to batch effects?
  2. If I need to process sample-by-sample, do you have any recommendations for integrating the outputs effectively?

It’s possible that I’ve made an error somewhere—please let me know if there’s anything I should verify or adjust.
Thank you again for this creative and inspiring work!

Best,
Jiehoon

@WWXkenmo
Member

Hi Jiehoon,

Thank you for your interest in our work!

Since Regvelo relies on a parallel numerical solver, the training time can be quite long, especially with datasets containing more than 40k cells. Additionally, I discourage performing dynamic inference on multiple disconnected samples with batch effects, as it may lead to incorrect kinetic inferences and spurious regulation predictions. I recommend focusing on interpreting dynamic results on each sample individually.
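One way to set up the per-sample analysis recommended above is to group cell indices by sample label first. This is a minimal sketch in plain Python, not regvelo code: the labels are made up, and the idea that they would come from a column such as `adata.obs["sample"]` is an assumption about how the data is annotated.

```python
from collections import defaultdict


def indices_by_sample(sample_labels):
    """Map each sample label to the row indices of its cells, in order."""
    groups = defaultdict(list)
    for row, label in enumerate(sample_labels):
        groups[label].append(row)
    return dict(groups)


# Hypothetical per-cell sample labels; in an AnnData object these might
# live in a column such as adata.obs["sample"] (column name is an assumption).
labels = ["s1", "s2", "s1", "s3", "s2", "s1"]
groups = indices_by_sample(labels)

for sample, rows in groups.items():
    # With AnnData, each subset could then be taken as adata[rows].copy()
    # and run through the training workflow on its own.
    print(sample, rows)
```

Each subset would then be preprocessed and trained independently, so no model ever sees cells from more than one sample.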

Regarding integration, it depends on the type of output you want to integrate. For certain downstream tasks, such as quantitative statistics inferred by Regvelo (e.g., perturbation effects), you can simply calculate the average values and use them for the final prediction.
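The averaging step described above could look like the following. It is a sketch under stated assumptions: the per-sample scores are invented numbers, the dictionary layout and the function name `average_across_samples` are hypothetical, and how the scores are obtained from each per-sample model is left out.

```python
from statistics import mean


def average_across_samples(per_sample_scores):
    """Average each TF's score over the samples in which it was scored."""
    collected = {}
    for scores in per_sample_scores.values():
        for tf, value in scores.items():
            collected.setdefault(tf, []).append(value)
    return {tf: mean(values) for tf, values in collected.items()}


# Made-up perturbation-effect scores, one dict per independently trained sample.
per_sample = {
    "s1": {"TF_A": 0.5, "TF_B": 0.25},
    "s2": {"TF_A": 1.0, "TF_B": 0.75},
    "s3": {"TF_A": 1.5},  # TF_B was not scored in this sample
}

averaged = average_across_samples(per_sample)
print(averaged)
```

In this sketch a TF that is missing from a sample is averaged only over the samples where it appears. Whether that is appropriate, or whether such TFs should be dropped instead, depends on the downstream analysis.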

Please let me know if you have any further questions!

Best regards,
Ken

@WWXkenmo
Member

Hi Jiehoon,

May I ask for an update on this? Please let me know if I can help with anything!

Thanks,
Ken

@JiehoonKwak
Author

@WWXkenmo

Dear Ken,

Thank you for following up and for your valuable advice. Running regvelo on individual samples has significantly improved performance and works seamlessly with my data. I truly appreciate your guidance on this matter.

I’ll continue to monitor for any updates to regvelo and explore its capabilities further. Wishing you a wonderful holiday season!

Best regards,
Jiehoon
