Specify the format of written sparse matrices during writeH5AD #132

Open
axelalmet opened this issue Dec 6, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@axelalmet
Hi Zellkonverter team!

First off, I really appreciate this package and have found it the most useful tool for my own single-cell analysis across both Python and R.

When looking at the converted output in Python, I've noticed that writeH5AD converts the expression matrices from R into Compressed Sparse Column matrices, i.e., scipy.sparse's csc_matrix format. This is generally fine, but I have noticed that it increases the size of the written h5ad file (it almost doubles), which can be a problem for bigger datasets. On my laptop, "bigger" means on the order of 100K cells.

Is there any way to specify that writeH5AD converts assays as Compressed Sparse Row matrices, so that I don't have to convert all of the matrices in Python?
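
For context, the Python-side conversion referred to above might look roughly like the sketch below. It is only an illustration: the helper name `to_csr` and the toy matrix are made up for the example and are not part of zellkonverter or anndata.

```python
import numpy as np
from scipy import sparse


def to_csr(matrix):
    """Return `matrix` as CSR; dense arrays and existing CSR matrices pass through unchanged.

    `to_csr` is a hypothetical helper written for this example.
    """
    if sparse.issparse(matrix) and matrix.format != "csr":
        return matrix.tocsr()
    return matrix


# Toy stand-in for an expression matrix read back from the .h5ad file.
dense = np.array([[0, 2, 0], [1, 0, 0], [0, 0, 3]], dtype=np.float32)
x_csc = sparse.csc_matrix(dense)

x_csr = to_csr(x_csc)
print(x_csc.format, "->", x_csr.format)
```

With a real file, the same helper would presumably be applied to `adata.X` and to each entry of `adata.layers` after `anndata.read_h5ad(...)`, followed by `adata.write_h5ad(...)`. That is the extra round trip an option in writeH5AD would avoid.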

Thank you!

Best wishes,
Axel.

@lazappi lazappi added the enhancement New feature or request label Dec 9, 2024
@lazappi
Member

lazappi commented Dec 9, 2024

Hi @axelalmet

Thanks for your kind words about {zellkonverter}! We don't have an argument for this currently but perhaps we should, or it might even make sense to always convert to csr_matrix if that is the preferred format.

I will try to look into this sometime soon but if you wanted to contribute a PR that would also be great.

@axelalmet
Author

Thanks for getting back to me, @lazappi!

I took a quick look at the code and I think the simplest fix would involve modifying the following line?

If I can find time over the next couple of weeks, I'll try to make specifying csc_matrix vs csr_matrix an additional option and submit a PR!

Best wishes,
Axel.
