Data preparation help needed : AnnData.zarr file loading time too long #219

gauravgadhvi · 2023-01-16T19:58:32Z

Hi,
I am writing to get help with an AnnData.zarr dataset that never finishes loading in the Vitessce browser. I have tried creating AnnData.zarr files with 1,000 cells and with 120,000 cells, and in both cases the browser starts loading and never gets past the processing stage. I can see the correct number of cells and the UMAP structure loaded in the background, but the loading spinner never goes away. Since even the 1,000-cell file stalls, I don't think scalability is the issue. Is there a way I can troubleshoot this? I also ran the optimize_adata() function before writing the AnnData object to Zarr.
Below is the example I am trying to test:

http://vitessce.io/#?edit=false&url=https%3A%2F%2Frobbinsa.me%2Fcelldata%2Fwelchlab%2FpyTest%2FHY_optimized_zarrConfig.json

Any help or direction is highly appreciated. Thank you!

Best,
Gaurav Gadhvi

keller-mark (Maintainer) · 2023-01-16T20:30:41Z

I think it is because the X matrix is stored in CSR format (see https://robbinsa.me/celldata/welchlab/HY_allMerged_AnnData_optimized.zarr/X/.zattrs). Vitessce can load CSC sparse matrices more efficiently than CSR ones.
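
One way to check this yourself is to read the X/.zattrs file of the store: AnnData records the on-disk layout of a sparse X under the encoding-type key (csr_matrix or csc_matrix). A minimal sketch follows; the helper name x_layout and the sample JSON are illustrative, not copied from the store linked above.

```python
import json

def x_layout(zattrs_text):
    """Return the AnnData encoding-type recorded in an X/.zattrs JSON document."""
    attrs = json.loads(zattrs_text)
    # Some stores may lack the key; report that instead of raising.
    return attrs.get("encoding-type", "unknown")

# Illustrative contents only, not the actual file from the store above.
sample = '{"encoding-type": "csr_matrix", "encoding-version": "0.1.0", "shape": [1000, 2000]}'
print(x_layout(sample))
```

If this reports csr_matrix, the store was written row-major and is a candidate for the conversion described below.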

I just deployed a new version of the Python package (3.0.0) to PyPI (https://pypi.org/project/vitessce/#history).

pip uninstall vitessce
pip install "vitessce[all]==3.0.0"

In this version optimize_adata converts to CSC when necessary. You could also do it manually:

from scipy.sparse import issparse
if issparse(adata.X):
    adata.X = adata.X.tocsc()
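
To illustrate why the layout matters, here is a small sketch on a toy matrix (not the dataset above). It assumes the viewer requests expression values one gene at a time, that is, one column of X. In CSC, the non-zero values of each column sit in one contiguous slice of the data array, delimited by indptr; in CSR the same values are spread across the rows. The helper column_values is hypothetical and only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy cells-by-genes matrix: 4 cells (rows), 3 genes (columns).
dense = np.array([
    [0, 2, 0],
    [1, 0, 0],
    [0, 3, 4],
    [0, 0, 5],
])
X_csr = csr_matrix(dense)
X_csc = X_csr.tocsc()  # same values, column-major storage

def column_values(csc, j):
    """Row indices and non-zero values of column j, read as one contiguous slice."""
    start, end = csc.indptr[j], csc.indptr[j + 1]
    return csc.indices[start:end], csc.data[start:end]

rows, vals = column_values(X_csc, 1)
print(X_csc.format, rows.tolist(), vals.tolist())
```

The conversion changes only the storage order; the matrix values themselves are unchanged.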

2 replies

keller-mark Jan 16, 2023
Maintainer

Alternatively, you could convert the X matrix to a dense matrix, for example with the to_dense_X=True parameter of optimize_adata: https://vitessce.github.io/vitessce-python/api_data.html#vitessce.data_utils.anndata.optimize_adata
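
Densifying trades memory and storage for simpler access, so it may be worth estimating the dense size first. A rough sketch follows; the gene count is a hypothetical placeholder (the thread gives only the cell counts), 4-byte float32 values are assumed, and dense_size_gib is an illustrative helper rather than part of any library.

```python
def dense_size_gib(n_cells, n_genes, bytes_per_value=4):
    """Uncompressed size of a dense cells-by-genes matrix, in GiB."""
    return n_cells * n_genes * bytes_per_value / 2**30

N_GENES = 20_000  # hypothetical; not stated in the thread
for n_cells in (1_000, 120_000):
    print(f"{n_cells} cells: {dense_size_gib(n_cells, N_GENES):.2f} GiB")
```

Zarr chunks are usually compressed, so the size on disk is typically smaller than this in-memory figure.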

gauravgadhvi Jan 17, 2023
Author

Thank you, Mark, for getting back to me. I see now why the Zarr store wouldn't load even after optimization. I will try your suggestion and convert X to a CSC matrix.
I am a little confused, though, about how this would work with a dense matrix. Is that a format Vitessce can load from?
