Adding DESC to data integration benchmarking #28

Open · LuckyMD opened this issue Jul 24, 2020 · 21 comments


LuckyMD commented Jul 24, 2020

Hi @eleozzr,

We were thinking about adding DESC to our benchmark of data integration tools (https://github.com/theislab/scib). We would be running our own pre-processing for the input to DESC, which relies on Scanpy version 1.4.5+. Do you think it would be possible to use just the desc.train() function if we remove the Scanpy requirement and install via GitHub? Would it also be okay to use Keras 2.2.4?

Also, to compare the methods properly we would not be able to use the clustering output you provide, but instead we would use the embedding at a default clustering resolution (resolution=0.8 as in your tutorial). Would this be a suitable way of evaluating DESC?

Kind regards,


eleozzr commented Jul 24, 2020

Because of DESC's dependencies on other packages, such as TensorFlow, Scanpy, and Keras, we are updating and testing our desc algorithm to be compatible with TensorFlow 2.0 and Scanpy 1.4.5+. Hopefully, the latest version can be uploaded to GitHub and PyPI tomorrow.

A single resolution (0.8 or 1.0) is fine for DESC.

Thanks.


LuckyMD commented Jul 24, 2020

Hi @eleozzr,

That is great news! If you have it online by tomorrow, we will be able to add it to the benchmark on time :). I am looking forward to a setup.py with keras>2.1 and scanpy>1.3.6 as dependencies :). I hope it will still be possible to use tensorflow 1.x as well, though, and that it's not exclusive to tensorflow 2.


eleozzr commented Jul 25, 2020

Hi LuckyMD,

I have already updated our desc algorithm.

  1. For TensorFlow 1.x, we released desc 2.0.3. Please see our Jupyter notebook example desc_2.0.3_paul.ipynb.
  2. For TensorFlow 2.x, we released desc 2.1.1. Please see our Jupyter notebook example desc_2.1.1_paul.ipynb.

Hope this helps. Thanks.


LuckyMD commented Jul 27, 2020

hi @eleozzr,

Thanks for the updates. However, I see that the scanpy version is still capped at <1.4.4 in version 2.0.3. That cap seems to be removed in version 2.1.1, but interestingly the tensorflow version is capped at 2.0 there. Could that be a typo? Would it be possible to remove the scanpy version cap for 2.0.3?


LuckyMD commented Jul 27, 2020

Also, is there anything limiting compatibility with Python 3.7?


eleozzr commented Jul 28, 2020

Also, is there anything limiting compatibility with Python 3.7?

All the scripts were tested with Python 3.5 and 3.6, but I think they should also work with Python 3.7.


LuckyMD commented Jul 28, 2020

Are you still planning to fix the install requirements in:

desc/setup.py

Lines 21 to 23 in 9acb047

'tensorflow>=1.7,<2.0',
'keras==2.1',
'scanpy>=1.3.6,<1.4.4',

Then I could just install directly from pip rather than forking the repo and installing from the fork.
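
For reference, the relaxed pins I'm hoping for would look roughly like this (just a sketch, not the released setup.py):

# Hypothetical relaxed pins for the tensorflow 1.x release (a sketch, not the released setup.py)
install_requires=[
    'tensorflow>=1.7,<2.0',   # tf 1.x release keeps the <2.0 cap
    'keras>=2.1',             # relaxed from keras==2.1
    'scanpy>=1.3.6',          # upper cap <1.4.4 removed so scanpy 1.4.5+ works
],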


eleozzr commented Jul 28, 2020

Are you still planning to fix the install requirements in:

desc/setup.py

Lines 21 to 23 in 9acb047

'tensorflow>=1.7,<2.0',
'keras==2.1',
'scanpy>=1.3.6,<1.4.4',

Then I could just install directly from pip rather than forking the repo and installing from the fork.

Could you try installing with pip install desc==2.1.1 or pip install desc==2.0.3?
I haven't figured out why the code on GitHub is not the latest.
If you need version 2.0.3, you can also download it directly from https://drive.google.com/file/d/106xrwqnskG-Eu--Bv_hvc0CtKSG64Xh0/view


LuckyMD commented Jul 28, 2020

I hadn't tried to install via pip yet, as the setup.py with the 2.0.3 tag still showed these dependencies. I will give it a go and report back.


LuckyMD commented Jul 28, 2020

Install for 2.0.3 worked, thanks :). Is there a reason you are limited to keras 2.1? Just a question... I can work with that as well ;).


LuckyMD commented Jul 28, 2020

Hi @eleozzr,

I have another question about using DESC. I can't see where you pass batch information to the algorithm so that it can perform data integration. Do you explicitly integrate data across batches or just produce a low-dimensional embedding that is less affected by batch?

If DESC doesn't explicitly do data integration, but only produces a low-dimensional embedding which is less affected by batch effects than the high-dimensional data, maybe we shouldn't be comparing it to other data integration tools? I guess that comparison might not be fair to DESC.

What do you think?


eleozzr commented Jul 28, 2020


Install for 2.0.3 worked, thanks :). Is there a reason you are limited to keras 2.1? Just a question... I can work with that as well ;).

Sometimes the TensorFlow and Keras versions need to match, so I pinned the Keras version directly to avoid unnecessary issues caused by a mismatch between TensorFlow and Keras.


eleozzr commented Jul 28, 2020

Hi @eleozzr,

I have another question about using DESC. I can't see where you pass batch information to the algorithm so that it can perform data integration. Do you explicitly integrate data across batches or just produce a low-dimensional embedding that is less affected by batch?

If DESC doesn't explicitly do data integration, but only produces a low-dimensional embedding which is less affected by batch effects than the high-dimensional data, maybe we shouldn't be comparing it to other data integration tools? I guess that comparison might not be fair to DESC.

What do you think?

The only batch information desc uses is the batch id. Generally speaking, you should scale the data within each batch instead of scaling across all cells. Here is a simple example:

import scanpy as sc
import desc

sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=10)
adata.raw = adata.copy()
sc.pp.normalize_per_cell(adata, counts_per_cell_after=1e4)
sc.pp.highly_variable_genes(adata, n_top_genes=1000, subset=True, inplace=True)
sc.pp.log1p(adata)
sc.pp.scale(adata, zero_center=True, max_value=6)
# When your datasets have a batch effect, scale the data within each batch instead
# (here `Group` is the column holding the batch id):
# adata = desc.scale_bygroup(adata, groupby="Group", max_value=6)
# Then you can feed adata into desc:
adata = desc.train(adata,
                   dims=[adata.shape[1], 128, 32],  # or 256 for the first hidden layer
                   tol=0.001,  # 0.005 is suggested when the dataset has fewer than 5000 cells
                   n_neighbors=10,
                   batch_size=256,
                   louvain_resolution=[0.8, 1.0],  # a single value also works
                   save_dir="your_result_output_dir",
                   do_tsne=True,
                   use_GPU=False,
                   num_Cores=1,
                   save_encoder_weights=True,
                   save_encoder_step=2,
                   use_ae_weights=False,
                   do_umap=False,
                   num_Cores_tsne=4,
                   learning_rate=500)

Hope this helps.
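
After training, the results for each resolution are stored back on the AnnData object, roughly like this (key names follow the DESC tutorials and may differ slightly between versions):

# Reading back the results (key names as in the DESC tutorials; they may differ by version)
clusters = adata.obs["desc_0.8"]            # cluster assignments at resolution 0.8
embedding = adata.obsm["X_Embeded_z0.8"]    # low-dimensional embedding at resolution 0.8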


LuckyMD commented Jul 28, 2020

Thanks for the example! The parametrization is slightly different to your example notebook. I will use something closer to this if you think this is a better default parametrization for datasets of 10k+ cells?

I will then add batch-specific scaling and then make a PR for the Benchmarking data integration repo here. Would it be okay if I tagged you in that PR so you could check that DESC is used as you think is correct?


eleozzr commented Jul 28, 2020

Thanks for the example! The parametrization is slightly different to your example notebook. I will use something closer to this if you think this is a better default parametrization for datasets of 10k+ cells?

I will then add batch-specific scaling and then make a PR for the Benchmarking data integration repo here. Would it be okay if I tagged you in that PR so you could check that DESC is used as you think is correct?

If you only need the embedding of desc, you can set do_tsne=False and do_umap=False.

And yes, it's fine to tag me in the PR.
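
A minimal embedding-only call would then look roughly like this (same parameters as in the example above, with the t-SNE and UMAP steps disabled):

# Embedding-only run: skip t-SNE and UMAP, keep the rest as in the example above
adata = desc.train(adata,
                   dims=[adata.shape[1], 128, 32],
                   louvain_resolution=[0.8],
                   save_dir="your_result_output_dir",
                   do_tsne=False,
                   do_umap=False,
                   use_GPU=False)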


LuckyMD commented Jul 28, 2020

If you only need the embedding of desc, you can set do_tsne=False and do_umap=False.

Yes, I've already changed this :). I will test the code, make a PR and then you can tell me if I'm doing something stupid ;).


LuckyMD commented Jul 29, 2020

Things seem to be running for me now, thanks! I just quickly wanted to highlight two things:

  1. Installing DESC also requires pydot and GraphViz (not the python package, but the C binaries), which are not automatically installed with the keras dependency (this is a keras issue, but maybe worth highlighting somewhere).
  2. Installing GraphViz via conda installed keras version 2.3.1 for me, but the code still ran through (I am still using tensorflow version 1.14). I assume that you don't really need the pinned version you have in setup.py.


eleozzr commented Jul 30, 2020

Regarding point 2 (the version pins in setup.py):

Thanks.


LuckyMD commented Jul 30, 2020

It would be great if you could look over our PR here: theislab/scib#131

Thanks!


LuckyMD commented Aug 5, 2020

Hi @eleozzr,
We are still having an issue with saving the network weights. I have turned off saving the weights via save_encoder_weights=False, and I set save_dir=tmp_dir as suggested by the defaults. However, we don't have permission to write to a local directory on our server. Is there a way to turn off saving any files at all?
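
For reference, the relevant part of our call currently looks roughly like this (a sketch; the directory name is a placeholder):

# Sketch of our current setup: weight saving is off, but desc still needs a writable save_dir
tmp_dir = "/path/to/tmp_dir"  # placeholder for the directory we pass in
adata = desc.train(adata,
                   save_encoder_weights=False,  # weight saving turned off
                   save_dir=tmp_dir,            # other files are still written here, which fails without write permission
                   do_tsne=False,
                   do_umap=False)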


LuckyMD commented Sep 11, 2020

It would be great to get some input on the above question of how to turn off saving the weights and other output files.
