Doubt about security of scvelo results. #393
-
Hi! I've been using scvelo recently for some analysis, and I've come across some results that were striking to me. My data are cells at days 0, 2, and 4, and I've run the analysis as with the mock data in this notebook. The point is that when I made scatter plots of several genes (marker genes selected from the heatmap) against latent time, I realized that the Ms values are the ones plotted by default, and that they are far off from the expression values in the X, spliced, and unspliced layers. In general, many of these genes are expressed in few cells, whereas the Ms values are nonzero in most cells. Despite that, I see a slight correlation between the Ms and expression values. My doubt is to what extent I can trust the Ms values as a proxy for a gene being up- or downregulated across days. This dataset only has ~150 cells and, if only ~20 cells express a gene (although most cells show positive Ms values), can I rely on the analysis to infer biological hypotheses? I tried to read your paper and I have doubts about how the Ms is recovered. Thanks for your help!
-
First-order moments are defined as the mean expression of the k nearest neighbors of a cell. I'm not sure what you mean by "predicted Ms" or "recovering" them, as moments are not a real physical variable you can measure (and therefore cannot be recovered or predicted). The moments are used to smooth the measurements and reduce the sparsity of the data. This step is crucial for even having a chance to observe splicing dynamics. Sorry, I'm not sure what you are plotting in the given figures and why it is helpful. Also, what do you mean by "expressed cells"?
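To make the definition concrete, here is a minimal NumPy sketch of a k-nearest-neighbor mean. It is a simplified illustration and not scvelo's actual implementation: the function name `knn_first_moments`, the use of Euclidean distance on a toy 2D embedding, and counting a cell as its own neighbor are all assumptions made for the example.

```python
import numpy as np

def knn_first_moments(X, coords, k):
    """Mean expression over each cell's k nearest neighbors.

    Hypothetical helper for illustration. Assumes Euclidean distance in
    `coords` and counts the cell itself as one of its k neighbors.
    X: (n_cells, n_genes) expression, coords: (n_cells, n_dims) embedding.
    """
    # Pairwise Euclidean distances between all cells.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Indices of the k closest cells per row (distance to self is 0).
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    # Average the expression of those neighbors, gene by gene.
    return X[nn_idx].mean(axis=1)

# Toy data: one gene detected in only 3 of 120 cells.
rng = np.random.default_rng(0)
n_cells = 120
coords = rng.normal(size=(n_cells, 2))
expr = np.zeros((n_cells, 1))
expr[:3, 0] = 10.0

Ms = knn_first_moments(expr, coords, k=30)
n_raw = int((expr[:, 0] > 0).sum())    # cells with raw expression > 0
n_smooth = int((Ms[:, 0] > 0).sum())   # cells with a nonzero smoothed value
print(n_raw, n_smooth)
```

Every cell that has at least one expressing cell among its neighbors gets a nonzero smoothed value, so the number of nonzero entries can only grow after averaging, and the effect becomes stronger as `k` increases.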
-
Hi! Thanks for the help. I understand the Ms now. What I show in the pictures is a scatter plot of each of the variables produced by scvelo, because I didn't know exactly what the scatter function plots with the default settings (first subplot). By "expressed cells" I mean cells with expression > 0. The point is, although I get that using kNN-smoothed expression is necessary, I have doubts about how far I can trust the analysis directly, because in my data I find examples where the gene expression is extremely low and, nonetheless, the Ms values seem more "realistic". For example, some top_genes in my analysis are IGF1R, BMP2 and DPP4. You can see that BMP2, for example, is only expressed in 3 cells out of the 122, but due to the kNN smoothing it looks like many more cells express it and, coincidentally, there's no expression at day 0. It is true that, for some genes with more expression, the smoothing makes the Ms scatter more appealing and more trustworthy, but many of these low-expression genes are not filtered out and can look like artifacts IMO. If I want to retain fewer genes, but with more expression and whose Ms I can trust, what should I do? Should I reduce k? Thanks!
-
Okay, so there are only ~20 cells containing expressed genes? I'm not sure I understood correctly, but cells containing only zero counts should be removed during quality control.
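As a rough sketch of that quality-control step, the snippet below drops cells whose counts are all zero and then counts, per gene, how many remaining cells have expression > 0. It uses plain NumPy on a made-up count matrix; the function names `drop_empty_cells` and `cells_expressing` are hypothetical and not part of scvelo.

```python
import numpy as np

def drop_empty_cells(counts):
    """Remove cells (rows) whose total count is zero. Hypothetical helper."""
    keep = counts.sum(axis=1) > 0
    return counts[keep], keep

def cells_expressing(counts):
    """Number of cells with expression > 0 for each gene (column)."""
    return (counts > 0).sum(axis=0)

# Toy count matrix: 4 cells x 3 genes, two cells are entirely empty.
counts = np.array([
    [0, 0, 0],
    [5, 0, 1],
    [0, 0, 0],
    [2, 3, 0],
])

filtered, keep = drop_empty_cells(counts)
per_gene = cells_expressing(filtered)
print(filtered.shape, per_gene)
```

The per-gene counts give a simple way to see which genes are detected in only a handful of cells before any smoothing is applied.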