Using Scanpy's UMAP (calculated before merging) for adding trajectories in scvelo #1086
Unanswered
chris-31337 asked this question in Q&A
Dear @WeilerP and the Scvelo team,
Based on your previous comments [1], I keep revisiting the question of whether it is "allowed" to plot the scvelo (0.2.5) trajectories onto a UMAP calculated in an earlier (Scanpy 1.9.2) preprocessing pipeline run on the same dataset (before merging with the spliced/unspliced layers).
I understand that raw counts should be supplied in .X to scvelo for reliable results [2,3]. But I am not sure to what extent I can reuse the UMAP calculated in Scanpy (before merging with the spliced/unspliced loom data) for scvelo. Reusing it would be advantageous because it would allow plotting the trajectory data onto a familiar structure used for other figures and previous analyses (leaving aside for the moment that low-dimensional displays of data have their own limitations and should not be overinterpreted).
In [1] you have written in response to a similar topic (reusing UMAP from Scanpy):
I would take this as "do NOT use UMAP calculated before merging, even from the same dataset, because the filtering and neighbor graph would be different", but this is in contrast to the analysis templates used by many people I know and also some online tutorials [4,5]. Even the 10x tutorial [6] appears to use UMAP calculated elsewhere (from its proprietary loupe browser) before merging and then overlaying the velocity results onto that imported UMAP, without rerunning UMAP calculation after filtering. Finally, importing UMAPs from Seurat seems to be possible as well [7], implying that it is not impossible to reuse UMAPs calculated elsewhere with trajectory data inferred later.
Unfortunately, the scvelo tutorial [3] skips the UMAP step and states that "[the tutorial data] has an already pre-computed UMAP embedding". Hence it remains unclear whether that pre-computed UMAP may also be derived from an independent preprocessing pipeline applied to that dataset (before merging with the spliced/unspliced layers) or HAS to be (re-)generated from the merged and scvelo-filtered dataset.
Are all of the approaches involving precomputed UMAPs wrong? Or am I overinterpreting/misunderstanding your statements in [1]?
What is the recommended way of proceeding?
So far, I think I have the following options:
1. Using Scanpy's preprocessing entirely and normalizing only the splicing layers after merging
2. Reverting to raw counts in .X for scvelo but otherwise using Scanpy's UMAP
=> This approach generates a highly similar result to the first approach.
3. Reverting to raw counts in .X and redoing PCA and UMAP after merging
=> This approach generates a very different UMAP, which is hard to interpret when comparing to the original scanpy analysis. However, the velocity arrows seem to generally point in the same relative direction with respect to the old leiden clusters.
4. Redoing the entire analysis pipeline on untouched raw datasets, freshly merged using scvelo
=> This would mean repeating all steps around scrublet, QC cleanup of high mitochondrial and low ribosomal counts etc. with the merged dataset, as if all previous analysis had never happened.
Which of these four ways (if any) should be considered acceptable?
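To make option 2 concrete, here is a minimal sketch of what I mean by "using Scanpy's UMAP": the coordinates are carried over by matching cell barcodes, so the merged object may hold fewer cells or a different cell order than the Scanpy object. The helper name `transfer_embedding` and the toy barcodes are made up for illustration; only NumPy is used, so the sketch says nothing about whether the reuse is methodologically sound.

```python
import numpy as np

def transfer_embedding(src_names, src_umap, dst_names):
    """Reorder a precomputed embedding to match the cells of another object.

    src_names: barcodes of the object holding the UMAP
               (e.g. adata_scanpy.obs_names; hypothetical object name)
    src_umap:  array of shape (len(src_names), 2)
               (e.g. adata_scanpy.obsm["X_umap"])
    dst_names: barcodes of the merged object (e.g. adata_merged.obs_names)
    """
    src_umap = np.asarray(src_umap)
    # Map each barcode to its row in the source embedding.
    row_of = {name: i for i, name in enumerate(src_names)}
    # Refuse to continue if a cell in the merged object has no coordinates.
    missing = [name for name in dst_names if name not in row_of]
    if missing:
        raise KeyError(
            f"{len(missing)} cells have no precomputed coordinates, "
            f"e.g. {missing[0]!r}"
        )
    # Pick the source rows in the order of the destination cells.
    return src_umap[[row_of[name] for name in dst_names]]

# Toy data: the merged object kept 3 of 4 cells, in a different order.
src_names = ["AAAC-1", "AAAG-1", "AACT-1", "AAGG-1"]
src_umap = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]])
dst_names = ["AACT-1", "AAAC-1", "AAGG-1"]

aligned = transfer_embedding(src_names, src_umap, dst_names)
```

With real objects, the result would be stored as `adata_merged.obsm["X_umap"]` before calling something like `scv.pl.velocity_embedding_stream(adata_merged, basis="umap")`. Raising on missing barcodes is a deliberate choice here: silently dropping or misaligning cells would put velocity arrows on the wrong coordinates.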
P.S.: In reference to [8], please note that my preprocessing DID determine highly variable genes but did NOT subset the dataset to HVG-only, since sc.tl.pca() defaults to using HVG anyway when .var['highly_variable'] is set [9]. Hence, the Scanpy_result.h5ad mentioned above contains the "full" dataset (minus removed duplicates and putatively dead cells).
P.P.S.: Briefly, the preprocessing steps were as follows:
References
[1] #775
[2] https://www.sc-best-practices.org/trajectories/rna_velocity.html
[3] https://scvelo.readthedocs.io/en/stable/VelocityBasics/
[4] https://smorabit.github.io/tutorials/8_velocyto/
[5] https://youtu.be/AUiYxtGJYtg?t=677
[6] https://www.10xgenomics.com/resources/analysis-guides/trajectory-analysis-using-10x-Genomics-single-cell-gene-expression-data
[7] #192
[8] #755
[9] https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.pca.html