Replies: 13 comments 3 replies
-
Hi @rjb67, there is an R interface you could try (example script here: https://github.com/brianhie/scanorama/blob/master/bin/R/scanorama.R). Just keep in mind that Scanorama uses cell-by-feature matrices, whereas I think most R pipelines use feature-by-cell matrices. I'll leave this issue open in case others have had success with Scanorama and Seurat integration.
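To make the orientation point concrete, here is a toy sketch (hypothetical variable names, not part of the linked script):

```r
# Most R pipelines keep genes in rows and cells in columns;
# Scanorama expects the transpose: cells in rows, genes in columns.
mat <- matrix(1:6, nrow = 3,
              dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2)))
mat_for_scanorama <- t(mat)  # now 2 cells x 3 genes
```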
-
I have some problems when I use Scanorama through the reticulate package in R. It seems that the check_datasets() function cannot work correctly with a "matrix" from R. I installed from pip. My code is:
Running it then produces an error:
How can I make an R "matrix" compatible with Scanorama?
-
Hi @zh542370159, sorry, I'm not really sure how reticulate and R treat the data types you are passing in. The script at https://github.com/brianhie/scanorama/blob/master/bin/R/scanorama.R should work with R version 3.5.1 and reticulate version 1.10; it should help you debug yours.
-
Thank you. It was my fault: the reticulate package cannot properly pass a named list of matrices to Python. In that case, Python only gets the names instead of the matrices in the list.
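For anyone hitting the same thing: reticulate converts a named R list to a Python dict and an unnamed list to a Python list, so one possible workaround is to strip the names before passing the data across (a sketch, assuming reticulate is installed and a Python environment is configured):

```r
library(reticulate)

mats <- list(a = matrix(1:4, nrow = 2), b = matrix(5:8, nrow = 2))

# A named list becomes a Python dict, so downstream Python code may see
# the keys rather than a plain sequence of matrices; dropping the names
# forces conversion to a Python list of arrays instead.
py_input <- r_to_py(unname(mats))
```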
-
@zh542370159
-
@Thegreatjoyce Hi, going from Seurat to Scanorama and back is quite doable. Three things are important: Scanorama wants cell-by-gene matrices (Seurat stores them gene-by-cell, so you must transpose), each dataset needs its own vector of gene names, and the two lists have to be in the same order.

So when you want to create your list of assays to pass to Scanorama:

```r
assaylist <- list()
genelist <- list()
for(i in 1:length(seuratobjectlist))
{
  assaylist[[i]] <- t(as.matrix(GetAssayData(seuratobjectlist[[i]], "data")))
  genelist[[i]] <- rownames(seuratobjectlist[[i]])
}
```

At this point you can run the Scanorama functions:

```r
integrated.data <- scanorama$integrate(assaylist, genelist)
corrected.data <- scanorama$correct(assaylist, genelist, return_dense = TRUE)
integrated.corrected.data <- scanorama$correct(assaylist, genelist,
                                               return_dimred = TRUE, return_dense = TRUE)
```

Then, to go back to Seurat:

```r
intdata <- lapply(integrated.corrected.data[[2]], t)
panorama <- do.call(cbind, intdata)
rownames(panorama) <- as.character(integrated.corrected.data[[3]])
colnames(panorama) <- unlist(sapply(assaylist, rownames))

intdimred <- do.call(rbind, integrated.corrected.data[[1]])
colnames(intdimred) <- paste0("PC_", 1:100)

# We also add standard deviations in order to draw elbow plots in Seurat
stdevs <- apply(intdimred, MARGIN = 2, FUN = sd)
```

Creating the Seurat object is trivial at this point, and we can skip normalization and variable feature selection as we already have our PCA embeddings from Scanorama:

```r
pan.seurat <- CreateSeuratObject(counts = panorama, assay = "pano", project = "yourproject")

# Adding metadata from all previous objects
pan.seurat@meta.data <- do.call(rbind, lapply(seuratobjectlist, function(x) x@meta.data))

# VERY IMPORTANT: make sure that the rownames of your metadata slot
# are the same as the colnames of your integrated expression matrix
rownames(pan.seurat@meta.data) <- colnames(pan.seurat)
rownames(intdimred) <- colnames(pan.seurat)

pan.seurat[["pca"]] <- CreateDimReducObject(embeddings = intdimred, stdev = stdevs,
                                            key = "PC_", assay = "pano")
```

At this point you can go forward with the usual workflow (don't keep the parameters I wrote, just use anything you see fit):

```r
pan.seurat <- FindNeighbors(pan.seurat, dims = 1:10)
pan.seurat <- FindClusters(pan.seurat)
markers <- FindAllMarkers(pan.seurat)
```

HTH
-
@gdagstn Thanks for the detailed walkthrough for going between Scanorama and Seurat! In your code you used the "data" slot from the Seurat objects to perform the Scanorama correction, and later used the corrected data as UMI counts to initiate a new Seurat object. Since the "data" slot contains the log-normalized data, would it be more reasonable to use the "counts" slot instead for the Scanorama correction? And after creating a new Seurat object with the corrected counts, is it legitimate to perform the usual downstream analyses on it?
-
Hi @xhbkirby, I think @brianhie is obviously the most suited to answer your questions. However, I can give you my 1.5 cents. Brian has answered in some issues (see #54 and #68) that all the usual preprocessing steps can (and should) be carried out before integration. In the context of your specific question, this makes sense, since you want to remove uninteresting sources of variation before the joint space is learned by Scanorama.

Now, if instead you want to use the corrected counts for differential expression, I think you would have to find a test that suits the distribution of those normalized, corrected counts, and I have not made any serious attempt myself. As an alternative that may be more robust to batches, I would consider looking at coexpression, which is implemented in other work by @brianhie.

HTH
-
Hi @gdagstn, thanks for your quick reply and your thoughts on differential expression. In fact, I saw in the Seurat forum that there is also a lot of discussion about whether to use integrated/corrected data or the original count data to perform DE, and the Seurat developers recommend the latter. Regarding the first part of your response, am I understanding correctly that you recommend removing unwanted variance (regression) on the raw counts or log-transformed data, and then using the "regressed" data as input for Scanorama? Please bear with me if any of my questions don't make much sense. I'm just not sure whether the Scanorama-corrected (and, I assume, also L2-normalized) output data can be log-transformed and centered/scaled again following the Seurat pipeline.
-
I think you can and should "regress out" unwanted sources of variation prior to integration. You would normally do that after computing HVGs for PCA in a normal workflow (i.e. without integration), and you do that, or something similar, when using other integration/batch correction methods. As I mentioned earlier, the reasoning is that you don't want spurious populations to appear as a consequence of retaining uninteresting sources of variation. The reply on issue #68 does not explicitly say not to use scaled/regressed data, and I personally don't see why it would be an issue to do so, but I may be wrong.

Going the other way around, i.e. your suggestion of correcting raw counts without integration and then applying log normalization on top of L2 normalization and then scaling the corrected data with regression, seems like a suboptimal choice, because what "regressing out" means in Seurat is taking the residuals of a linear (or Poisson, or negative binomial, or...) model fit to each gene. This means an assumption is made about the distribution, and it will most likely be violated by corrected, L2-normalized data. I have not tried this myself though, so again I may be wrong.

If you want, you can actually implement L2 normalization yourself in R and play with the pbmc3k dataset to see what happens when you apply different normalizations in different orders.
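As a hedged sketch of the idea (an illustration of per-cell L2 normalization, not Scanorama's actual implementation; assumes a cells-by-genes matrix):

```r
# Scale each cell (row) to unit Euclidean length.
l2_normalize <- function(X) {
  norms <- sqrt(rowSums(X^2))
  norms[norms == 0] <- 1  # leave all-zero cells untouched
  X / norms                # R recycles the vector row-wise here
}

set.seed(1)
X <- matrix(abs(rnorm(20)), nrow = 4)  # toy 4 cells x 5 genes
Xn <- l2_normalize(X)
rowSums(Xn^2)  # each row now has squared norm 1
```

You can apply this before or after log transformation on a small dataset and compare the resulting PCAs to see how the order of operations changes the geometry.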
Edit: to be honest, it's not entirely clear to me why one would calculate HVGs before scaling the data, since the sources of variation will change after regression. It's true that scaling only the HVGs saves time and memory when computing the PCA, but still. I saw there is a GitHub issue on this specific topic on which you also commented, so I'll be interested in the answer as well.
-
Hi @gdagstn, thanks for the clarification! I agree that the second workflow would most likely change the underlying distribution of the data and might make regression in Seurat not perform the way it should. I'll try the Scanorama correction with the scaled/regressed data and see how that works.
-
Hi @gdagstn, I am following your code; however, I am getting an error: It looks like your datasets are matrices from what I can see, but am I missing something? I am creating my datasets variable as follows:
Best wishes,
-
Hello @gdagstn! I am trying to use Scanorama in R with my own data and I am running into a problem. Everything goes well, but on the last line of code I get this:
Here is the code I am using, which is basically the one you posted before. I used two Seurat objects that I read in individually (M003 and M108607). Thanks!

```r
library(purrr)

extract_matrix <- function(seurat_object,  # (rest of the function was cut off)

results_list <- list()

ifnb.list <- lapply(X = results_list, FUN = SCTransform)
genelist <- lapply(ifnb.list, VariableFeatures)
datasets <- map2(.x = ifnb.list, .y = genelist, .f = extract_matrix)

integrated.data <- scanorama$integrate(datasets, genelist)
```
-
I am interested in trying out Scanorama to see how it fares with our data, but all of our processing and analysis so far has been done in Seurat in R. Are there any plans to integrate Scanorama in a way that makes it easily usable with Seurat? Or is this already possible and I have just not been able to figure it out?