Is scaleDims scaling along the wrong axis? #2120
Hi @dannyconrad! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
I didn't realize the Discussions section was separate, and I just noticed that someone brought up a similar concern in #1935, but it hasn't been answered yet. Since I consider this a possible "bug" even though it doesn't throw an error, I'll leave this post here for now.
For additional context, here's an example of the scaled vs non-scaled corToDepth vectors, which would be identical if the reduced embedding were being scaled by component instead of by cell:
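(For anyone who wants to pull the same two vectors out of their own project, something like the sketch below should work. I'm assuming the default "IterativeLSI" name and that corToDepth is stored with separate none/scaled entries, so adjust if your object differs.)

```r
# "proj" is a hypothetical ArchRProject; slot layout assumed from my own project
lsi <- proj@reducedDims$IterativeLSI

lsi$corToDepth$none    # per-component correlation to depth, computed on the raw SVD matrix
lsi$corToDepth$scaled  # the same correlations after the row-wise (per-cell) z-scoring
```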
I read the issue threads #323 & #447 because I've been working a lot with the LSI components of my datasets lately, and I think I found the source of the confusion that led to those posts.
When scaleDims is set to TRUE (the default), the rowZscores function is invoked, which scales each individual cell's LSI component values by row, i.e. using the mean and SD of that cell's N components. Based on the function's name, this appears to be by design.
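(A minimal base-R sketch of what that row-wise z-scoring amounts to; this paraphrases the behavior and is not ArchR's actual implementation.)

```r
# rows = cells, columns = LSI components
row_zscore <- function(m) {
  mu   <- rowMeans(m)       # mean of each cell's N components
  sdev <- apply(m, 1, sd)   # SD of each cell's N components
  sweep(sweep(m, 1, mu, "-"), 1, sdev, "/")  # each CELL scaled across its own row
}
```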
However, the documentation of getReducedDims() says this scaling was based on the approach Tim Stuart introduced in Signac::RunSVD(). When I dug into the code of RunSVD(), the scaling there is done by column, not by row, i.e. using the mean and SD of each component across all cells.
The relevant code within RunSVD():
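(I'm paraphrasing the scaling step from memory rather than quoting the Signac source verbatim, so the exact variable names may differ, but the column-wise logic is the point:)

```r
# cell.embeddings: rows = cells, columns = SVD components
embed.mean <- apply(cell.embeddings, MARGIN = 2, FUN = mean)  # mean of each component
embed.sd   <- apply(cell.embeddings, MARGIN = 2, FUN = sd)    # SD of each component
norm.embeddings <- t((t(cell.embeddings) - embed.mean) / embed.sd)  # scale each COLUMN
```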
As far as I can tell, in both cases the input matrix has components as columns and cells as rows.
This discrepancy is why the corToDepth vector of the scaled embeddings looks so different and why the scaled dimensions no longer really correlate with nFrags. Because the scaled values are used by default, LSI_1 is almost never filtered out by corCutOff, even when it should be.
To reproduce/verify this, you can just check the ranked order of the values along each axis before and after the scaling is performed:
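(Here's a sketch of that check; the getReducedDims() arguments are as I understand them, so treat this as a template rather than copy-paste code.)

```r
# Pull the same embedding with and without scaling; corCutOff = 1 keeps every dimension
lsi_none   <- getReducedDims(proj, reducedDims = "IterativeLSI", scaleDims = FALSE, corCutOff = 1)
lsi_scaled <- getReducedDims(proj, reducedDims = "IterativeLSI", scaleDims = TRUE,  corCutOff = 1)

# Column-wise (per-component) scaling would preserve the ranking of cells within each
# component; the row-wise scaling does not:
identical(order(lsi_none[, 1]), order(lsi_scaled[, 1]))
cor(lsi_none[, 1], lsi_scaled[, 1], method = "spearman")  # < 1 whenever the ranking changes
```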
Because row-wise scaling distorts the depth correlation and artificially rearranges the cells relative to one another in the lower-dimensional space, I'm guessing this is not the intended way to scale the LSI dimensions. But maybe I'm wrong and this was deliberately done differently from Signac on purpose? Or have I missed some key detail here?