Running addDeviations but constrained within groups, rather than using a background across all groups. #583
Replies: 8 comments
-
Hi @markphillippebworth. Thanks for posting and using the Issue Template. I'm not sure I fully understand the use case, so I'll ask some clarifying questions. First, the deviations are dependent only on the Perhaps some of my confusion is from your reference to "addDeviations", which isn't a function in ArchR. Maybe you meant to refer to something else? Can you clarify? I'm not sure why you need to create a new ArchRProject rather than just re-running the individual steps on the same ArchRProject. Sorry if I'm missing something.
-
Hi @rcorces, thank you for responding so quickly! Yes, I am talking about addDeviationsMatrix(). I thought addDeviationsMatrix() relied on addBgdPeaks() and generated z-scores by comparison to matched peaks across the entire project. If you run addDeviationsMatrix() on an entire PBMC dataset, it will pull background peaks across multiple cell types for comparison to each arrow, which also contains multiple cell types. Lots of variation. Now imagine that an ArchRProject is subset to only a specific cell type, like NK cells. addBgdPeaks() will now pull from peaks that are only accessible in NK cells, and most variability in peak accessibility (aside from stochasticity) should come from diseased vs control status. Furthermore, each arrow will be either a diseased sample of NK cells or a healthy sample of NK cells. So we're effectively calculating motif deviations of an arrow file with only diseased (or only healthy) NK cells against a background set of peaks from healthy and diseased NK cells. Does this make sense, or did I misunderstand something?
-
Thanks for clarifying. I understand now. Yes - you are correct that I give my 2 cents below but @jgranja24 is really the one who should answer this. I am not the most familiar with this part of the code but I think one way to do this would be to change your I'd be curious to know how much this affects the results at the end of the day. Have you checked? @jgranja24 - any thoughts?
-
@rcorces I haven't checked by running it manually. I'm working with patient data, so I'll need to see how significantly the background peaks differ between patients, and how the motifs change. I'd appreciate @jgranja24's thoughts too.
-
I see what you are asking, but I think you may be a bit confused by how chromVAR works. The background peaks are matched for GC content and average accessibility. The chromVAR "z-score" is independent for each cell (for a given cell it represents the (observed - expected) / expected accessibility). Therefore, I am not following how variation across samples would affect it. The only major thing that would affect chromVAR is the selection of peaks being used, but the biological result shouldn't change, to be honest. Maybe I am still misunderstanding your question, but I would just calculate the deviations using all cells and do your comparisons afterwards. The variability ranking in chromVAR is simply the rowVars of the z-score matrix, so you can just subset cells after the chromVAR analysis to do this ranking. I hope that helps! Please let us know if we are misinterpreting your goal!
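To make the last point concrete, here is a minimal sketch of ranking motifs by the row variance of a z-score matrix, once over all cells and once over a subset. It is plain Python on made-up numbers, not ArchR or chromVAR code; the motif names, the `cell_type` labels, and the helper `rank_by_variability` are all hypothetical.

```python
from statistics import variance

# Toy z-score matrix (made-up numbers): one row per motif, one column per cell.
z_scores = {
    "MOTIF_A": [0.1, -0.2, 0.0, 2.5, -2.4, 2.2],
    "MOTIF_B": [1.0, 1.1, 0.9, 1.0, 1.0, 1.0],
    "MOTIF_C": [-3.0, 3.0, -2.8, 0.1, 0.0, -0.1],
}
# Hypothetical per-cell annotation, in the same column order as z_scores.
cell_type = ["T", "T", "T", "NK", "NK", "NK"]


def rank_by_variability(z, keep):
    """Rank motifs by the sample variance of their z-scores over the kept cells."""
    cols = [i for i, flag in enumerate(keep) if flag]
    row_vars = {m: variance([vals[i] for i in cols]) for m, vals in z.items()}
    return sorted(row_vars, key=row_vars.get, reverse=True)


# Ranking over every cell, then over NK cells only, from the same z-scores.
all_cells = rank_by_variability(z_scores, [True] * len(cell_type))
nk_only = rank_by_variability(z_scores, [c == "NK" for c in cell_type])
print(all_cells)
print(nk_only)
```

On this toy data the top-ranked motif differs between the two rankings, even though the z-scores themselves were computed once and never recalculated.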
-
@jgranja24 - Thank you for responding. I guess my goal is to change the background peakset to control for specific variables. Z-scores are calculated for each cell (independently of other cells), but they still depend on the background peakset, which is generated from the PeakMatrix, which in turn depends on the cell types within a given ArchR project. When that background peakset has a broad set of peaks with many different motifs across multiple cell types, any given cell will show motif enrichment for cell-type-specific, process-specific, or condition-specific TF usage. If I limit that background peakset to only regions common to NK cell processes, then even if you choose many GC-matched peaksets, they will contain motifs important to NK cells (if enough peaks are sampled). In other words, if the background peakset is drawn from a subset of peaks common to NK cells (after GC-matching), would that remove biology related to background NK processes? The expected accessibility for a given NK cell-type motif would be close to the observed motif accessibility in each NK cell when given an NK-cell-specific background. In contrast, a TF motif relevant to an NK cell's response to disease would be significantly enriched in an NK cell responding to disease when compared to a background set of peaks from NK cells in general (or a peakset taken from disease and healthy conditions). My current understanding is that the background peaksets are calculated using the PeakMatrix, which depends on the cell types in the project. Please let me know if I have a leap in logic here. You've spent more time with chromVAR than I have.
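The dependence described here can be illustrated with a deliberately simplified sketch: a raw deviation of the form (observed - expected) / expected for a motif's peaks, standardized against the same quantity computed on background peak sets. This is an assumption-laden toy in plain Python, not the ArchR or chromVAR implementation. The counts, the expected values, and both collections of background sets are made up, and no real GC or accessibility matching is performed.

```python
from statistics import mean, stdev

# Toy data (made-up): fragment counts per peak for ONE cell, and the expected
# counts per peak, which in practice would come from the cells in the project.
observed = [4, 0, 3, 1, 0, 2, 1, 0]
expected = [2.0, 1.0, 2.0, 1.0, 0.5, 1.5, 1.0, 1.0]

motif_peaks = [0, 2, 5]  # indices of the peaks that contain the motif

# Two hypothetical collections of background draws for the same motif peaks.
backgrounds_first = [[1, 3, 6], [3, 4, 7], [1, 6, 7], [3, 6, 4]]
backgrounds_second = [[0, 3, 6], [2, 6, 7], [5, 3, 4], [0, 4, 7]]


def raw_deviation(peaks):
    """(observed - expected) / expected, with counts summed over a peak set."""
    obs = sum(observed[p] for p in peaks)
    exp = sum(expected[p] for p in peaks)
    return (obs - exp) / exp


def deviation_z(peaks, backgrounds):
    """Standardize the motif deviation against the background deviations."""
    bg = [raw_deviation(b) for b in backgrounds]
    return (raw_deviation(peaks) - mean(bg)) / stdev(bg)


# Same cell, same motif peaks, different background sets.
z_first = deviation_z(motif_peaks, backgrounds_first)
z_second = deviation_z(motif_peaks, backgrounds_second)
print(round(z_first, 2), round(z_second, 2))
```

The raw deviation of the motif peaks is identical in both calls; only the background sets differ, and that alone changes the z-score for this cell. Whether a difference of this kind is large enough to matter on real data is exactly the open question in this thread.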
-
I think your logic is fine but the argument that Jeff is making (and I also raised) is that the difference in background peaks is unlikely to change your result.
-
Ok. Will do.
-
Describe the problem that your feature request would address.
It would be great to be able to run addDeviations in a group-wise fashion. For example, in a pair-wise manner for a treated vs untreated mouse line, or across longitudinal data from one individual. Right now, I'd have to subset each grouping I want to analyze, which would mean completely recopying my ArchRProject. That takes an incredible amount of hard drive space and is computationally very slow.
Describe the solution you'd like
The alternative would be to give it a metadata column with the groupings to constrain by. For each group, addBgdPeaks would be run, and the deviations calculated for each arrow within the group using group-matched BgdPeaks. This would let us leverage experimental design to control for individual variation in motifs, and let us see only the longitudinal or treatment effect on motif usage. Essentially, I get to normalize by individual this way.
Describe alternatives you've considered
I'm going to have to create a new ArchRProject for each individual, and run addDeviations, and export the motif matrix. Then combine them for visualization. This is going to take forever to copy because I'm working with over 30 individuals, and 3 time points per individual.