Allow precomputed baseline for single sample pipeline #124

Open
wudustan opened this issue Sep 30, 2024 · 4 comments

@wudustan

pipelineCNA() calculates a synthetic baseline per sample. This is a problem when a large dataset has to be run one sample at a time for computational reasons, because each sample is then corrected against a different baseline. Ideally there would be a way to generate a baseline for the whole dataset and then use it as the subtraction for each sample separately.

Relevant code:

```r
if (length(norm_cell_names) < 1) {
  print("7) Measuring baselines (pure tumor - synthetic normal cells)")
  count_mtx_relat <- removeSyntheticBaseline(count_mtx, par_cores = par_cores)
} else {
  print("7) Measuring baselines (confident normal cells)")
  if (length(norm_cell_names) == 1) {
    basel <- count_mtx[, which(colnames(count_mtx) %in% norm_cell_names)]
  } else {
    basel <- apply(count_mtx[, which(colnames(count_mtx) %in% norm_cell_names)], 1, median)
  }
  count_mtx_relat <- count_mtx - basel
}
```

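
To make the request concrete, here is a minimal sketch of how the branch above could accept a precomputed baseline. The helper `subtract_baseline()` and its `precomputed_baseline` argument are hypothetical and not part of the package; the sketch uses only base R and stops where the real pipeline would fall back to `removeSyntheticBaseline()`.

```r
# Hypothetical helper (not part of the package): mirrors the branch above and
# adds an optional precomputed per-gene baseline, given as a named numeric vector.
subtract_baseline <- function(count_mtx, norm_cell_names = character(0),
                              precomputed_baseline = NULL) {
  if (!is.null(precomputed_baseline)) {
    # Align the baseline to the genes (rows) of this sample's matrix.
    missing_genes <- setdiff(rownames(count_mtx), names(precomputed_baseline))
    if (length(missing_genes) > 0) {
      stop("Baseline is missing ", length(missing_genes), " gene(s)")
    }
    basel <- unname(precomputed_baseline[rownames(count_mtx)])
  } else if (length(norm_cell_names) >= 1) {
    # drop = FALSE keeps a matrix even for one normal cell, so a single
    # code path covers both cases of the original branch.
    norm_mtx <- count_mtx[, colnames(count_mtx) %in% norm_cell_names, drop = FALSE]
    basel <- apply(norm_mtx, 1, median)
  } else {
    stop("No baseline supplied; the pipeline would compute a synthetic baseline here")
  }
  # A vector of length nrow(count_mtx) is recycled down each column,
  # so every cell has the per-gene baseline subtracted.
  count_mtx - basel
}

# Toy data: 3 genes x 2 cells.
toy <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), c("c1", "c2")))
shared_baseline <- c(g3 = 3, g1 = 1, g2 = 2)  # deliberately out of order
relat <- subtract_baseline(toy, precomputed_baseline = shared_baseline)
```

Matching the baseline by gene name rather than by position matters here, because different samples may keep different genes after filtering.
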
@wudustan
Author

Additionally, large datasets stall the script because of the rasterisation of the heatmap.

@AntonioDeFalco
Owner

Hi @wudustan,
Why do you need to analyze a large dataset of multiple samples? The suggestion is to analyze one sample at a time for more accurate results; only if you have several samples from the same patient could you analyze them together. The reasoning is like CNV analysis from bulk data with a matched normal, versus an analysis with a Panel of Normals (PoN) created from multiple healthy tissue samples. You could use multiSampleComparisonClonalCN to compare multiple samples.

Thanks

@wudustan
Author

wudustan commented Oct 7, 2024

Thanks for replying @AntonioDeFalco

I have a large experiment (>20 libraries: 4 timepoints +/- drug) where a cancer stem cell culture was treated with a high-dose drug over a long period of time to generate resistant cells. All the cells in the experiment are malignant, and I want a subclonal analysis to see whether specific subclones develop and persist over time. However, because of the way pipelineCNA() works, the algorithm will find ~5 'normal' cells in the sample and then give me a garbled subclone analysis as a result.

From a conceptual point of view, deriving an artificial baseline from a 100% tumour sample, one sample at a time, will also be problematic, since individual samples will have different amounts and types of CNA events. If I could pre-calculate a baseline from the whole dataset and then use it for the clonal analysis of each library separately, I would get a more consistent result.
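
A rough sketch of what I mean by a dataset-wide baseline, in base R. Everything here is an assumption for illustration: `compute_shared_baseline()` is a hypothetical helper, the per-gene median over all pooled cells is only one possible definition of the baseline, and the input matrices are assumed to be at the same processing stage as `count_mtx` in the snippet above. In an all-tumour dataset this acts as a common reference rather than a true normal profile.

```r
# Hypothetical helper: per-gene median over all cells of all libraries,
# restricted to the genes present in every library.
compute_shared_baseline <- function(mtx_list) {
  shared_genes <- Reduce(intersect, lapply(mtx_list, rownames))
  pooled <- do.call(cbind, lapply(mtx_list, function(m) {
    m[shared_genes, , drop = FALSE]
  }))
  apply(pooled, 1, median)
}

# Toy libraries (genes x cells); lib_b has an extra gene that is dropped.
lib_a <- matrix(c(1, 2, 3, 5), nrow = 2,
                dimnames = list(c("g1", "g2"), c("a1", "a2")))
lib_b <- matrix(c(5, 6, 9, 7, 8, 9), nrow = 3,
                dimnames = list(c("g1", "g2", "g3"), c("b1", "b2")))
mtx_list <- list(lib_a = lib_a, lib_b = lib_b)

baseline <- compute_shared_baseline(mtx_list)

# Subtract the same baseline from each library separately.
relat_list <- lapply(mtx_list, function(m) {
  m[names(baseline), , drop = FALSE] - unname(baseline)
})
```

Pooling every cell with `cbind()` may itself be too memory-hungry for >20 libraries; the baseline vector only needs to be computed once and could then be saved and reused for each single-library run.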

I previously ran the analysis as single libraries, but looking at the heatmap the script generates, I can tell the clustering and clonal calling aren't correct. Because the pipeline is one giant wrapper script, it is hard to modify, and I can't pass it a vector of normal cells for norm_cell because there aren't any.

@wudustan
Author

@AntonioDeFalco do you have any advice for how to proceed?
