Add command line option for sample overlap to mutation prediction scripts #43
Closes #33.
In our paper, we want to compare data types for mutation prediction in 3 different experiments: one using gene expression data only, one using gene expression and methylation, and one using expression/methylation/RPPA/mutational signatures data. In each of these cases, we only want to use TCGA samples that have data for all of these data types.
Before, I was handling this in a super hacky way, by commenting out lines in `mpmp/config.py`. This change creates a command line option to select the data types to use for calculating the sample overlap, which makes this step much more reproducible and understandable. It's a fairly straightforward change to the code, so there's not too much to review.
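
For reviewers who want the idea without reading the diff, here is a rough sketch of what an option like this could look like. It is not the code in this PR: the flag name `--overlap_data_types`, the data type identifiers, and the helper `overlapping_samples` are all hypothetical, chosen only to illustrate selecting data types on the command line and keeping the samples that have data for every selected type.

```python
import argparse

# Hypothetical identifiers for the data types described above;
# the names actually used in the repository may differ.
DATA_TYPES = ("expression", "methylation", "rppa", "mut_sigs")


def parse_args(argv=None):
    """Parse the (hypothetical) sample overlap option."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--overlap_data_types",
        nargs="+",  # require at least one data type when the flag is given
        choices=DATA_TYPES,
        default=["expression"],
        help="data types used to compute the sample overlap",
    )
    return parser.parse_args(argv)


def overlapping_samples(samples_by_type, data_types):
    """Return the sorted sample IDs present in every selected data type."""
    sample_sets = [set(samples_by_type[t]) for t in data_types]
    return sorted(set.intersection(*sample_sets))
```

With a parser like this, passing `--overlap_data_types expression methylation` would restrict the analysis to samples that have both expression and methylation data, and the choice is recorded in the command itself rather than in an edited config file.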