No regulators identified? #55
-
Hi, I've run 10 gene sets through genewalk from an RNA-seq experiment (various treatment conditions with up- or down-regulated genes). While 9/10 gene sets have produced expected results, one particular gene set fails to identify any regulators, i.e. the scatterplot is empty with the exception of a few dots on the x-axis and the genewalk_regulators.csv is empty. Despite this, the barplot folder is populated with 688 figures, so its not clear to me if this is a true reflection of the gene set I've provided or some kind of error. I've attempted to re-run this analysis on a few different occasions by re-generating the source gene list file (thinking it was corrupted in some way maybe? Just a wild guess). Nothing has seemed to help. For reference, the analysis is being conducted on MacOS 11.2.1 with Python v3.8. The code I'm using for the analysis is below: $ genewalk --project UTD24_DOWN --genes UTD24_DOWN.txt --id_type ensembl_id --nproc 4 --nreps_graph 10 --nreps_null 10 I've also attached the output log file, results, scatter plot and regulators spreadsheet. genewalk_all.log genewalk_regulators.csv.zip |
Beta Was this translation helpful? Give feedback.
Replies: 4 comments
-
According to your genewalk_results.csv file, all the global_padj values are larger than 0.1. As described on our README page at input option You mentioned you have tried to rerun GeneWalk with the same gene set as input with no change in results. In that case I do think this really does mean that this gene set does not provide very strong statistically significant results. However there are some genes with global_padj = 0.101, so very slightly above 0.1 so you could rerun the visualization stage with a more lenient FDR threshold
Then you'll get at least some genes with relevant functions global_padj < alph_fdr = 0.15, but perhaps still no genes classified as "regulators", which require the fraction of relevant GO annotations > 0.5, i.e. more than half of a gene's GO annotations need to have global_padj < alph_fdr and gene_padj < alph_fdr. See here for that code https://github.com/churchmanlab/genewalk/blob/master/genewalk/plot.py#L80 If I may give some more advice: I would recommend running genewalk with UP and DOWN regulated genes together in one list. So better not to separate UP and DOWN genes (as you are probably used to from enrichment analysis) for different genewalk runs. The reason is that GeneWalk also makes use of reaction statements like gene1 inhibits gene2. So if the list only contains UP or DOWN regulated genes, these statements are then not included in the network if gene1 was up and gene2 was down whilst it is actually relevant information for the algorithm to do its job. So bottom line: use all your DE genes from a particular condition in one run. Then afterwards you can split the genewalk output yourself according to which genes were UP and DOWN. Best of luck with your further research. Robert |
Beta Was this translation helpful? Give feedback.
-
@ri23 Ah, okay thank you for clarifying that. I didn't realize that, unlike the barplots, it would fail to define regulators if they didn't reach significance. I assumed that many of the regulators in other quadrants on the scatterplot didn't reach significance so it never occurred to me to look back over that. As for the use of all DE genes in a list, can you explain in a little more detail what data defines the reaction statements? For example, is this based on ChIP data or something similar? And does it consider functional aspects of the regulation, such as up- or down-regulated? Relatedly, is there a way to input genes into the list such that they are annotated as up- or down-regulated? Thus far I have been inputting the gene list a single column of ensembl ids. Or is there an intuitive way to process them in this fashion after the genewalk analysis? |
Beta Was this translation helpful? Give feedback.
-
|
Beta Was this translation helpful? Give feedback.
-
@ri23 Great, thank you for the advice and the help! Much appreciated. |
Beta Was this translation helpful? Give feedback.
Hi @npokorzynski
According to your genewalk_results.csv file, all the global_padj values are larger than 0.1. As described on our README page at input option
--alpha_fdr
: by default the visualization stage uses a FDR threshold of 0.1. In this case, no regulators were identified because none of the genes have any relevant (global_padj < alpha_fdr = 0.1 by default) GO annotations.So yes the output you got is as expected, no error.
You mentioned you have tried to rerun GeneWalk with the same gene set as input with no change in results. In that case I do think this really does mean that this gene set does not provide very strong statistically significant results.
However there are some genes…