1. Differential abundance with kmdiff
kmdiff is a tool that finds k-mers that are differentially abundant between two conditions (say, case and control). It works in two steps: counting the k-mers, then applying a statistical test.
K-mers are words of length k extracted from a text by sliding a window of size k over its consecutive positions. See the figure:
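The sliding-window definition can be illustrated with a minimal Python sketch. The function name `extract_kmers` and the example sequence are hypothetical and chosen for illustration only; kmdiff has its own k-mer counting step and does not use this code.

```python
from collections import Counter


def extract_kmers(sequence, k):
    """Return every k-mer of `sequence`, in order of position.

    A window of length k is moved one position at a time, so a sequence
    of length n yields n - k + 1 k-mers (none if n < k).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical example sequence, not taken from any dataset.
kmers = extract_kmers("ACGTACGT", 3)
# Counting step: how many times each k-mer occurs.
counts = Counter(kmers)
```

Here `kmers` holds six 3-mers and `counts` records that `ACG` and `CGT` each occur twice. Counting occurrences per k-mer in this way is the kind of table the statistical test is later applied to.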
Read the following publications to understand kmdiff:
The hypothesis is that the count of each k-mer follows a Poisson distribution with parameter theta. Under the null hypothesis, theta is the same in both conditions:
Testing whether the parameters differ is a special case of the likelihood ratio test, given by the following formula:
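As a sketch, a standard two-sample Poisson likelihood ratio statistic can be written as follows. The notation is an assumption made for illustration and may not match the exact parameterization used by kmdiff: $K_1$ and $K_2$ are the total counts of the k-mer in the case and control samples, and $N_1$ and $N_2$ are the total numbers of k-mers counted in each condition.

$$
H_0:\ \theta_1 = \theta_2 = \theta
\qquad \text{versus} \qquad
H_1:\ \theta_1 \neq \theta_2
$$

$$
\hat\theta = \frac{K_1 + K_2}{N_1 + N_2},
\qquad
\hat\theta_1 = \frac{K_1}{N_1},
\qquad
\hat\theta_2 = \frac{K_2}{N_2}
$$

$$
\Lambda = 2\left[ K_1 \ln\frac{\hat\theta_1}{\hat\theta} + K_2 \ln\frac{\hat\theta_2}{\hat\theta} \right]
$$

The terms $N_1\hat\theta_1 + N_2\hat\theta_2$ and $(N_1 + N_2)\hat\theta$ both equal $K_1 + K_2$, so they cancel and only the logarithmic terms remain.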
Under the null hypothesis, the statistic Lambda follows a chi-squared distribution with one degree of freedom, so we can apply a test and obtain a p-value. This test is applied to each k-mer, with a correction for multiple testing (Benjamini-Hochberg).
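The per-k-mer test and the correction can be sketched in plain Python. This is an illustration under stated assumptions, not kmdiff's implementation: the function names `poisson_lrt` and `benjamini_hochberg` are hypothetical, and so are the arguments `k_case`, `k_ctrl` (total counts of one k-mer in each condition) and `n_case`, `n_ctrl` (total numbers of k-mers counted in each condition, assumed positive). For one degree of freedom, the chi-squared upper tail probability equals `erfc(sqrt(x / 2))`, so only the standard library is needed.

```python
import math


def poisson_lrt(k_case, n_case, k_ctrl, n_ctrl):
    """Likelihood ratio test of H0: same Poisson rate in both conditions.

    Returns (statistic, p_value). The statistic is
    2 * [k_case * ln(rate_case / rate_null) + k_ctrl * ln(rate_ctrl / rate_null)].
    """
    rate_null = (k_case + k_ctrl) / (n_case + n_ctrl)
    if rate_null == 0:
        return 0.0, 1.0  # k-mer never observed: no evidence either way

    def term(k, n):
        # Convention: 0 * ln(0) = 0
        return k * math.log((k / n) / rate_null) if k > 0 else 0.0

    # Guard against tiny negative values caused by rounding.
    stat = max(2.0 * (term(k_case, n_case) + term(k_ctrl, n_ctrl)), 0.0)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In such a sketch, a k-mer would be retained when its adjusted p-value falls below a chosen threshold, and it would be labelled case-enriched or control-enriched according to which condition has the higher estimated rate.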
Two files are produced: case_kmers.fasta and control_kmers.fasta. They contain the k-mers enriched in the case condition and in the control condition, respectively.