
1. Differential abundance with kmdiff

Louis-Mael Gueguen edited this page Sep 6, 2024 · 1 revision

Kmdiff is a tool that finds k-mers that are differentially abundant between two conditions (say, case and control). It works in two steps: counting k-mers, then applying a statistical test.

What is a k-mer?

K-mers are words of length k extracted from a text by sliding a window of length k along it, one position at a time. A text of length n therefore contains n - k + 1 k-mers. See the figure:
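The sliding window can also be illustrated in a few lines of Python. This is only a minimal sketch for intuition: the function name `extract_kmers` and the example sequence are made up for illustration and are not part of kmdiff, which relies on a dedicated k-mer counter.

```python
from collections import Counter


def extract_kmers(sequence, k):
    """Return every k-mer of `sequence`, in order of appearance.

    Hypothetical helper for illustration only; not part of kmdiff.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence of length n has n - k + 1 windows of length k
    # (none at all if the sequence is shorter than k).
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


kmers = extract_kmers("ATGCATG", 3)
print(kmers)  # ['ATG', 'TGC', 'GCA', 'CAT', 'ATG']

# Counting k-mers then amounts to tallying identical words.
counts = Counter(kmers)
print(counts["ATG"])  # 2
```

Here the 3-mer ATG appears twice because the window passes over it at the first and at the last position.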


Counting k-mers

Read the following publications to understand kmdiff:

Statistical test

The null hypothesis is that the count of each k-mer follows a Poisson distribution with parameter theta, and that theta is the same in both conditions:


Testing whether the parameters differ is a special case of the likelihood ratio test, given by the following formula:


Under the null hypothesis, the value Lambda asymptotically follows a chi-square distribution with one degree of freedom, which gives a p-value. The test is applied to each k-mer, and the resulting p-values are corrected for multiple testing (Benjamini-Hochberg).
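To make the procedure concrete, here is a minimal Python sketch of a Poisson likelihood ratio test followed by a Benjamini-Hochberg correction. It is not kmdiff's actual code and rests on several assumptions: Lambda is taken to be twice the difference in log-likelihood between the two-parameter model and the shared-parameter model, the rate of a k-mer in a condition is estimated as its count divided by the total number of k-mers in that condition, and all function names and counts are hypothetical. Only the standard library is used; for one degree of freedom the chi-square tail probability equals erfc(sqrt(Lambda / 2)).

```python
import math


def _xlogy(x, y):
    """Return x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def poisson_lrt(case_count, control_count, case_total, control_total):
    """Likelihood ratio test for equal Poisson rates in two conditions.

    Assumption: rates are counts normalised by the total number of
    k-mers in each condition (both totals must be positive).
    Returns (statistic, p_value).
    """
    pooled = (case_count + control_count) / (case_total + control_total)
    if pooled == 0:
        return 0.0, 1.0  # k-mer absent everywhere: no evidence of a difference
    theta_case = case_count / case_total
    theta_control = control_count / control_total
    # 2 * (log-likelihood with two rates - log-likelihood with one rate);
    # the terms that do not involve a logarithm cancel out.
    stat = 2.0 * (_xlogy(case_count, theta_case / pooled)
                  + _xlogy(control_count, theta_control / pooled))
    stat = max(stat, 0.0)  # guard against tiny negative rounding errors
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (count in cases, count in controls) per k-mer.
observed = {"ACGTA": (30, 10), "TTGCA": (12, 11), "GGATC": (2, 25)}
case_total, control_total = 1000, 1000  # hypothetical totals

names = list(observed)
p_values = [poisson_lrt(*observed[n], case_total, control_total)[1] for n in names]
for name, q in zip(names, benjamini_hochberg(p_values)):
    print(name, q)
```

A k-mer whose adjusted p-value falls below the chosen threshold would then be assigned to the condition in which its estimated rate is higher.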

Two files are produced: case_kmers.fasta and control_kmers.fasta. They contain the k-mers enriched in the case condition and in the control condition, respectively.
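The output files can be loaded with any FASTA parser. As a minimal sketch, the reader below assumes only the generic FASTA layout (a header line starting with ">" followed by one or more sequence lines). The function name `read_fasta` is hypothetical, and the headers and sequences in the example are invented, since the exact header format written by kmdiff is not described here.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Hypothetical helper; assumes the generic FASTA layout only.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the last record, if any.
    if header is not None:
        yield header, "".join(chunks)


# Invented records standing in for the content of case_kmers.fasta.
example_lines = [">record_1", "ACGTACGTAC", ">record_2", "TTGACCATGA"]
for header, kmer in read_fasta(example_lines):
    print(header, kmer, len(kmer))
```

Because the function accepts any iterable of lines, an open file handle (for example one obtained by opening case_kmers.fasta) can be passed to it directly.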