Isaac Turner edited this page Sep 6, 2016 · 5 revisions

Kmer size (k) is the only parameter of a de Bruijn graph. It specifies the number of bases stored in each node.

CGGATGGTGA with k=9:

CGGATGGTG -> GGATGGTGA

CGGATGGTGA with k=5:

CGGAT -> GGATG -> GATGG -> ATGGT -> TGGTG -> GGTGA
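The two decompositions above can be reproduced with a sliding window. This is only a minimal sketch: the function name `kmers` is made up for illustration, and it lists the kmers of a single sequence in order rather than building a graph (no merging of duplicate kmers, no reverse complements).

```python
def kmers(seq, k):
    """Return every substring of length k in seq, in left-to-right order."""
    if k < 1 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    # A sequence of length n contains n - k + 1 overlapping kmers.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    for k in (9, 5):
        print("k=%d: %s" % (k, " -> ".join(kmers("CGGATGGTGA", k))))
```

Consecutive kmers overlap by k-1 bases, which is what links neighbouring nodes in the graph.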

The choice of kmer size must take into account:

Kmer size affects the kmer coverage: a read of length L contains only L-k+1 kmers, so the larger k is, the lower the kmer coverage.
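To get a feel for the effect, kmer coverage can be estimated from base coverage. The formula below is a commonly used approximation and an assumption here (it is not stated on this page): it treats reads as error-free, of equal length, and uniformly sampled. The name `kmer_coverage` is hypothetical.

```python
def kmer_coverage(base_coverage, read_length, k):
    """Approximate kmer coverage from base coverage.

    Assumes error-free reads of uniform length, each of which
    contributes read_length - k + 1 kmers.
    """
    if not 1 <= k <= read_length:
        raise ValueError("k must be between 1 and read_length")
    return base_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    # Example figures chosen for illustration: 30x coverage, 100bp reads.
    for k in (31, 51, 91):
        print("k=%d: ~%.1fx kmer coverage" % (k, kmer_coverage(30, 100, k)))
```

Under these assumptions, kmer coverage shrinks linearly as k approaches the read length.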

Larger kmer sizes make it easier to resolve repeats (low complexity regions) but harder to distinguish sequencing errors from real sequence, because kmer coverage is lower. Larger genomes generally have higher repeat content and therefore require a larger kmer size. A good kmer size is just over half the read length, which prevents sequencing errors from forming bubbles.
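The half-read-length rule can be sketched in code. The reasoning assumed here is that a single-base error changes up to k consecutive kmers, so one read can only close a bubble if it also holds a correct kmer on each side, which needs a read of at least 2k+1 bases. Both function names are hypothetical, and restricting k to odd values is an added assumption (an odd-length kmer can never equal its own reverse complement), not something this page states.

```python
def single_read_can_form_bubble(read_length, k):
    """True if one read with one base error can span a whole bubble.

    Needs a correct kmer, then k erroneous kmers, then a correct kmer:
    k + 2 consecutive kmers, i.e. a read of at least 2k + 1 bases.
    """
    return read_length >= 2 * k + 1


def suggest_kmer_size(read_length):
    """Smallest odd k that is strictly greater than half the read length."""
    if read_length < 1:
        raise ValueError("read_length must be positive")
    k = read_length // 2 + 1
    if k % 2 == 0:
        k += 1  # assumption: keep k odd
    return k


if __name__ == "__main__":
    for read_length in (100, 150):
        k = suggest_kmer_size(read_length)
        print(read_length, k, single_read_can_form_bubble(read_length, k))
```

This is a starting point only; it ignores genome size, repeat content and kmer coverage, all of which also bear on the choice.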
