kmer size
Isaac Turner edited this page Sep 6, 2016
Kmer size (k) is the only parameter of a de Bruijn graph. It specifies the number of bases stored in each node.
For the sequence CGGATGGTGA with k=9:
CGGATGGTG -> GGATGGTGA
For the same sequence with k=5:
CGGAT -> GGATG -> GATGG -> ATGGT -> TGGTG -> GGTGA
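The two paths above can be reproduced with a few lines of Python. This is a minimal sketch, not McCortex code: the helper names `kmers` and `path_string` are hypothetical, and it ignores reverse complements, which real assemblers typically merge into a single canonical node.

```python
def kmers(seq, k):
    """Return the overlapping kmers of seq in order, one per start position."""
    if not 1 <= k <= len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    # A sequence of length L contains L - k + 1 kmers.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def path_string(seq, k):
    """Render the path seq traces through the graph as 'A -> B -> ...'.

    Consecutive kmers overlap by k - 1 bases, so each adjacent pair is an edge.
    """
    return " -> ".join(kmers(seq, k))


print(path_string("CGGATGGTGA", 9))  # CGGATGGTG -> GGATGGTGA
print(path_string("CGGATGGTGA", 5))
```

Note how the same 10-base sequence yields 2 nodes at k=9 but 6 nodes at k=5.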
Choosing an optimal kmer size must take into account:
- read length
- depth
- error rate
- genome complexity
Kmer size affects kmer coverage: a longer kmer is contained in fewer reads, so kmer coverage falls as k increases.
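One common way to estimate this, assuming error-free reads of a fixed length R sampled uniformly at base depth C, is the expected kmer coverage C_k = C * (R - k + 1) / R. The sketch below encodes that formula; the function name and the example values (30x depth, 100bp reads) are hypothetical.

```python
def kmer_coverage(base_depth, read_len, k):
    """Expected kmer coverage C_k = C * (R - k + 1) / R.

    With N reads of length R over a genome of length G, base depth is
    C = N * R / G. Each read contributes R - k + 1 kmers, so a given kmer is
    expected to be seen N * (R - k + 1) / G times, which equals C_k above.
    Assumes uniform coverage and no sequencing errors.
    """
    if not 1 <= k <= read_len:
        raise ValueError("k must be between 1 and read_len")
    return base_depth * (read_len - k + 1) / read_len


# Hypothetical run: 30x depth with 100bp reads.
for k in (21, 31, 51, 75):
    print(k, kmer_coverage(30, 100, k))
```

At k=51 the expected kmer coverage is already half the base depth (15.0 here), which is why high-k graphs need deeper sequencing.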
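Higher kmer sizes make it easier to resolve repeats (low complexity regions), since a repeat shorter than k no longer collapses into shared nodes, but make it harder to distinguish sequencing errors, because kmer coverage drops. Larger genomes generally have higher repeat content and therefore require a larger kmer size. A good kmer size is just over half the read length: a sequencing error inside a read can then never be flanked by error-free kmers on both sides within that read, so it forms a tip rather than a bubble.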
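The half-read-length rule can be checked directly. This sketch assumes a single substitution error per read and counts a bubble as possible only when the read has an error-free kmer both before and after the error, so the erroneous kmers rejoin the true path at both ends. The function names are hypothetical, and the sketch ignores erroneous kmers coincidentally matching elsewhere in the genome.

```python
def error_can_form_bubble(read_len, k):
    """True if one substitution somewhere in the read can create a bubble.

    An error at position p corrupts every kmer overlapping p. The read still
    holds an error-free kmer before the error only if p >= k, and one after it
    only if p <= read_len - k - 1. If either side is missing, the error can
    only produce a tip.
    """
    return any(k <= p <= read_len - k - 1 for p in range(read_len))


def min_k_without_bubbles(read_len):
    """Smallest k for which no single error within a read forms a bubble."""
    # k == read_len always qualifies, so next() never runs out of candidates.
    return next(k for k in range(1, read_len + 1)
                if not error_can_form_bubble(read_len, k))


print(min_k_without_bubbles(100))  # 50
print(min_k_without_bubbles(101))  # 51
```

Under these assumptions the threshold works out to half the read length rounded up (a bubble needs 2k + 1 <= R), consistent with the "just over half a read length" rule of thumb.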