Kamil S Jaroň edited this page May 14, 2019 · 5 revisions

Parameters L and U set the lower and upper coverage thresholds for kmers that will be considered genomic. Approximate estimates can be made with the smudgeplot cutoff function, but there is nothing wrong with eyeballing them directly from the kmer spectrum (and very often that gives a better estimate).

L

Set it as high as you can without cutting off your haploid kmers.
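
One way to turn eyeballing into a starting number is to look for the valley between the low-coverage error kmers and the first genomic peak. The snippet below is only a sketch of that heuristic, not what the smudgeplot cutoff function does; the function name `suggest_lower_threshold` and the toy histogram are made up for illustration, and the histogram is assumed to be a list of (coverage, count) pairs sorted by coverage.

```python
def suggest_lower_threshold(hist):
    """Return the coverage at the first local minimum of a kmer spectrum.

    hist is assumed to be a list of (coverage, count) pairs sorted by
    coverage. Returns None when no valley is found.
    """
    for i in range(1, len(hist) - 1):
        prev_count = hist[i - 1][1]
        count = hist[i][1]
        next_count = hist[i + 1][1]
        # A valley: counts stop falling and start rising again.
        if count <= prev_count and count < next_count:
            return hist[i][0]
    return None


# Toy spectrum with invented numbers: error kmers at coverage 1-3,
# a valley at 4, and a haploid peak around coverage 8.
toy_hist = [
    (1, 900), (2, 300), (3, 80), (4, 30), (5, 45),
    (6, 90), (7, 160), (8, 200), (9, 170), (10, 110),
]

valley = suggest_lower_threshold(toy_hist)
print("valley at coverage", valley)
```

Treat the valley as a floor rather than the answer: L can usually go higher, as long as it stays below the left shoulder of the haploid peak. Real spectra are noisy, so smoothing the counts or checking the plot by eye is still a good idea.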

U

Perhaps less important than L. You might want to exclude super repetitive kmers (like mtDNA or kmers from centromeres/telomeres) from your analysis. These kmers usually have enormous coverage, so U can go up to several thousand without a big problem.

I am actually considering removing this argument and exploring whether ultrarepetitive kmers would actually be a problem (we thought they might be, so we kicked them out, but we never actually checked).
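
To get a feel for how much a given pair of thresholds throws away, it can help to count the distinct kmers that fall below L, between L and U, and above U. The sketch below assumes the same (coverage, count) histogram format as a plain Python list and assumes both bounds are inclusive; `summarize_thresholds` and the numbers are hypothetical and not part of smudgeplot.

```python
def summarize_thresholds(hist, lower, upper):
    """Count distinct kmers below, within, and above [lower, upper].

    hist is assumed to be a list of (coverage, count) pairs.
    Both bounds are treated as inclusive (an assumption of this sketch).
    """
    below = sum(count for cov, count in hist if cov < lower)
    kept = sum(count for cov, count in hist if lower <= cov <= upper)
    above = sum(count for cov, count in hist if cov > upper)
    return {"below_L": below, "kept": kept, "above_U": above}


# Invented spectrum: error kmers, two genomic peaks, and a handful of
# ultrarepetitive kmers at very high coverage.
example_hist = [(1, 900), (2, 300), (5, 50), (10, 400), (20, 200), (3000, 5)]

summary = summarize_thresholds(example_hist, lower=5, upper=1000)
print(summary)
```

If `above_U` is tiny compared to `kept`, the exact value of U is unlikely to matter much.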

TODO add a couple of examples of kmer spectra with appropriate L and U