Choosing L and U

The parameters L and U set the lower and upper coverage thresholds for kmers that will be treated as genomic. Approximate estimates can be made with the smudgeplot cutoff function, but there is nothing wrong with eyeballing them directly from the kmer spectrum (and very often that gives a better estimate).

L

Set L as high as you can while staying safely below the haploid kmer peak, so that you do not cut off your haploid kmers.

U

U is perhaps less important than L. You might want to exclude super-repetitive kmers (such as mtDNA or kmers from centromeres and telomeres) from your analysis. These kmers usually have enormous coverage, so U can go up to several thousand without a big problem.

I am actually considering removing this argument and exploring whether ultrarepetitive kmers really are a problem (we thought they might be, so we kicked them out, but we never actually checked).
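If you would rather start from numbers than from a plot, the eyeballing described above can be roughly imitated in a few lines of Python. This is only a sketch under stated assumptions, not how the smudgeplot cutoff function works: it assumes the kmer spectrum is already loaded as (coverage, count) pairs sorted by coverage, it takes the first valley of the spectrum (between the error kmers and the haploid peak) as a starting point for L, and it takes a coverage quantile as a starting point for U. The function names `first_valley` and `upper_quantile`, the 0.998 default, and the toy spectrum are all made up for illustration.

```python
def first_valley(hist):
    """Return the coverage at the first local minimum of a kmer spectrum.

    hist: list of (coverage, count) pairs sorted by coverage.
    Assumption: the first valley separates error kmers from the haploid peak.
    """
    for i in range(1, len(hist) - 1):
        count = hist[i][1]
        if count <= hist[i - 1][1] and count < hist[i + 1][1]:
            return hist[i][0]
    raise ValueError("no valley found; inspect the spectrum by eye")


def upper_quantile(hist, lower, fraction=0.998):
    """Return the smallest coverage that includes `fraction` of the distinct
    kmers with coverage >= lower (the fraction is an arbitrary choice)."""
    kept = [(cov, n) for cov, n in hist if cov >= lower]
    if not kept:
        raise ValueError("no kmers at or above the lower threshold")
    total = sum(n for _, n in kept)
    running = 0
    for cov, n in kept:
        running += n
        if running >= fraction * total:
            return cov
    return kept[-1][0]


# Toy spectrum (invented numbers): error kmers at low coverage,
# two peaks, and a thin tail of highly repetitive kmers.
toy = [(1, 900), (2, 300), (3, 80), (4, 20), (5, 10), (6, 25), (7, 60),
       (8, 90), (9, 60), (10, 30), (15, 50), (16, 80), (17, 40),
       (500, 3), (5000, 1)]

L = first_valley(toy)        # candidate lower threshold
U = upper_quantile(toy, L)   # candidate upper threshold
print(L, U)
```

Treat both values as candidates to check against the plotted spectrum: L can often be pushed higher than the valley as long as it stays below the haploid peak, and a lower `fraction` trims more of the repetitive tail.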

TODO add a couple of examples of kmer spectra with appropriate L and U
