The measure of fit produced by this pipeline should be corrected for factors that can produce high scores purely by chance.
It should be corrected for chromosome size: a bigger chromosome will naturally account for a larger portion of a superscaffold purely by chance.
This could be significant in genomes where the largest and smallest chromosomes differ greatly in size.
Additionally, this correction should be based on the length of the chromosome minus the number of N nucleotides in it, since these are not aligned.
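As a rough sketch of the "length minus N nucleotides" idea, the effective size of a reference could be computed like this. The function name `effective_length` is hypothetical and not part of the pipeline, and the sketch assumes the chromosome sequence is available as a plain string.

```python
def effective_length(sequence: str) -> int:
    """Return the chromosome length excluding N bases.

    Hypothetical helper, not part of the pipeline. Assumes `sequence`
    is a plain string of nucleotides; soft-masked (lowercase) bases
    are counted as normal bases, and both 'N' and 'n' are excluded.
    """
    upper = sequence.upper()
    return len(upper) - upper.count("N")


# 14 bases in total, 6 of which are N/n, so 8 alignable bases remain.
example_length = effective_length("ACGTNNNNacgtnn")
```

This effective length, rather than the raw length, would then be the size used in any correction.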
One possible way of fixing this is to establish a background measure of fit: take all fits except the best, average them, and correct for the reference size.
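One possible reading of this background idea, sketched in Python. This is an interpretation, not the pipeline's code: the function name `background_fit` and the dictionary inputs are assumptions, and the sketch applies the size correction first and then averages every fit except the best.

```python
from statistics import mean


def background_fit(scores, sizes):
    """Return (best reference, its corrected fit, background fit).

    Hypothetical helper. `scores` maps reference name -> fit in percent,
    `sizes` maps reference name -> reference size (any consistent unit).
    Each fit is divided by (reference size / smallest reference size);
    the background is the mean of all corrected fits except the best.
    Requires at least two references.
    """
    smallest = min(sizes.values())
    corrected = {ref: pct / (sizes[ref] / smallest) for ref, pct in scores.items()}
    best = max(corrected, key=corrected.get)
    rest = [value for ref, value in corrected.items() if ref != best]
    return best, corrected[best], mean(rest)


# Illustrative numbers only: three 5 Mbp references and one 10 Mbp reference.
best_ref, best_fit, background = background_fit(
    {"ref1": 40.0, "ref2": 20.0, "ref3": 20.0, "ref6": 40.0},
    {"ref1": 5, "ref2": 5, "ref3": 5, "ref6": 10},
)
```

Comparing the best corrected fit against the background would then indicate how far the top assignment stands out from chance-level fits.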
Example of how this is an issue and how it should be corrected:
There are 6 references: 1-5 are 5 Mbp each and 6 is 10 Mbp.
Say reference 1 scored 40%, references 2-5 scored around 20%, and reference 6 scored just under 40%. Reference 1 scores best, but because reference 6 is also very high, the analysis says there is no certainty that reference 1 is the right assignment.
Now we correct for size: corrected% = % / (currentREF / smallestREF)
Reference 1 becomes 40% / (5 Mbp / 5 Mbp) = 40%
References 2-5 become 20% / (5 Mbp / 5 Mbp) = 20%
Reference 6 becomes 40% / (10 Mbp / 5 Mbp) = 20%
Now the analysis would find reference 1 significantly different and would assign the query to reference 1.
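The arithmetic in this example can be reproduced with a short Python sketch. The function name `size_corrected` and the reference labels are made up for illustration, and reference 6 is rounded to 40% as in the calculation above.

```python
def size_corrected(scores_pct, sizes_mbp):
    """Apply corrected% = % / (currentREF / smallestREF) to every reference.

    Hypothetical helper, not taken from the pipeline.
    """
    smallest = min(sizes_mbp.values())
    return {
        ref: pct / (sizes_mbp[ref] / smallest)
        for ref, pct in scores_pct.items()
    }


# Sizes and scores from the example: references 1-5 are 5 Mbp, reference 6 is 10 Mbp.
sizes_mbp = {"ref1": 5, "ref2": 5, "ref3": 5, "ref4": 5, "ref5": 5, "ref6": 10}
scores_pct = {"ref1": 40.0, "ref2": 20.0, "ref3": 20.0,
              "ref4": 20.0, "ref5": 20.0, "ref6": 40.0}

corrected = size_corrected(scores_pct, sizes_mbp)
for ref, pct in corrected.items():
    print(f"{ref}: {pct:.1f}%")
```

After correction only reference 1 remains at 40%, while reference 6 drops to the same level as references 2-5.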