Correct measure of fit #8

Open
Wychor opened this issue Sep 18, 2019 · 1 comment

Comments

@Wychor
Collaborator

Wychor commented Sep 18, 2019

The measure of fit produced by this pipeline should be corrected for factors that can inflate scores by chance alone.
First, it should be corrected for chromosome size: a larger chromosome will naturally cover a larger portion of a superscaffold purely by chance.
This matters most in genomes where the largest and smallest chromosomes differ greatly in size.
Additionally, the correction should be based on the length of the chromosome minus the number of N nucleotides in it, since N positions are not aligned.
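
To make the N-corrected length concrete, here is a minimal sketch in Python. The function name `effective_length` is hypothetical and not part of the pipeline; it assumes the chromosome sequence is available as a plain string and that only the characters `N`/`n` mark unaligned positions.

```python
def effective_length(sequence: str) -> int:
    """Return the number of bases that are not N (case-insensitive).

    Hypothetical helper: assumes the sequence is an in-memory string and
    that only 'N'/'n' mark positions excluded from alignment.
    """
    upper = sequence.upper()
    return len(upper) - upper.count("N")


# 14 characters in total, 6 of them N/n, so 8 bases remain.
print(effective_length("ACGTNNNNacgtnn"))  # 8
```

For real chromosomes the sequence would more likely be streamed from a FASTA file than held as one string; the counting logic would stay the same.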

@Wychor
Collaborator Author

Wychor commented Sep 18, 2019

One possible way of fixing this is to establish a background measure of fit: take all fits except the best, average them, and correct for the reference size.
Here is an example of how this is an issue and how it could be corrected.
There are 6 references: references 1-5 are 5 Mbp each and reference 6 is 10 Mbp.
Say reference 1 scored 40%, references 2-5 scored around 20%, and reference 6 scored just under 40%. Reference 1 scores best, but because reference 6 scores almost as high, the analysis says there is no certainty that reference 1 is the right assignment.
Now we correct for size: corrected% = % / (currentREF / smallestREF)
Reference 1 becomes 40% / (5 Mbp / 5 Mbp) = 40%
References 2-5 become 20% / (5 Mbp / 5 Mbp) = 20%
Reference 6 becomes ~40% / (10 Mbp / 5 Mbp) = ~20%
Now the analysis would say reference 1 is significantly different from the others and would assign the query to reference 1.
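
The worked example above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `size_corrected_fits` and `background_fit` are hypothetical, reference 6 is entered as exactly 40% to match the arithmetic, and the background is computed by averaging the corrected fits of all references except the best, which is one possible reading of the proposal.

```python
def size_corrected_fits(fits, lengths):
    """Apply corrected% = % / (currentREF / smallestREF) to every reference.

    `fits` maps reference name -> fit in percent; `lengths` maps reference
    name -> length in bp (ideally with N bases already subtracted).
    Both names are hypothetical, chosen for this sketch.
    """
    smallest = min(lengths.values())
    return {ref: fits[ref] / (lengths[ref] / smallest) for ref in fits}


def background_fit(corrected):
    """Return the best reference and the mean corrected fit of all others."""
    best = max(corrected, key=corrected.get)
    rest = [value for ref, value in corrected.items() if ref != best]
    return best, sum(rest) / len(rest)


# Values from the example: references 1-5 are 5 Mbp, reference 6 is 10 Mbp.
lengths = {f"ref{i}": 5_000_000 for i in range(1, 6)}
lengths["ref6"] = 10_000_000
fits = {"ref1": 40.0, "ref2": 20.0, "ref3": 20.0,
        "ref4": 20.0, "ref5": 20.0, "ref6": 40.0}

corrected = size_corrected_fits(fits, lengths)
best, background = background_fit(corrected)
print(corrected["ref6"])   # 20.0
print(best, background)    # ref1 20.0
```

How far the best corrected fit must sit above the background to count as significant is not specified in this issue and is left open here.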
