Match Scores #12
Comments
Section 2 of the IEEE Transactions paper gives a good overview of this, so I recommend you check it out (let me know if you can't get the pdf). I guess there are roughly two answers to this:
Hope this helps! Let me know if there is something still unclear. |
I guess I have the same question as @imk1: I wonder what the match score in the 5th field is about. I am looking to use it for filtering, so I need to know whether a larger value is better or vice versa, and what positive and negative values mean. Thanks! |
I've updated the wiki to describe the scoring model in more detail. I recommend going over it to see all the details. In short, the score can basically be viewed as the log of the ratio between the probability of the model (PWM) generating the pattern and the probability of the background distribution generating the pattern. So a bigger number is better. Note that for MOODS it's generally better to fix the threshold beforehand, as this makes scanning faster. |
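As a rough illustration of that log-ratio score, here is a minimal sketch in plain Python. The 3-column matrix `PWM`, the uniform background `BG`, and the helper `log_odds_score` are all made up for this example; none of them come from MOODS, and MOODS itself is not called.

```python
import math

# Hypothetical PWM: per-base probabilities for each of 3 motif columns.
PWM = {
    "A": [0.7, 0.1, 0.1],
    "C": [0.1, 0.7, 0.1],
    "G": [0.1, 0.1, 0.7],
    "T": [0.1, 0.1, 0.1],
}
# Assumed uniform background distribution.
BG = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_score(window, pwm=PWM, bg=BG):
    """Natural-log ratio of P(window | PWM) to P(window | background)."""
    return sum(math.log(pwm[base][i] / bg[base]) for i, base in enumerate(window))


consensus_score = log_odds_score("ACG")  # every column matches the motif
poor_score = log_odds_score("TTT")       # background explains this window better
```

With these numbers the consensus window gets a positive score and `TTT` gets a negative one, which is the sense in which a bigger number is better.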
Thanks for explaining more, and for the quick response! :-D |
Hi, is there a threshold that could be set? I mean, what score could be called as significant? |
Hi @jhkorhonen! :) Thanks for updating the scoring documentation. I'm still confused on one point: How should I interpret a negative score? I had thought MOODS would only report positive scores -- cases where the PWM-model was more likely than the null. I'm using a p-value cutoff of 0.0002 |
Hi, I'm also running into negative scores. Bug? test.fas:
test.matrix:
cmd: out_moods.txt: |
Basically, the way p-value thresholds are defined is that if you set the p-value threshold to p, the actual concrete score threshold T is set, roughly speaking, so that the probability that the background distribution generates a sequence with score at least T is p. In other words, you can think of the p-value threshold p as giving you the top-scoring p fraction of all positions.

While this seems to be a common approach in the literature, it does indeed work a bit weirdly with the log-likelihood scoring, which has its own probabilistic interpretation. If your matrix is very specific, then the top p fraction of best hits will include hits that are better explained by the background distribution, as the probability of the background generating a sequence with a positive log-likelihood score is very small. Those hits have negative scores.

Alternatively, one can simply set an absolute threshold T with the log-likelihood scores; if you want hits where the probability that the PSSM generated the sequence is at least C times the probability that the background generated the sequence, set T = log C (note that by default MOODS uses the natural logarithm). The complication is that there is generally no guarantee that a given PSSM can give scores as high as T, e.g. if the PSSM distribution is close to the background. (See also https://github.com/jhkorhonen/MOODS/wiki/Brief-theoretical-introduction) |
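The two kinds of threshold can be compared on a toy case. The sketch below is plain Python and does not use MOODS; the 4-column matrix, the uniform background, and the helper names (`score`, `background_prob`, `threshold_from_p`) are assumptions made for illustration. It enumerates every window by brute force, which is only feasible for very short motifs, and the p-value rule it uses is a simplified reading of the "roughly speaking" definition above, not necessarily the exact rule MOODS applies.

```python
import itertools
import math

ALPHABET = "ACGT"
BG = {b: 0.25 for b in ALPHABET}  # assumed uniform background
CONSENSUS = "ACGT"
# Hypothetical, very specific matrix: 0.97 for the consensus base, 0.01 otherwise.
PWM = [{b: (0.97 if b == c else 0.01) for b in ALPHABET} for c in CONSENSUS]


def score(window):
    """Natural-log likelihood ratio of the window: PWM versus background."""
    return sum(math.log(PWM[i][b] / BG[b]) for i, b in enumerate(window))


def background_prob(window):
    """Probability that the background generates exactly this window."""
    return math.prod(BG[b] for b in window)


def threshold_from_p(p):
    """Largest score T whose background tail probability P(score >= T) reaches p."""
    windows = ("".join(w) for w in itertools.product(ALPHABET, repeat=len(PWM)))
    scored = sorted(((score(w), background_prob(w)) for w in windows), reverse=True)
    tail = 0.0
    for s, prob in scored:
        tail += prob
        if tail >= p:
            return s
    return scored[-1][0]


# p-value threshold: only 13 of 256 windows (about 5%) score above zero,
# so asking for the top 10% forces the threshold below zero.
t_pvalue = threshold_from_p(0.1)

# Absolute threshold: PWM at least C = 2 times as likely as the background.
t_absolute = math.log(2)
all_windows = ["".join(w) for w in itertools.product(ALPHABET, repeat=len(PWM))]
hits = [w for w in all_windows if score(w) >= t_absolute]

# Highest score this matrix can produce; any T above it yields no hits at all.
max_score = sum(math.log(max(col.values()) / 0.25) for col in PWM)
```

Here `t_pvalue` comes out negative (the score of a window with two mismatches), while the absolute threshold keeps only the consensus window and its single-mismatch neighbours. Comparing a chosen T against `max_score` is one way to check whether the matrix can reach it.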
Got it. Thanks for this detailed explanation. |
What is the exact definition of the match score? If I somehow missed it in the paper, it would be great if you could direct me to it. Thanks!