
Sum of logprobs in the probability space adds up to values above 1 #19

Closed

PastelBelem8 opened this issue Sep 15, 2022 · 2 comments

@PastelBelem8

Hi!
Congratulations on this great work, and thank you for putting together such an easy-to-use framework! It definitely facilitates research quite a bit :)

I was trying to interpret the scores logged during evaluation on the development set, and I noticed that for two-class datasets (like RTE), the exponentiated negatives of the GT and CAND scores sometimes sum to more than 1. Maybe I'm interpreting these scores wrongly, since I expected the scores, after converting them to probability space (that is, np.exp(-1 * logprob)), to sum to at most 1 for two-class datasets.
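
Concretely, here's a minimal sketch of the check I'm doing (the score values below are made up for illustration):

```python
import numpy as np

# Made-up GT/CAND scores for one two-class (e.g. RTE) example,
# standing in for the values logged during evaluation.
gt_score, cand_score = 0.31, 1.05

# Naive conversion back to probability space, as described above.
probs = np.exp(-1 * np.array([gt_score, cand_score]))
print(probs.sum())  # ~1.083 here, i.e. above 1
```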

Could you let me know whether my rationale is flawed, and if so, why the sum of the probabilities can be above 1?

Thank you in advance!

@HaokunLiu
Collaborator

HaokunLiu commented Sep 19, 2022

Thank you! I'm glad you find this helpful.

As we discussed in #16, those numbers are the logits, and they are also averaged over all the examples; I printed them out to get a sense of what the choice scores look like. If you want to know the probability the model assigns to each answer choice, you can save the choices_scores tensor before pred_score, prediction = choices_scores.min(dim=1) in the predict(self, batch) function of EncoderDecoder.py, then multiply by -1 and take the softmax with dim=1.
(Just in case: don't forget to .detach() and .cpu().)
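
Something like this rough sketch, where choices_scores stands in for the tensor you saved (the values are made up):

```python
import torch

# choices_scores stands in for the tensor captured in predict(self, batch)
# just before: pred_score, prediction = choices_scores.min(dim=1)
# Made-up values: a batch of 2 examples with 2 answer choices each.
choices_scores = torch.tensor([[0.31, 1.05],
                               [0.92, 0.40]])

# Detach from the graph and move to CPU, then negate (lower score = better
# choice) and softmax across the choice dimension so each row sums to 1.
probs = torch.softmax(-1 * choices_scores.detach().cpu(), dim=1)
print(probs)             # per-choice probabilities
print(probs.sum(dim=1))  # each row sums to 1
```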

@PastelBelem8
Author

Thank you so much for your quick reply! I asked because in some cases I noticed the resulting probability was slightly above 1 (when naively using torch.exp(-score)). I figured it could be because we use half precision (16 bits). I'll use softmax instead! :)
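
For reference, here's a toy comparison with made-up scores: since the scores are the averaged logits you mentioned rather than true negative log-probabilities, a score can even be negative, in which case exp(-score) exceeds 1 on its own, while softmax is normalized by construction:

```python
import torch

# Made-up scores for one example. As averaged logits (not true negative
# log-probabilities), a score can be negative, in which case exp(-score)
# already exceeds 1 before any precision effects.
scores = torch.tensor([-0.02, 2.7])

naive = torch.exp(-scores)             # unnormalized; sums to ~1.087 here
probs = torch.softmax(-scores, dim=0)  # normalized; always sums to 1
print(naive.sum(), probs.sum())
```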
