Hi!
Congratulations on this great work, and thank you for putting up such an easy-to-use framework! It definitely facilitates research quite a bit :)
I was trying to interpret the scores logged during evaluation on the development set, and I noticed that the exponentiated negatives of the scores for GT and CAND sometimes sum to more than 1 for two-class datasets (like RTE). Maybe I'm interpreting these scores wrongly: I expected the scores, after converting them to probability space (that is, np.exp(-1 * logprob)), to sum to at most 1 for a two-class dataset.
Could you let me know whether my rationale is flawed, and if so, why the sum of the probabilities may exceed 1?
Thank you in advance!
As we discussed in #16, those numbers are the logits, and they are also averaged over all the examples; I printed them out to get some sense of what the choice scores look like. If you want to know the probability the model assigns to each answer choice, you can save the choice_scores tensor before pred_score, prediction = choices_scores.min(dim=1) in the predict(self, batch) function of EncoderDecoder.py, then multiply by -1 and apply a softmax with dim=1.
(Just in case: don't forget to .detach() and .cpu() the tensor first.)
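The recipe above can be sketched as follows. This is a minimal NumPy stand-in for the saved PyTorch tensor; the shape (batch, num_choices) and the example score values are assumptions for illustration, not the repo's actual data:

```python
import numpy as np

# Hypothetical choice_scores: one row per example, one column per answer
# choice, as saved before the .min(dim=1) call. Lower score = better choice.
choice_scores = np.array([
    [0.12, 2.7],   # example 1: strongly prefers choice 0
    [1.4,  0.9],   # example 2: mildly prefers choice 1
])

# Multiply by -1 so that larger = better, then softmax over the choice
# dimension (dim=1 in PyTorch, axis=1 here) to get per-choice probabilities.
logits = -choice_scores
logits -= logits.max(axis=1, keepdims=True)   # shift for numerical stability
probs = np.exp(logits)
probs /= probs.sum(axis=1, keepdims=True)

print(probs)             # each row is a valid distribution
print(probs.sum(axis=1)) # each row sums to 1
```

Unlike the naive exp(-score), the softmax normalizes across the choices, so every row sums to 1 by construction.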
Thank you so much for your quick reply! I asked because in some cases the obtained probability was slightly above 1 (when naively using torch.exp(-score)). I figured it could be because of half precision (16 bits). I'll use softmax instead! :)
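For the record, sums above 1 are expected even at full precision: if the per-choice scores are unnormalized quantities (e.g. length-averaged negative log-likelihoods) rather than normalized log-probabilities, exp(-score) across the choices need not sum to 1. A toy illustration with made-up scores:

```python
import numpy as np

# Made-up unnormalized scores for a single two-choice example.
scores = np.array([0.02, 0.5])

# Naive conversion: exponentiate each negated score independently.
naive = np.exp(-scores)
print(naive.sum())  # noticeably above 1, with no precision issue involved

# Softmax over -scores always yields a valid two-class distribution.
probs = np.exp(-scores) / np.exp(-scores).sum()
print(probs.sum())  # 1 up to rounding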