
What is the meaning of score_gt and score_cand? #16

Open
CaffreyR opened this issue Jul 18, 2022 · 6 comments

Comments

@CaffreyR

What is the meaning of score_gt and score_cand? How do I better run the model by observing these parameters?

@CaffreyR
Author

Btw, what is the exact meaning of `bs` in your code? @muqeeth

        if not self.config.split_option_at_inference:
            bs, num_choices = choices_ids.size()[:2]
            flat_choices_ids = choices_ids.flatten(0, 1)
            attention_mask = (input_ids != self.tokenizer.pad_token_id).float()  # [bs, max_seq_len]
            encoder_hidden_states = self.model.encoder(input_ids=input_ids, attention_mask=attention_mask)[0]
            encoder_hidden_states = encoder_hidden_states.unsqueeze(dim=1).repeat(1, num_choices, 1, 1).flatten(0, 1)
            attention_mask = attention_mask.unsqueeze(dim=1).repeat(1, num_choices, 1).flatten(0, 1)
            decoder_input_ids = torch.cat([torch.zeros_like(flat_choices_ids[:, :1]), flat_choices_ids[:, :-1]], dim=1)
            decoder_attention_mask = (decoder_input_ids == decoder_input_ids).float()
            lm_target = flat_choices_ids - 100 * (flat_choices_ids == self.tokenizer.pad_token_id).long()

            model_output = self.model(
                attention_mask=attention_mask,
                encoder_outputs=[encoder_hidden_states],
                decoder_input_ids=decoder_input_ids,
                decoder_attention_mask=decoder_attention_mask,
            )
            choices_scores = (
                F.cross_entropy(model_output.logits.flatten(0, 1), lm_target.flatten(0, 1), reduction="none")
                .view(bs, num_choices, -1)
                .sum(dim=-1)
            )
            if self.config.length_norm > 0:
                choices_scores = choices_scores / torch.pow(
                    (choices_ids != self.tokenizer.pad_token_id).sum(dim=-1), self.config.length_norm
                )
            pred_score, prediction = choices_scores.min(dim=1)

        score_gt = choices_scores[range(bs), labels]
        choices_scores[range(bs), labels] = choices_scores.max(dim=-1)[0]
        score_cand = choices_scores.min(dim=-1)[0]
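For intuition, the indexing in those last three lines can be sketched on a toy example with plain Python lists (hypothetical score values; `bs = 2` examples, 3 answer choices each):

```python
# Hypothetical per-choice scores (summed cross-entropy, so lower = better).
# Rows are examples in the batch, columns are answer choices.
choices_scores = [
    [1.2, 0.4, 3.0],   # example 0, correct label = 1
    [0.9, 2.5, 1.1],   # example 1, correct label = 0
]
labels = [1, 0]
bs = len(choices_scores)

# score_gt: the score of the ground-truth choice for each example.
score_gt = [choices_scores[i][labels[i]] for i in range(bs)]

# Overwrite the ground-truth column with the row max so it can never win,
# then take the row min: the best-scoring *incorrect* candidate.
score_cand = []
for i in range(bs):
    row = list(choices_scores[i])
    row[labels[i]] = max(row)
    score_cand.append(min(row))

print(score_gt)    # [0.4, 0.9]
print(score_cand)  # [1.2, 1.1]
```

So `score_gt` tracks how well the model scores the correct answer, and `score_cand` tracks its strongest wrong answer; training is going well when `score_gt` stays below `score_cand`.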

@muqeeth
Collaborator

muqeeth commented Jul 19, 2022

`bs` here in the code is the batch size, I think.

@CaffreyR
Author

But what do score_cand and score_gt mean? @muqeeth

@HaokunLiu
Collaborator

score_cand and score_gt are the scores for the wrong answers and the correct answers, respectively.

@PastelBelem8

@HaokunLiu imagine I'd like to persist the scores as probabilities, is it safe to assume that torch.exp(score_gt) + torch.exp(score_cand) < 1?

@HaokunLiu
Collaborator

> @HaokunLiu imagine I'd like to persist the scores as probabilities, is it safe to assume that torch.exp(score_gt) + torch.exp(score_cand) < 1?

Ha, you found this issue. In fact, if we are going to compute a probability distribution over all the choices (correct and incorrect), they should be treated as negative logits rather than probabilities. They correspond to $-\beta(x, y)$ from Eq. 2 in the paper.
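In other words, a minimal sketch of that conversion (hypothetical scores for one example; probabilities come from a softmax over the negated scores across all choices, so there is no guarantee that `exp(score_gt) + exp(score_cand) < 1`):

```python
import math

# Hypothetical per-choice scores for one example. Treating them as
# negative logits, the probability of each choice is a softmax over
# the negated scores.
scores = [0.4, 1.2, 3.0]

neg = [-s for s in scores]
z = max(neg)  # subtract the max before exponentiating, for numerical stability
exps = [math.exp(v - z) for v in neg]
total = sum(exps)
probs = [e / total for e in exps]

print(probs)  # sums to 1.0; the lowest score gets the highest probability
```

Note that the raw `exp(-score)` values alone are unnormalized, which is why the inequality asked about above need not hold.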
