The decoder_attention_mask has the same shape as the input_ids, [batch_size, seq_len], and indicates which ids are pad_tokens and which are not. The lower-triangular matrix, of shape [batch_size, seq_len, seq_len], is formed later inside the HuggingFace code.
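For illustration, here is a minimal sketch of that expansion (the shapes follow the description above; the variable names and the 3D layout are assumptions, not the exact HuggingFace implementation, which also adds a head dimension):

```python
import torch

batch_size, seq_len = 2, 5

# The mask built in EncoderDecoder.py: all ones, i.e. no position is
# marked as padding.
decoder_attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long)

# Inside the model, a lower-triangular causal mask is created and
# combined with the 2D padding mask, so causality is enforced no
# matter what the 2D mask contains.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.long))
extended_mask = causal_mask[None, :, :] * decoder_attention_mask[:, None, :]

print(extended_mask.shape)  # torch.Size([2, 5, 5])
print(extended_mask[0])     # lower triangular even though the 2D mask is all ones
```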
The log probs of the padded tokens get masked out later when computing the log probs of the choices, so it doesn't matter whether we mask out the pad_tokens in the decoder_attention_mask.
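A minimal sketch of that scoring step (assumed shapes and names; the actual t-few code may differ):

```python
import torch
import torch.nn.functional as F

batch_size, seq_len, vocab_size = 2, 5, 100
logits = torch.randn(batch_size, seq_len, vocab_size)      # decoder output
target_ids = torch.randint(0, vocab_size, (batch_size, seq_len))
pad_mask = torch.tensor([[1, 1, 1, 0, 0],                  # 1 = real token, 0 = pad
                         [1, 1, 1, 1, 0]])

# Per-token log probs of the target ids.
log_probs = F.log_softmax(logits, dim=-1)
token_log_probs = log_probs.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)

# Pad positions are zeroed out here, so how they were treated by the
# attention mask upstream has no effect on the choice score.
choice_scores = (token_log_probs * pad_mask).sum(dim=-1)
print(choice_scores.shape)  # torch.Size([2])
```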
Hi there,
I am trying to recreate the decoder attention mask, and I am a bit puzzled by how it is created here:
t-few/src/models/EncoderDecoder.py, line 53 (commit 114dece)
This creates a dense matrix with 1s everywhere. Shouldn't this be a lower triangular matrix (which is what T5Model does internally by default)?
Thanks a lot for your help!