The decoder_attention_mask has the same shape as the input_ids, [batch_size, seq_len], and indicates which ids are pad_tokens and which are not. The lower-triangular matrix, of shape [batch_size, seq_len, seq_len], is formed later inside the HuggingFace code.
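For illustration, here is a minimal sketch of that expansion (the shapes follow the description above; the variable names and the 3D layout are assumptions, not the exact HuggingFace implementation, which also adds a head dimension):

```python
import torch

batch_size, seq_len = 2, 5

# The mask built in EncoderDecoder.py: all ones, i.e. no position is
# marked as padding.
decoder_attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long)

# Inside the model, a lower-triangular causal mask is created and
# combined with the 2D padding mask, so causality is enforced no
# matter what the 2D mask contains.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.long))
extended_mask = causal_mask[None, :, :] * decoder_attention_mask[:, None, :]

print(extended_mask.shape)  # torch.Size([2, 5, 5])
print(extended_mask[0])     # lower triangular even though the 2D mask is all ones
```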
The log probs of the padded tokens get masked out later when computing the log probs of the choices, so it doesn't matter whether we mask out the pad_tokens in the decoder_attention_mask.
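A minimal sketch of that scoring step (assumed shapes and names; the actual t-few code may differ):

```python
import torch
import torch.nn.functional as F

batch_size, seq_len, vocab_size = 2, 5, 100
logits = torch.randn(batch_size, seq_len, vocab_size)      # decoder output
target_ids = torch.randint(0, vocab_size, (batch_size, seq_len))
pad_mask = torch.tensor([[1, 1, 1, 0, 0],                  # 1 = real token, 0 = pad
                         [1, 1, 1, 1, 0]])

# Per-token log probs of the target ids.
log_probs = F.log_softmax(logits, dim=-1)
token_log_probs = log_probs.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)

# Pad positions are zeroed out here, so how they were treated by the
# attention mask upstream has no effect on the choice score.
choice_scores = (token_log_probs * pad_mask).sum(dim=-1)
print(choice_scores.shape)  # torch.Size([2])
```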
Hi there,
I am trying to recreate the decoder attention mask, and I am a bit puzzled by how it is created here:
t-few/src/models/EncoderDecoder.py, line 53 (commit 114dece)
This creates a dense matrix with 1s everywhere. Shouldn't this be a lower triangular matrix (which is what T5Model does internally by default)?
Thanks a lot for your help!