
resolve GPU tokenize padding issue #13

Open
WeihaoGe1009 opened this issue Aug 31, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@WeihaoGe1009 (Collaborator)

Currently, when running the code on a GPU cluster, the tokenizer fails because no padding token is set.

I get the following error:

Asking to pad, but the tokenizer does not have a padding token. 
Please select a token to use as 'pad_token'
'(tokenizer.pad_token = tokenizer.eos_token e.g.)' 
or add a new pad token via 'tokenizer.add_special_tokens({'pad_token': '[PAD]'})'.

This issue did not occur when running on a CPU-only cluster.

On the GPU, the following code appears to be ignored:

tokenizer.pad_token = tokenizer.eos_token

or

if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})
    model.resize_token_embeddings(len(tokenizer))
@WeihaoGe1009 added the bug label on Aug 31, 2024
@WeihaoGe1009 (Collaborator, Author) commented Sep 3, 2024

try without padding
