What is the purpose of this line in the whole word mask data processing during pretraining? I found that removing it causes a significant drop in performance.
roberta_zh/create_pretraining_data.py
Line 526 in 13f7849
Hi, I have the same question. What do you mean by a significant drop in performance? Do you mean the inference accuracy of the pretrained model decreases?
What is the purpose of this line in the whole word mask data processing during pretraining? I found that removing it causes a significant drop in performance. roberta_zh/create_pretraining_data.py Line 526 in 13f7849: output_tokens = [t[2:] if len(re.findall('##[\u4E00-\u9FA5]', t))>0 else t for t in tokens]
Doesn't this just take the Chinese part of each token with the ## prefix stripped off?
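A minimal, self-contained sketch of what that list comprehension does; the sample token list below is a hypothetical example for illustration, not taken from the repository.

```python
import re

# Example WordPiece output for a mixed Chinese/English sentence. "##" marks a
# subword that continues the previous token (hypothetical example input).
tokens = ['[CLS]', '中', '##国', '人', 'hello', '##world', '[SEP]']

# The line in question: strip the "##" prefix only when it is followed by a
# CJK character (U+4E00-U+9FA5), so the text fed to the model contains plain
# Chinese characters, while non-Chinese subwords like "##world" keep theirs.
output_tokens = [t[2:] if len(re.findall('##[\u4E00-\u9FA5]', t)) > 0 else t
                 for t in tokens]

print(output_tokens)
# ['[CLS]', '中', '国', '人', 'hello', '##world', '[SEP]']
```

One plausible reading, consistent with the reply above, is that the ##-marked tokens are still used to decide which positions belong to the same word when generating whole word masks, while output_tokens only cleans the surface text written into the training instances; if the line is removed, tokens such as '##国' end up in the model input, which could explain the observed drop in quality.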