Getting an error I can't fix; could the author please take a look? #21

Open
Oldsport-996 opened this issue Apr 21, 2022 · 2 comments

Comments

@Oldsport-996

(gitabtion) F:\0code\gitabtion>python main.py --mode preproc
Namespace(accumulate_grad_batches=16, batch_size=16, bert_checkpoint='bert-base-chinese', device=device(type='cpu'), epochs=10, gpu_index=0, hard_device='cpu', load_checkpoint=False, loss_weight=0.8, lr=0.0001, mode='preproc', model_save_path='checkpoint', warmup_epochs=8)
preprocessing...
Traceback (most recent call last):
  File "main.py", line 99, in <module>
    main()
  File "main.py", line 63, in main
    preproc()
  File "F:\0code\gitabtion\src\data_processor.py", line 201, in preproc
    for item in read_data(get_abs_path('data')):
  File "F:\0code\gitabtion\src\data_processor.py", line 131, in read_data
    for line in f:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xab in position 16: illegal multibyte sequence

I tried editing line 129 of data_processed.py, setting the encoding to 'utf-8', ascii, and so on, but none of it worked. This is really frustrating. What is the problem here?
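One detail in the traceback is worth noting: the error still names the 'gbk' codec, which is the default text encoding on Chinese-locale Windows. That suggests the `open()` call that produced `f` ran without an `encoding` argument, so the edit may not have reached the code that was actually executed. This is an inference from the error message, not something the thread confirms. Below is a minimal sketch of passing the encoding explicitly, assuming the data files are UTF-8 (also unconfirmed). The `read_lines` helper and the sample file are hypothetical and not taken from the repository.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Read a text file line by line with an explicit encoding.

    Without the `encoding` argument, open() falls back to the locale
    default, which is gbk (cp936) on Chinese-locale Windows.
    """
    with open(path, "r", encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Hypothetical sample file standing in for one of the data files.
sample_text = "我喜欢自然语言处理\n"
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(sample_text.encode("utf-8"))
    sample_path = tmp.name

lines = read_lines(sample_path)  # decoded as UTF-8, independent of the locale
print(lines)

os.remove(sample_path)  # clean up the temporary file
```

If the message keeps mentioning 'gbk' after such a change, it is worth checking that the edited file is the `data_processor.py` shown in the traceback path.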

@gitabtion
Owner

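The data processing script in this repository does have some problems. You can process the data with the BertBasedCorrectionModels repository first, then train with this repository.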

@hongge778

> (gitabtion) F:\0code\gitabtion>python main.py --mode preproc
> Namespace(accumulate_grad_batches=16, batch_size=16, bert_checkpoint='bert-base-chinese', device=device(type='cpu'), epochs=10, gpu_index=0, hard_device='cpu', load_checkpoint=False, loss_weight=0.8, lr=0.0001, mode='preproc', model_save_path='checkpoint', warmup_epochs=8)
> preprocessing...
> Traceback (most recent call last):
>   File "main.py", line 99, in <module>
>     main()
>   File "main.py", line 63, in main
>     preproc()
>   File "F:\0code\gitabtion\src\data_processor.py", line 201, in preproc
>     for item in read_data(get_abs_path('data')):
>   File "F:\0code\gitabtion\src\data_processor.py", line 131, in read_data
>     for line in f:
> UnicodeDecodeError: 'gbk' codec can't decode byte 0xab in position 16: illegal multibyte sequence
>
> I tried editing line 129 of data_processed.py, setting the encoding to 'utf-8', ascii, and so on, but none of it worked. This is really frustrating. What is the problem here?

Hey, how did you end up solving this?
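The thread does not record a resolution. When a file's encoding is unknown, one common way to narrow it down is to read the raw bytes and try a few candidate encodings in turn. Here is a rough sketch of that idea. The `detect_encoding` helper and the candidate list are hypothetical, and a successful decode only shows the bytes are valid in that encoding, not that the guess is correct.

```python
def detect_encoding(raw, candidates=("utf-8", "gb18030", "big5")):
    """Return the first candidate encoding that decodes `raw` without
    raising UnicodeDecodeError, or None if every candidate fails."""
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return enc
    return None


# Illustrative byte strings; for a real file, obtain the bytes with
# open(path, "rb").read() so that no decoding happens on read.
utf8_bytes = "中文纠错".encode("utf-8")
gbk_bytes = "中文纠错".encode("gbk")

print(detect_encoding(utf8_bytes))
print(detect_encoding(gbk_bytes))
```

Order matters here: UTF-8 is tried first because it is strict and rarely accepts bytes written in another encoding, whereas the GB family accepts many byte sequences and would often "succeed" on the wrong input.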
