
Inference fails on 310P card #7489

Open
1 task done
ChenZhongPu opened this issue Mar 26, 2025 · 1 comment
Labels
bug Something isn't working npu This problem is related to NPU devices pending This problem is yet to be addressed

Comments

@ChenZhongPu

Reminder

  • I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.3.dev0
  • Platform: Linux-4.19.90-89.20.v2401.ky10.aarch64-aarch64-with-glibc2.28
  • Python version: 3.10.16
  • PyTorch version: 2.4.0 (NPU)
  • Transformers version: 4.50.0
  • Datasets version: 3.4.1
  • Accelerate version: 1.5.2
  • PEFT version: 0.15.0
  • TRL version: 0.9.6
  • NPU type: Ascend310P3
  • CANN version: 8.0.0.alpha001
  • Git commit: 59e12bf

Reproduction

If I don't specify a device, llamafactory-cli api starts fine, but inference fails with:

[Error]: System Direct Memory Access (DMA) hardware execution error.

I suspect the 310P does not support multi-card execution? If I pin a single card and run

ASCEND_RT_VISIBLE_DEVICES=0 llamafactory-cli api examples/inference/llama3_local.yaml

it fails with:

RuntimeError: call aclnnCast failed, detail:EZ1001: [PID: 3121044] 2025-03-26-15:21:21.052.218 self not implemented for DT_BFLOAT16, should be in dtype support list [DT_FLOAT16,DT_FLOAT,DT_DOUBLE,DT_INT8,DT_UINT8,DT_INT16,DT_INT32,DT_INT64,DT_UINT16,DT_UINT32,DT_UINT64,DT_BOOL,DT_COMPLEX64,DT_COMPLEX128,].

This is similar to issue 3796. The error message says the data type cannot be bfloat16, yet I have already changed it to float16 in the model's config.json.
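A minimal standalone probe reproduces the cast failure outside of LLaMA-Factory (a sketch, assuming torch and torch_npu import as listed in the System Info above):

import torch
import torch_npu  # Ascend adapter; registers the "npu" device with PyTorch

x = torch.ones(2, 2, device="npu:0", dtype=torch.float16)
try:
    # the 310P kernel package rejects DT_BFLOAT16, so this cast should raise
    y = x.to(torch.bfloat16)
    print("bfloat16 cast succeeded:", y.dtype)
except RuntimeError as e:
    print("bfloat16 unsupported on this NPU:", e)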

In issue 3796 the fix was to reinstall the 910 kernel operators; mine is a 310P card, and the kernel package I installed is: Ascend-cann-kernels-310p_8.0.0.alpha001_linux-aarch64.run

The NPU ecosystem feels quite chaotic and is very strict about minor versions, and most tutorials target the 910B. What is the correct setup for inference on 310P cards?
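For reference, the inference config I pass to llamafactory-cli looks roughly like this (the model path is a placeholder, and I am not certain the infer_dtype option is honored in this version):

# examples/inference/llama3_local.yaml (abridged; model path is a placeholder)
model_name_or_path: /path/to/llama3-model
template: llama3
infer_backend: huggingface
infer_dtype: float16  # try forcing fp16 end-to-end, since the 310P has no bf16 support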

Others

No response

@ChenZhongPu ChenZhongPu added bug Something isn't working pending This problem is yet to be addressed labels Mar 26, 2025
@hiyouga hiyouga added the npu This problem is related to NPU devices label Mar 26, 2025
@monument-and-sea-all-the-gift

In my case the problem is that NPU memory usage doubles; older versions seemed to run fine.
