
Inference fails on 310P card #7489

Open
1 task done
ChenZhongPu opened this issue Mar 26, 2025 · 1 comment
Labels
bug Something isn't working npu This problem is related to NPU devices pending This problem is yet to be addressed

Comments

@ChenZhongPu

Reminder

  • I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.3.dev0
  • Platform: Linux-4.19.90-89.20.v2401.ky10.aarch64-aarch64-with-glibc2.28
  • Python version: 3.10.16
  • PyTorch version: 2.4.0 (NPU)
  • Transformers version: 4.50.0
  • Datasets version: 3.4.1
  • Accelerate version: 1.5.2
  • PEFT version: 0.15.0
  • TRL version: 0.9.6
  • NPU type: Ascend310P3
  • CANN version: 8.0.0.alpha001
  • Git commit: 59e12bf

Reproduction

If I don't specify a device, llamafactory-cli api starts fine, but inference fails with:

[Error]: System Direct Memory Access (DMA) hardware execution error.

I suspect the 310P does not support multi-card execution? If I pin a single card and run

ASCEND_RT_VISIBLE_DEVICES=0 llamafactory-cli api examples/inference/llama3_local.yaml

it fails with:

RuntimeError: call aclnnCast failed, detail:EZ1001: [PID: 3121044] 2025-03-26-15:21:21.052.218 self not implemented for DT_BFLOAT16, should be in dtype support list [DT_FLOAT16,DT_FLOAT,DT_DOUBLE,DT_INT8,DT_UINT8,DT_INT16,DT_INT32,DT_INT64,DT_UINT16,DT_UINT32,DT_UINT64,DT_BOOL,DT_COMPLEX64,DT_COMPLEX128,].

This is similar to issue 3796. The error message says the data type cannot be bfloat16, yet I have already changed it to float16 in the model's config.json.
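A minimal standalone probe reproduces the cast failure outside of LLaMA-Factory (a sketch, assuming torch and torch_npu import as listed in the System Info above):

import torch
import torch_npu  # Ascend adapter; registers the "npu" device with PyTorch

x = torch.ones(2, 2, device="npu:0", dtype=torch.float16)
try:
    # the 310P kernel package rejects DT_BFLOAT16, so this cast should raise
    y = x.to(torch.bfloat16)
    print("bfloat16 cast succeeded:", y.dtype)
except RuntimeError as e:
    print("bfloat16 unsupported on this NPU:", e)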

In issue 3796 the fix was to reinstall the 910 kernel operators; mine is a 310P card, and the kernel package I installed is: Ascend-cann-kernels-310p_8.0.0.alpha001_linux-aarch64.run

The NPU ecosystem feels quite chaotic and is very strict about minor versions, and most tutorials target the 910B. What is the correct setup for inference on 310P cards?
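For reference, the inference config I pass to llamafactory-cli looks roughly like this (the model path is a placeholder, and I am not certain the infer_dtype option is honored in this version):

# examples/inference/llama3_local.yaml (abridged; model path is a placeholder)
model_name_or_path: /path/to/llama3-model
template: llama3
infer_backend: huggingface
infer_dtype: float16  # try forcing fp16 end-to-end, since the 310P has no bf16 support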

Others

No response

@ChenZhongPu ChenZhongPu added bug Something isn't working pending This problem is yet to be addressed labels Mar 26, 2025
@hiyouga hiyouga added the npu This problem is related to NPU devices label Mar 26, 2025
@monument-and-sea-all-the-gift

In my case the problem is that NPU memory usage doubles; older versions seemed to run fine.
