test_benchdnn_modeC_conv_ci_cpu fails on AArch64 CI for c7g instance #2303

Open
renato-arantes opened this issue Dec 20, 2024 · 2 comments
Labels
bug A confirmed library bug platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64


@renato-arantes
Contributor

Summary

The test test_benchdnn_modeC_conv_ci_cpu, with fpmath mode set to bf16, fails on the AArch64 GitHub CI running on an AWS c7g instance.

Version

ACL_VERSION: v24.11.1

Environment

oneDNN GitHub CI for AArch64 on an AWS c7g instance.

Steps to reproduce

benchdnn --conv --dir=FWD_D --attr-fpmath=bf16
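Because a single passing run does not rule the problem out, it can help to rerun the failing case many times and count failures. The sketch below assumes that benchdnn exits with a non-zero status when a case fails and that the binary lives at the hypothetical path `./benchdnn`; the helper name `count_failures` is also hypothetical. The problem descriptor is the one from the log below with the `n"..."` name label dropped, since the name only labels the case.

```python
import subprocess
import sys

# Hypothetical binary path; adjust to your build directory.
BENCHDNN_CMD = [
    "./benchdnn", "--conv", "--dir=FWD_D", "--attr-fpmath=bf16",
    "ic17ih8oc17oh4kh1sh2ph0",
]


def count_failures(cmd, runs):
    """Run `cmd` `runs` times and return how many runs exited non-zero.

    Assumes a failing benchdnn case yields a non-zero exit status.
    """
    failures = 0
    for i in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
            # Report each failing run; its stdout holds the mismatch lines
            # (e.g. "got: nan") if you want to inspect them.
            print(f"run {i}: exit code {result.returncode}", file=sys.stderr)
    return failures
```

For example, `count_failures(BENCHDNN_CMD, 50)` reports how many of 50 runs failed, which gives a rough failure rate for a sporadic issue instead of a single pass/fail result.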

Observed behaviour

The test fails:

2024-12-19T17:06:18.6979025Z run: --conv --dir=FWD_D --attr-fpmath=bf16 ic17ih8oc17oh4kh1sh2ph0n"conv_basic_2d:1x1_stride_tail"
2024-12-19T17:06:18.6979225Z [   4][DST][0:0:1:0] exp_f32:         -25 exp:         -25 got:         nan diff:     nan rdiff:     nan
2024-12-19T17:06:18.6979403Z [  11][DST][0:0:2:3] exp_f32:         -19 exp:         -19 got:         nan diff:     nan rdiff:     nan
2024-12-19T17:06:18.6979583Z [  14][DST][0:0:3:2] exp_f32:         -29 exp:         -29 got:         nan diff:     nan rdiff:     nan
2024-12-19T17:06:18.6979751Z [  15][DST][0:0:3:3] exp_f32:         -16 exp:         -16 got:         nan diff:     nan rdiff:     nan
2024-12-19T17:06:18.6979922Z [  20][DST][0:1:1:0] exp_f32:          24 exp:          24 got:         nan diff:     nan rdiff:     nan
2024-12-19T17:06:18.6980092Z [  27][DST][0:1:2:3] exp_f32:         -35 exp:         -35 got:         nan diff:     nan rdiff:     nan
2024-12-19T17:06:18.6980363Z [  30][DST][0:1:3:2] exp_f32:         -30 exp:         -30 got:         nan diff:     nan rdiff:     nan
2024-12-19T17:06:18.6980534Z [  31][DST][0:1:3:3] exp_f32:         -26 exp:         -26 got:         nan diff:     nan rdiff:     nan
2024-12-19T17:06:18.6980713Z [  36][DST][0:2:1:0] exp_f32:         -17 exp:         -17 got:         nan diff:     nan rdiff:     nan
2024-12-19T17:06:18.6980886Z [  43][DST][0:2:2:3] exp_f32:         -40 exp:         -40 got:         nan diff:     nan rdiff:     nan
2024-12-19T17:06:18.6981241Z [COMPARE_STATS][DST]: trh=0 err_max_diff:     nan err_max_rdiff:     nan all_max_diff:       0 all_max_rdiff:       0

Expected behavior

The test passes.

@renato-arantes renato-arantes added the sighting Suspicious library behavior. Should be promoted to a bug when confirmed label Dec 20, 2024
@theComputeKid theComputeKid added bug A confirmed library bug and removed sighting Suspicious library behavior. Should be promoted to a bug when confirmed labels Dec 20, 2024
@theComputeKid
Contributor

This is a sporadic issue that shows up in the AArch64 CI now that we have expanded the test set. It goes away if the job is restarted.

@theComputeKid theComputeKid changed the title Test test_benchdnn_modeC_conv_ci_cpu failing on AArch64 GitHub CI for c7g AWS instance. test_benchdnn_modeC_conv_ci_cpu fails on AArch64 CI for c7g instance Dec 20, 2024
@vpirogov vpirogov added the platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 label Dec 20, 2024
@vpirogov
Member

> This is a sporadic issue that shows up in the AArch64 CI now that we have expanded the test set. It goes away if the job is restarted.

Three times as fun as a stable failure!
