Segfault when running test_dpjit_reduction.py on Intel ARC A770 GPU device #1438

Open
diptorupd opened this issue Apr 16, 2024 · 1 comment

Comments

Contributor

diptorupd commented Apr 16, 2024

I encountered a segfault when running the dpjit reduction tests (test_dpjit_reduction.py) on an Intel ARC A770 GPU with numba-dpex 0.23.0rc1. With 0.21.1 the same tests fail, but they do not segfault.

$ ONEAPI_DEVICE_SELECTOR=*:gpu NUMBA_CAPTURED_ERRORS=new_style pytest numba_dpex/tests/dpjit_tests/test_dpjit_reduction.py
================================================================== test session starts ===================================================================
platform linux -- Python 3.10.14, pytest-8.1.1, pluggy-1.4.0
rootdir: /localdisk/work/diptorup/devel/numba-dpex
configfile: pyproject.toml
plugins: cov-5.0.0
collected 9 items                                                                                                                                        

numba_dpex/tests/dpjit_tests/test_dpjit_reduction.py Fatal Python error: Segmentation fault

Current thread 0x00007fbcddd12b80 (most recent call first):
  File "/localdisk/work/diptorup/devel/numba-dpex/numba_dpex/core/parfors/kernel_builder.py", line 105 in _compile_kernel_parfor
  File "/localdisk/work/diptorup/devel/numba-dpex/numba_dpex/core/parfors/reduction_kernel_builder.py", line 174 in create_reduction_main_kernel_for_parfor
  File "/localdisk/work/diptorup/devel/numba-dpex/numba_dpex/core/parfors/parfor_lowerer.py", line 262 in _reduction_codegen
  File "/localdisk/work/diptorup/devel/numba-dpex/numba_dpex/core/parfors/parfor_lowerer.py", line 385 in _lower_parfor_as_kernel
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/parfors/parfor_lowering.py", line 69 in _lower_parfor_parallel
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/parfors/parfor_lowering.py", line 51 in lower_inst
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/lowering.py", line 270 in lower_block
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/lowering.py", line 256 in lower_function_body
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/lowering.py", line 226 in lower_normal_function
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/lowering.py", line 187 in lower
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/typed_passes.py", line 468 in run_pass
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 273 in check
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 311 in _runPass
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/compiler_lock.py", line 35 in _acquire_compile_lock
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 356 in run
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/compiler.py", line 479 in _compile_core
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/compiler.py", line 513 in _compile_bytecode
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/compiler.py", line 445 in compile_extra
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/compiler.py", line 751 in compile_extra
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/dispatcher.py", line 152 in _compile_core
  File "/localdisk/work/diptorup/devel/numba-dpex/numba_dpex/core/dpjit_dispatcher.py", line 34 in _compile_cached
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/dispatcher.py", line 125 in compile
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/dispatcher.py", line 957 in compile
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/numba/core/dispatcher.py", line 420 in _compile_for_args
  File "/localdisk/work/diptorup/devel/numba-dpex/numba_dpex/tests/dpjit_tests/test_dpjit_reduction.py", line 75 in test_dpjit_array_arg_types_add1
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/_pytest/python.py", line 195 in pytest_pyfunc_call
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/pluggy/_callers.py", line 102 in _multicall
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/pluggy/_manager.py", line 119 in _hookexec
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/pluggy/_hooks.py", line 501 in __call__
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/_pytest/python.py", line 1772 in runtest
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/_pytest/runner.py", line 172 in pytest_runtest_call
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/pluggy/_callers.py", line 102 in _multicall
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/pluggy/_manager.py", line 119 in _hookexec
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/pluggy/_hooks.py", line 501 in __call__
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/_pytest/runner.py", line 240 in <lambda>
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/_pytest/runner.py", line 340 in from_call
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/_pytest/runner.py", line 239 in call_and_report
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/_pytest/runner.py", line 134 in runtestprotocol
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/_pytest/runner.py", line 115 in pytest_runtest_protocol
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/pluggy/_callers.py", line 102 in _multicall
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/pluggy/_manager.py", line 119 in _hookexec
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/pluggy/_hooks.py", line 501 in __call__
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/_pytest/main.py", line 364 in pytest_runtestloop
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/pluggy/_callers.py", line 102 in _multicall
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/pluggy/_manager.py", line 119 in _hookexec
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/pluggy/_hooks.py", line 501 in __call__
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/_pytest/main.py", line 339 in _main
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/_pytest/main.py", line 285 in wrap_session
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/_pytest/main.py", line 332 in pytest_cmdline_main
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/pluggy/_callers.py", line 102 in _multicall
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/pluggy/_manager.py", line 119 in _hookexec
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/pluggy/_hooks.py", line 501 in __call__
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/_pytest/config/__init__.py", line 174 in main
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/lib/python3.10/site-packages/_pytest/config/__init__.py", line 197 in console_main
  File "/localdisk/work/diptorup/miniconda3/envs/dpex-devel/bin/pytest", line 10 in <module>

Extension modules: dpctl._sycl_context, dpctl._sycl_platform, dpctl._sycl_device, dpctl._sycl_device_factory, dpctl._sycl_event, dpctl.program._program, dpctl._sycl_queue_manager, mkl._mklinit, mkl._py_mkl_service, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, mkl_fft._pydfti, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, _patch, mkl_umath._patch, dpctl.memory._memory, dpctl._sycl_queue, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, dpctl.utils._compute_follows_data, dpctl.tensor._dlpack, dpctl.tensor._flags, dpctl.tensor._usmarray, dpnp.dpnp_utils.dpnp_algo_utils, dpnp.dpnp_algo.dpnp_algo, dpnp.fft.dpnp_algo_fft, dpnp.linalg.dpnp_algo_linalg, dpnp.random.dpnp_algo_random (total: 43)
Segmentation fault (core dumped)
@diptorupd diptorupd self-assigned this Apr 16, 2024
@diptorupd
Contributor Author

The segfault occurs because the test case uses a double-precision floating-point value, which is not supported on ARC. As a workaround, the failing GDB test case should be updated to avoid floating-point numbers so that it works on all devices as expected.
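One way a test could avoid double precision on devices that lack it is to choose its dtypes from the device's reported capabilities. The following is only a minimal sketch, not the fix applied in numba-dpex: `FakeDevice` and `supported_float_dtypes` are hypothetical names made up for illustration, and the only thing assumed about a real device object is that it exposes a boolean `has_aspect_fp64` attribute, as `dpctl.SyclDevice` does.

```python
from dataclasses import dataclass


@dataclass
class FakeDevice:
    """Hypothetical stand-in for dpctl.SyclDevice, used so the sketch runs
    without a SYCL runtime. Only the attribute read below is modelled."""

    has_aspect_fp64: bool


def supported_float_dtypes(device):
    """Return the float dtype names that are safe to use on `device`.

    float32 is always included; float64 is added only when the device
    reports double-precision support through `has_aspect_fp64`.
    """
    dtypes = ["float32"]
    if getattr(device, "has_aspect_fp64", False):
        dtypes.append("float64")
    return dtypes


# A device without fp64 support (the situation described for ARC above).
print(supported_float_dtypes(FakeDevice(has_aspect_fp64=False)))
# A device with fp64 support.
print(supported_float_dtypes(FakeDevice(has_aspect_fp64=True)))
```

With dpctl installed, the device returned by `dpctl.select_default_device()` could be passed in place of `FakeDevice`, and the resulting list could feed a pytest parametrization so that float64 cases are never generated on devices without fp64 support.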

@diptorupd diptorupd removed their assignment Jul 2, 2024