brgconv:sve_256 uses a lot of memory #2007

Open
jondea opened this issue Jul 24, 2024 · 6 comments
Labels
help wanted platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 sighting Suspicious library behavior. Should be promoted to a bug when confirmed

Comments

@jondea
Contributor

jondea commented Jul 24, 2024

Summary

brgconv:sve_256 uses a lot of memory. Specifically, our test runners were failing when running test_benchdnn_modeC_conv_3d_cpu because the benchdnn test

./tests/benchdnn/benchdnn --conv g1ic32id192oc1od1kd192pd0n"3d_conv_large_shape:2"

uses 16.1 GB of memory. The usage appears to be independent of the number of threads (tested with 1, 8, and 16 threads). This is on main/v3.6, commit 319a77e.

@kasturedeeksha is this amount of memory use expected? Is this something you have seen with brgconv:sve_512?

Environment

  • CPU: Neoverse-V1 C7g.4xlarge
  • OS version: Ubuntu 20.04
  • Compiler version: gcc-10 (also seen with clang-17)
  • git hash: 319a77e
  • CMake version: 3.16.3
  • CMake output log
+ cmake -DDNNL_CPU_RUNTIME=OMP -DCMAKE_BUILD_TYPE=Release -DDNNL_BUILD_FOR_CI=ON -DDNNL_WERROR=OFF -DDNNL_TEST_SET=NIGHTLY ..
-- The C compiler identification is GNU 10.5.0
-- The CXX compiler identification is GNU 10.5.0
-- Check for working C compiler: /usr/bin/gcc-10
-- Check for working C compiler: /usr/bin/gcc-10 -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/g++-10
-- Check for working CXX compiler: /usr/bin/g++-10 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- DNNL_TARGET_ARCH: AARCH64
-- DNNL_LIBRARY_NAME: dnnl
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
-- Could NOT find Doxyrest (missing: DOXYREST_EXECUTABLE)
-- Found PythonInterp: /usr/bin/python2.7 (found suitable version "2.7.18", minimum required is "2.7")
-- Found Sphinx: /usr/bin/sphinx-build (found version "sphinx-build 1.8.5")
-- Found Git: /usr/bin/git (found version "2.25.1")
-- Enabled testing coverage: NIGHTLY
-- Enabled workload: TRAINING
-- Enabled primitives: ALL
-- Enabled primitive CPU ISA: ALL
-- Enabled primitive GPU ISA: ALL
-- Enabled GeMM kernels ISA: ALL
-- Primitive cache is enabled
-- Graph component is enabled
-- Graph compiler backend is disabled.
-- Configuring done
-- Generating done
-- Build files have been written to: oneDNN/build
@jondea added the sighting and platform:cpu-aarch64 labels on Jul 24, 2024
@kasturedeeksha
Contributor

Hi @jondea,
Can you share the steps to reproduce the failure with test_benchdnn_modeC_conv_3d_cpu?
For the specific test case
./tests/benchdnn/benchdnn --conv g1ic32id192oc1od1kd192pd0n"3d_conv_large_shape:2"
I'm not getting any errors and the test runs successfully.
However, I am unable to run the full test_benchdnn_modeC_conv_3d_cpu test; make reports: No rule to make target 'test_benchdnn_modeC_conv_3d_cpu'.

@jondea
Contributor Author

jondea commented Jul 26, 2024

I suspect you need -DDNNL_TEST_SET=NIGHTLY in your cmake invocation. Here's the full cmake invocation I used, although I suspect it isn't all necessary.

cmake -DDNNL_CPU_RUNTIME=OMP -DCMAKE_BUILD_TYPE=Release -DDNNL_BUILD_FOR_CI=ON -DDNNL_WERROR=OFF -DDNNL_TEST_SET=NIGHTLY ..

@kasturedeeksha
Contributor

Memory usage is high for BRGEMM convolution and we are analyzing it, but I am not able to reproduce the failure because the tests pass.

Test output

make test output:
test_benchdnn_modeC_conv_3d_cpu passes when running make test:

Start 188: test_benchdnn_modeC_conv_3d_cpu
188/300 Test #188: test_benchdnn_modeC_conv_3d_cpu .........................   Passed   32.83 sec

complete output summary

96% tests passed, 11 tests failed out of 300

Total Test time (real) = 7149.29 sec

The following tests FAILED:
        157 - test_graph_unit_dnnl_large_partition_usm_cpu (Failed)
        196 - test_benchdnn_modeC_conv_bfloat16_ymm_cpu (Failed)
        217 - test_benchdnn_modeC_deconv_bfloat16_ymm_cpu (Failed)
        226 - test_benchdnn_modeC_graph_bf16_cpu (Failed)
        229 - test_benchdnn_modeC_graph_fusions_cpu (Failed)
        230 - test_benchdnn_modeC_graph_int8_cpu (Failed)
        235 - test_benchdnn_modeC_ip_bfloat16_ymm_cpu (Failed)
        248 - test_benchdnn_modeC_matmul_bfloat16_ymm_cpu (Failed)
        252 - test_benchdnn_modeC_matmul_multidims_cpu (Failed)
        262 - test_benchdnn_modeC_reorder_all_cpu (Failed)
        281 - test_benchdnn_modeC_lstm_bfloat16_ymm_cpu (Failed)
Errors while running CTest
Output from these tests are in: /home/deekshak/xybak/oneDNN_v3.6/oneDNN/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
make: *** [Makefile:71: test] Error 8

testcase output

./tests/benchdnn/benchdnn --conv g1ic32id192oc1od1kd192pd0n"3d_conv_large_shape:2"
0:PASSED __REPRO: --conv g1ic32id192oc1od1kd192pd0n3d_conv_large_shape:2
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 40.19s; fill: 2.30s (6%); compute_ref: 1.64s (4%); compare: 24.38s (61%);

Environment

  • CPU: Neoverse-V1 C7g.4xlarge
  • OS version: Ubuntu 22.04.4
  • Compiler version: gcc-11.4.0
  • git hash: 319a77e
  • CMake version: 3.29.6
  • CMake output log
cmake -DDNNL_CPU_RUNTIME=OMP -DCMAKE_BUILD_TYPE=Release -DDNNL_BUILD_FOR_CI=ON -DDNNL_WERROR=OFF -DDNNL_TEST_SET=NIGHTLY ..
CMake Deprecation Warning at CMakeLists.txt:17 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.
 
  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.
 
 
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- DNNL_TARGET_ARCH: AARCH64
-- DNNL_LIBRARY_NAME: dnnl
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
-- Could NOT find Doxyrest (missing: DOXYREST_EXECUTABLE)
CMake Warning (dev) at cmake/Sphinx.cmake:25 (find_package):
  Policy CMP0148 is not set: The FindPythonInterp and FindPythonLibs modules
  are removed.  Run "cmake --help-policy CMP0148" for policy details.  Use
  the cmake_policy command to set the policy and suppress this warning.
 
Call Stack (most recent call first):
  cmake/doc.cmake:28 (include)
  CMakeLists.txt:127 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.
 
-- Found PythonInterp: /usr/bin/python2.7 (found suitable version "2.7.18", minimum required is "2.7")
-- Could NOT find Sphinx (missing: SPHINX_EXECUTABLE)
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Enabled testing coverage: NIGHTLY
-- Enabled workload: TRAINING
-- Enabled primitives: ALL
-- Enabled primitive CPU ISA: ALL
-- Enabled primitive GPU ISA: ALL
-- Enabled GeMM kernels ISA: ALL
-- Primitive cache is enabled
-- Graph component is enabled
-- Graph compiler backend is disabled.
-- Configuring done (65.6s)
-- Generating done (20.4s)
-- Build files have been written to: /home/deekshak/xybak/oneDNN_v3.6/oneDNN/build 

@jondea
Contributor Author

jondea commented Aug 1, 2024

Thank you for looking into it. Yes, the error will only happen if you run out of memory, which is machine dependent. It was happening for us because we ran multiple tests in parallel, but it would also happen on a smaller machine (e.g. C7g.2xlarge).

@michalowski-arm
Contributor

@kasturedeeksha the issue seems to be that benchdnn underestimates the memory required by brgconv:sve_256. When running the x86 implementation, benchdnn estimates that the problem requires 17.7 GB of memory (regardless of the number of threads) and therefore skips the test if AVAILABLE_MEMORY*0.75 <= 17.7 GB. For brgconv:sve_256, however, the estimate is 11 GB (while the real usage is >16 GB), so the test is allowed to run on, for example, a c7g.2xlarge. On a c7g.xlarge, the test is skipped, as in the x86 case. The fix will therefore likely require looking into the logic that estimates the problem size (which seems to live mostly in tests/benchdnn/dnnl_common.cpp) and fixing whatever causes the value to be miscalculated for brgconv:sve_256.
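A minimal C++ model of the skip rule described in the comment above may help show why an underestimate is dangerous. The function name `would_run` is hypothetical, the 0.75 factor is taken from the comment rather than from benchdnn's source, and GB and GiB are treated interchangeably for this rough comparison.

```cpp
// Hypothetical model of the skip rule described above; not benchdnn's code.
// The 0.75 factor comes from the comment, not from the benchdnn source.
constexpr double capacity_factor = 0.75;

// Returns true when the test would be allowed to run, i.e. when the
// estimated footprint is below capacity_factor * machine memory.
// Both arguments are in GB.
constexpr bool would_run(double estimate_gb, double machine_gb) {
    return estimate_gb < capacity_factor * machine_gb;
}
```

On a 16 GB machine such as a c7g.2xlarge, `would_run(17.7, 16.0)` is false (the x86-style estimate skips the test) while `would_run(11.0, 16.0)` is true, so the sve_256 estimate lets a problem that really needs more than 16 GB start running. On an 8 GB c7g.xlarge, both estimates exceed the 6 GB threshold and the test is skipped either way.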

@vpirogov do you have an idea what could be going wrong here? What's the idea behind how we obtain the estimate for the size of the problem?

@mgouicem
Contributor

mgouicem commented Dec 20, 2024

Hi @michalowski-arm ,

What's the idea behind how we obtain the estimate for the size of the problem?

The memory check relies on primitive_descriptor queries (see here). It takes the sum of the argument (input/output) sizes and the scratchpad size.
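To make that sum concrete for the problem in this issue, here is a minimal C++ sketch of the argument-size arithmetic. It assumes f32 data, a minibatch of 2 (assumed to be benchdnn's default when `mb` is omitted), and cubic spatial dimensions (ih = iw = id, kh = kw = kd, oh = ow = od). The `conv3d_shape` type and the helper functions are hypothetical, not benchdnn code, and the scratchpad size is passed in as a parameter because in practice it would come from a primitive descriptor query.

```cpp
#include <cstdint>

// Hypothetical description of a g1 3D convolution; f32 data assumed.
struct conv3d_shape {
    std::int64_t mb, ic, oc;
    std::int64_t id, kd, od; // spatial dims, assumed cubic (h = w = d)
};

constexpr std::int64_t cube(std::int64_t x) { return x * x * x; }

constexpr std::int64_t f32_bytes = 4;

constexpr std::int64_t src_bytes(const conv3d_shape &s) {
    return s.mb * s.ic * cube(s.id) * f32_bytes;
}

constexpr std::int64_t wei_bytes(const conv3d_shape &s) {
    return s.oc * s.ic * cube(s.kd) * f32_bytes; // g = 1: no group dimension
}

constexpr std::int64_t dst_bytes(const conv3d_shape &s) {
    return s.mb * s.oc * cube(s.od) * f32_bytes;
}

// Estimate = sum of argument sizes + scratchpad size (scratchpad supplied
// by the caller, standing in for a primitive descriptor query).
constexpr std::int64_t estimated_bytes(const conv3d_shape &s,
                                       std::int64_t scratchpad_bytes) {
    return src_bytes(s) + wei_bytes(s) + dst_bytes(s) + scratchpad_bytes;
}
```

For `conv3d_shape{2, 32, 1, 192, 192, 1}`, src comes to about 1.81 GB, weights to about 0.91 GB, and dst to only 8 bytes, i.e. about 2.7 GB before the scratchpad term is added. The scratchpad is left as an input because its size depends on the implementation that was selected.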

A mismatch between the actual and the estimated memory consumption can come from:

  • memory allocated without going through the scratchpad registry (e.g. by calling malloc), which is not reported by the queries;
  • a large code buffer (e.g. if a very large dimension is fully unrolled in the kernel).
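The first source of mismatch can be illustrated with a toy C++ model. The class and member names are hypothetical and do not correspond to oneDNN's scratchpad registry; the sketch only shows that a size query can see buffers booked through a registry but not a buffer obtained directly with malloc. The second source (a large JIT code buffer) is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for a scratchpad registry: only buffers booked here
// are visible to a size query. Not oneDNN's actual registry API.
class toy_scratchpad_registry {
public:
    void book(std::size_t bytes) { booked_.push_back(bytes); }

    // What a "scratchpad size" query would report.
    std::size_t reported_size() const {
        std::size_t total = 0;
        for (std::size_t b : booked_) total += b;
        return total;
    }

private:
    std::vector<std::size_t> booked_;
};

// Hypothetical kernel that books part of its memory through the registry
// and allocates the rest directly with malloc, bypassing the registry.
class toy_kernel {
public:
    toy_kernel() = default;
    toy_kernel(const toy_kernel &) = delete;            // owns a raw buffer,
    toy_kernel &operator=(const toy_kernel &) = delete; // so forbid copies
    ~toy_kernel() { std::free(untracked_); }

    void init(std::size_t booked_bytes, std::size_t malloc_bytes) {
        assert(untracked_ == nullptr && "init() is meant to be called once");
        registry.book(booked_bytes);            // visible to the query
        untracked_ = std::malloc(malloc_bytes); // invisible to the query
        untracked_bytes_ = untracked_ ? malloc_bytes : 0;
    }

    // Memory the kernel really holds: booked scratchpad + untracked buffer.
    std::size_t actual_size() const {
        return registry.reported_size() + untracked_bytes_;
    }

    toy_scratchpad_registry registry;

private:
    void *untracked_ = nullptr;
    std::size_t untracked_bytes_ = 0;
};
```

After `init(1024, 4096)`, `registry.reported_size()` returns 1024 bytes while `actual_size()` is 5120 bytes. A gap of this kind, scaled up, would make a query-based estimate too low in the way described above.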

5 participants