brgconv:sve_256 uses a lot of memory #2007
Comments
Hi @jondea |
I suspect you need
|
Memory usage is high for the BRGEMM convolution, and we are analyzing it, but I am not able to identify the error because the tests are passing.
Test output
make test output:
complete output summary
testcase output
Environment
|
Thank you for looking into it. Yes, the error only happens if you run out of memory, which is machine dependent. It was happening for us because we ran multiple tests in parallel, but it would also happen on a smaller machine (e.g. c7g.2xlarge). |
@kasturedeeksha the issue seems to be that benchdnn estimates the problem size incorrectly for brgconv:sve_256.
When running the x86 implementation, benchdnn estimates that the problem requires 17.7 GB of memory (regardless of the number of threads), and so it skips the test if AVAILABLE_MEMORY * 0.75 <= 17.7 GB. For this implementation the estimate is 11 GB (when in reality it is more than 16 GB), so the test is allowed to run on c7g.2xlarge, for example. If you try to run it on c7g.xlarge, the test is skipped, as in the x86 case.
So the fix here will likely require looking into the logic that estimates the problem size (it seems to be mostly in tests/benchdnn/dnnl_common.cpp) and fixing whatever causes the value to be calculated incorrectly for brgconv:sve_256.
@vpirogov do you have an idea what could be going wrong here? What is the idea behind how we obtain the estimate for the size of the problem? |
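To make the skip condition described in this comment concrete, here is a minimal sketch of the arithmetic. It is not benchdnn's code: the function name `should_skip` is hypothetical, and the instance memory sizes (8 GiB for c7g.xlarge, 16 GiB for c7g.2xlarge) are assumptions that the thread does not state. Only the 0.75 factor and the 17.7 GB and 11 GB estimates come from the comment.

```python
GB = 10**9   # decimal gigabyte, used for the estimates quoted in the thread
GIB = 2**30  # binary gibibyte, used for the assumed instance memory sizes


def should_skip(estimated_bytes, available_bytes, fraction=0.75):
    """Return True when the estimate does not fit into the usable share of memory.

    Mirrors the condition quoted in the thread:
    AVAILABLE_MEMORY * 0.75 <= estimate  ->  skip the test.
    """
    return available_bytes * fraction <= estimated_bytes


# Assumed memory per instance type (not stated in the thread).
machines = {"c7g.xlarge": 8 * GIB, "c7g.2xlarge": 16 * GIB}

# Estimates quoted in the thread.
estimates = {"x86": 17.7 * GB, "brgconv:sve_256": 11 * GB}

for machine, available in machines.items():
    for impl, estimate in estimates.items():
        verdict = "skip" if should_skip(estimate, available) else "run"
        print(f"{machine:12s} {impl:16s} -> {verdict}")
```

Under these assumptions, the only combination that is allowed to run is brgconv:sve_256 on c7g.2xlarge, which matches the behaviour described above: the 11 GB estimate passes the check even though the real footprint is reported to be more than 16 GB.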
Hi @michalowski-arm,
The memory check relies on primitive_descriptor queries (see here). It sums the sizes of the arguments (inputs and outputs) and of the scratchpad. A mismatch between the actual memory consumption and the estimated one can come from:
|
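The estimate described in this comment, the sum of argument sizes plus the scratchpad size, can be sketched as follows. This is an illustration only: `MemoryQuery`, `estimate_footprint`, and `unaccounted_bytes` are hypothetical names rather than benchdnn or oneDNN APIs, and the byte counts are toy values, not measurements from this issue.

```python
from dataclasses import dataclass


@dataclass
class MemoryQuery:
    """Hypothetical stand-in for the sizes reported by a primitive descriptor."""
    arg_bytes: dict        # argument name -> size in bytes (inputs and outputs)
    scratchpad_bytes: int  # size of the scratchpad in bytes


def estimate_footprint(query):
    """Estimated footprint: all argument sizes plus the scratchpad size."""
    return sum(query.arg_bytes.values()) + query.scratchpad_bytes


def unaccounted_bytes(query, measured_peak_bytes):
    """Memory observed at run time that the estimate does not cover."""
    return max(0, measured_peak_bytes - estimate_footprint(query))


# Toy numbers, chosen only to show the arithmetic.
query = MemoryQuery(
    arg_bytes={"src": 400, "weights": 100, "dst": 300},
    scratchpad_bytes=200,
)
print(estimate_footprint(query))       # estimated bytes
print(unaccounted_bytes(query, 1500))  # bytes the estimate misses
```

Any allocation an implementation makes outside the queried arguments and scratchpad shows up only in the second number, which is the kind of gap this issue reports.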
Summary
brgconv:sve_256 uses a lot of memory. Specifically, we found that our test runners were failing when running test_benchdnn_modeC_conv_3d_cpu because the benchdnn test was using 16.1 GB of memory. The usage appears to be independent of the number of threads (tested with 1, 8, and 16 threads). This is on main/v3.6 319a77e.
@kasturedeeksha is this amount of memory use expected? Is this something you have seen with brgconv:sve_512?
Environment