bitmasks2tensor extremely slow #336

SupetZYK · 2021-06-29T17:05:17Z

Training DBNet, PANet, or any other text detector whose loss calls bitmasks2tensor is extremely slow for me. I debugged each stage and found that this function is the bottleneck. Why does this happen?

My environment:

sys.platform: linux
Python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0]
CUDA available: True
GPU 0: Tesla V100-SXM2-32GB
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.0_bu.TC445_37.28845127_0
GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
PyTorch: 1.7.1+cu110
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.0
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.8.2+cu110
OpenCV: 4.5.2
MMCV: 1.3.8
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: not available
MMOCR: 0.2.0+8a4eda6


innerlee · 2021-06-30T02:08:15Z

Thanks for your report! We will profile this function and improve it if it is slow
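As a starting point for that profiling, here is a minimal timing sketch. It does not use the real bitmasks2tensor: `pad_masks` is a hypothetical pure-Python stand-in that zero-pads a batch of masks, and `time_call` is an illustrative helper, not part of MMOCR. Swap in the actual call to measure it.

```python
import time
from statistics import mean


def time_call(fn, *args, repeats=5, **kwargs):
    """Call fn `repeats` times; return (mean seconds per call, last result)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return mean(durations), result


def pad_masks(masks, target_h, target_w):
    """Hypothetical stand-in: zero-pad each 2-D mask (a list of rows)."""
    padded = []
    for mask in masks:
        # Pad every row on the right, then append all-zero rows at the bottom.
        rows = [row + [0] * (target_w - len(row)) for row in mask]
        rows += [[0] * target_w for _ in range(target_h - len(rows))]
        padded.append(rows)
    return padded


# Eight 40x60 masks of ones, padded to 64x64.
masks = [[[1] * 60 for _ in range(40)] for _ in range(8)]
seconds, padded = time_call(pad_masks, masks, 64, 64)
print(f"pad_masks: {seconds * 1e3:.3f} ms per call")
```

If the function being timed launches CUDA work, the wall-clock number is only meaningful when `torch.cuda.synchronize()` is called before each clock read, because CUDA kernels run asynchronously.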

innerlee added the speed and enhancement labels Jun 30, 2021

innerlee assigned innerlee and jeffreykuang Jun 30, 2021
