Memory allocation in GPU plugin

Taylor Yeonbok Lee edited this page Apr 3, 2022 · 30 revisions

Allocation types

The GPU plugin supports the following four types of memory allocation. The "usm_" types are allocated through the Intel Unified Shared Memory (USM) extension for OpenCL. For a more detailed explanation of the USM extension, refer to this page.

  • cl_mem : Standard OpenCL cl_mem allocation
  • usm_host : Allocated in host memory and accessible by the host and all devices. Not migratable.
  • usm_shared : Allocated in host and device memory and accessible by the host and all devices. The memory is migrated automatically on demand.
  • usm_device : Allocated in device memory and accessible only by the device which owns the memory. Not migratable.
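
The properties of the three USM types can be summarized in a small lookup, which may help when reasoning about which type fits a given buffer. This is only a sketch of the list above: `usm_type`, `usm_traits`, and `traits_of` are hypothetical names, not types from the plugin.

```cpp
// Illustrative names only; these are not the plugin's own types.
enum class usm_type { host, shared, device };

struct usm_traits {
    bool host_accessible;  // the host can access the allocation directly
    bool migratable;       // the allocation may move between host and device
};

// Encodes the accessibility and migration properties listed above.
inline usm_traits traits_of(usm_type t) {
    switch (t) {
        case usm_type::host:   return {true, false};
        case usm_type::shared: return {true, true};
        case usm_type::device: return {false, false};
    }
    return {false, false};  // not reached for valid enum values
}
```

For example, `traits_of(usm_type::shared).migratable` is true, while a `usm_type::device` allocation is neither host accessible nor migratable.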

Note that the driver imposes the following restrictions on memory allocation:

  • A single memory object should not exceed the available memory size obtained from CL_DEVICE_GLOBAL_MEM_SIZE.
  • The total size of the memory objects allocated for a kernel (i.e., the sum of its inputs, intermediate buffers, and outputs) should not exceed the available memory. For example, to allocate a memory object in device memory, both restrictions must hold for the available device memory. Otherwise, the memory object should be allocated in host memory.
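
The two restrictions can be read as a simple placement check. The following is a minimal sketch under stated assumptions: `choose_placement`, `placement`, and the parameter names are hypothetical, and `device_mem_size` stands for the value reported for CL_DEVICE_GLOBAL_MEM_SIZE. The plugin's real decision logic may consider more than this.

```cpp
#include <cstdint>

enum class placement { device, host };

// Hypothetical helper, not part of the plugin.
// request_bytes:       size of the memory object to allocate
// kernel_device_bytes: device bytes already used by the same kernel's
//                      inputs, intermediate buffers, and outputs
// device_mem_size:     value reported for CL_DEVICE_GLOBAL_MEM_SIZE
inline placement choose_placement(std::uint64_t request_bytes,
                                  std::uint64_t kernel_device_bytes,
                                  std::uint64_t device_mem_size) {
    // Restriction 1: a single object must fit in the available memory.
    if (request_bytes > device_mem_size) {
        return placement::host;
    }
    // Restriction 2: the kernel's total must fit as well.
    // Written as a subtraction to avoid unsigned overflow in the sum.
    if (kernel_device_bytes > device_mem_size - request_bytes) {
        return placement::host;
    }
    return placement::device;
}
```

With a 1000-byte device, a 500-byte request for a kernel that already holds 500 bytes still fits on the device, while a 600-byte request for the same kernel falls back to host memory.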

Memory allocation API

In the GPU plugin, the actual allocation for each allocation type is done through engine::allocate_memory, which calls the corresponding memory object wrapper for each allocation type: gpu_buffer, gpu_usm.
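
The dispatch described above can be pictured roughly as follows. This is a simplified stand-in, not the plugin's code: every name ending in `_sketch` is hypothetical, the signature differs from the real engine::allocate_memory, and no OpenCL or USM call is made. It assumes that cl_mem allocations map to gpu_buffer and all "usm_" allocations map to gpu_usm.

```cpp
#include <cstddef>
#include <memory>
#include <string>

enum class allocation_type { cl_mem, usm_host, usm_shared, usm_device };

// Stand-in for a common memory object interface (hypothetical).
struct memory_sketch {
    virtual ~memory_sketch() = default;
    virtual std::string wrapper_name() const = 0;
    std::size_t bytes = 0;
    allocation_type type = allocation_type::cl_mem;
};

// Stand-ins for the two wrappers named in the text.
struct gpu_buffer_sketch : memory_sketch {
    std::string wrapper_name() const override { return "gpu_buffer"; }
};
struct gpu_usm_sketch : memory_sketch {
    std::string wrapper_name() const override { return "gpu_usm"; }
};

// Picks the wrapper from the allocation type; only records size and type.
inline std::unique_ptr<memory_sketch> allocate_memory_sketch(std::size_t bytes,
                                                             allocation_type type) {
    std::unique_ptr<memory_sketch> mem;
    if (type == allocation_type::cl_mem) {
        mem.reset(new gpu_buffer_sketch());
    } else {
        mem.reset(new gpu_usm_sketch());
    }
    mem->bytes = bytes;
    mem->type = type;
    return mem;
}
```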

Also, the total amount of allocated memory for each allocation type is managed per engine, so you can check the allocation history by setting the environment variable OV_GPU_Verbose=1 for OpenVINO built with ENABLE_DEBUG_CAPS=ON.

...
GPU_Debug: Allocate 58982400 bytes of usm_host allocation type (current=117969612; max=117969612)
GPU_Debug: Allocate 44621568 bytes of usm_device allocation type (current=44626380; max=44626380)
GPU_Debug: Allocate 44236800 bytes of usm_host allocation type (current=162206412; max=162206412)
GPU_Debug: Allocate 14873856 bytes of usm_device allocation type (current=59500236; max=59500236)
...
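
The `current` and `max` fields in the log above behave like a running total and a high-water mark per allocation type. A minimal sketch of such bookkeeping is shown below. `allocation_tracker`, `usage`, and the method names are hypothetical, and the release path is an assumption, since the log shown here only contains allocations.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>

struct usage {
    std::uint64_t current = 0;  // bytes currently allocated
    std::uint64_t peak = 0;     // largest value `current` has reached
};

// Hypothetical per-engine bookkeeping keyed by allocation type name.
class allocation_tracker {
public:
    void on_allocate(const std::string& type, std::uint64_t bytes) {
        usage& u = stats_[type];
        u.current += bytes;
        u.peak = std::max(u.peak, u.current);
    }

    // Assumed counterpart for frees; clamps so the counter never underflows.
    void on_release(const std::string& type, std::uint64_t bytes) {
        usage& u = stats_[type];
        u.current -= std::min(bytes, u.current);
    }

    // Returns zeroed counters for a type that was never allocated.
    usage get(const std::string& type) const {
        std::map<std::string, usage>::const_iterator it = stats_.find(type);
        return it == stats_.end() ? usage{} : it->second;
    }

private:
    std::map<std::string, usage> stats_;
};
```

Starting from 4812 bytes of usm_device (the difference implied by the first usm_device line), adding 44621568 and then 14873856 bytes gives 44626380 and 59500236, matching the `current` values in the log.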

Allocated memory objects

The major allocations done in the GPU plugin can be categorized as follows:

  • Constant memory allocation: In the GPU plugin, constant data is held by data primitives, and the memory objects are allocated when the topology is created. At that time, the required data is copied from the corresponding blob in ngraph. After all transformations in program are finished, if the users of those memories are GPU operations and the GPU has device memory, the constants are transferred to device memory. Note that constant data is shared across batches and streams.

  • Output memory allocation: A memory object to store the output of each primitive is created when its primitive_inst is created (link), unless the output reuses the input memory or the node is a mutable data used as a 2nd output. Note that primitive_insts are created in descending order of output memory size, so that the memory pool can reuse memory more efficiently.

  • Intermediate memory allocation: Some primitives that consist of multiple kernels, such as detection_output and non_max_suppression, require intermediate memories to transfer data between those kernels. These intermediate memories are allocated after all primitive_insts have been created (link), because the allocation must follow the processing order: it uses the predecessors' allocation information to decide whether to allocate on device memory, by checking the memory allocation restrictions described above.
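
The descending-size ordering used for output memory allocation can be sketched as a sort over the nodes. This is an illustration only: `node_sketch` and `allocation_order` are hypothetical names, and the tie-breaking rule (keeping the original order for equal sizes) is an assumption rather than documented plugin behavior.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a node and the size of its output buffer.
struct node_sketch {
    std::string id;
    std::uint64_t output_bytes;
};

// Returns the nodes ordered so that the largest outputs are allocated first.
// stable_sort keeps the incoming order for nodes with equal output sizes.
inline std::vector<node_sketch> allocation_order(std::vector<node_sketch> nodes) {
    std::stable_sort(nodes.begin(), nodes.end(),
                     [](const node_sketch& a, const node_sketch& b) {
                         return a.output_bytes > b.output_bytes;
                     });
    return nodes;
}
```

Allocating the largest outputs first means the buffers that enter the pool early are big enough to be handed back to later, smaller requests.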

Memory pool and memory reuse