Currently we use a static approach for balancing the load between GPU blocks and between multiple GPUs. This works fine for smaller board sizes.
However, with increasing board size, a lot of time is wasted towards the end of the computation: the GPU is no longer fully occupied, or some GPUs are already done while another one is still working.
Solution: We should find a dynamic approach that balances the load between GPU blocks and also between multiple GPUs. Ideally, one approach addresses both issues at once.
Proposal: Keep the whole job pool in CUDA / OpenCL pinned memory that every device can access. Keep in mind that all threads within a GPU block must work on jobs with equal jkl values.
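A rough sketch of how that could look in CUDA (all names such as `Job`, `solverKernel`, `BLOCK_SIZE` and the field layout are made-up placeholders, not existing code): the job pool and a shared chunk counter live in pinned, mapped host memory visible to every device, the pool is pre-grouped into chunks of `BLOCK_SIZE` jobs with equal jkl, and each block atomically claims the next chunk until the pool is empty.

```cuda
// Sketch: dynamic job pool in pinned, mapped host memory shared by all devices.
// All identifiers here are placeholders, not existing project code.

#include <cuda_runtime.h>

#define BLOCK_SIZE 64   // threads per block; one job per thread

struct Job {
    unsigned int start; // start constellation (placeholder field)
    unsigned int jkl;   // packed j, k, l; equal for all jobs within a chunk
};

__global__ void solverKernel(const Job *pool, unsigned int *nextChunk,
                             unsigned int numChunks, unsigned long long *solutions)
{
    __shared__ unsigned int chunk;  // chunk currently claimed by this block

    while (true) {
        if (threadIdx.x == 0)
            // system-wide atomic so all devices draw from the same counter
            chunk = atomicAdd_system(nextChunk, 1u);
        __syncthreads();

        if (chunk >= numChunks)
            break;                  // pool exhausted; whole block exits together

        Job job = pool[chunk * BLOCK_SIZE + threadIdx.x];
        // ... solve the sub-board for this job and accumulate the result,
        // e.g. atomicAdd(solutions, localCount); (omitted) ...

        __syncthreads();            // keep thread 0 from overwriting `chunk` early
    }
}

// Host side (outline): allocate pool and counter once, launch on every device.
//   Job *pool;  unsigned int *nextChunk;
//   cudaHostAlloc(&pool, numChunks * BLOCK_SIZE * sizeof(Job),
//                 cudaHostAllocMapped | cudaHostAllocPortable);
//   cudaHostAlloc(&nextChunk, sizeof(unsigned int),
//                 cudaHostAllocMapped | cudaHostAllocPortable);
//   *nextChunk = 0;
//   // fill `pool`, grouped so each BLOCK_SIZE-sized chunk shares one jkl, then
//   // for each device: cudaSetDevice(i), cudaHostGetDevicePointer(...), launch.
```

Note this assumes system-wide atomics on mapped host memory (compute capability 6.0+); on older hardware or in OpenCL, a per-device counter with occasional host-side redistribution of the remaining jobs would be the fallback.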