Skip to content

ericzhng/cuda-reduce-gpu-test

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Summation by reduction in GPU using CUDA

Reduction is popularly used for summation in either cpu or gpu code. Comparisons between different implementations of reduction code are done using CUDA C/C++.

Note

The ideas are referred from [1] where the author discussed in detail the implementations. This code is to verify the implementation for data size of power of 2 or not.

Issues

The test is done using data size of power of 2. If the data size is not power of 2, then some issues might arise. It could be that the improved reduce methods won't work well as reference version of code. The improved version has to use a larger size with a power of 2.

We can adjust the performance in following two ways:

[1] Divide the data size into a size of power of 2, and one that is not. Apply reduce to each set independently.

[2] Change the threads and blocks assignment, most of the time, it works, sometimes, it doesn't.

[3] Leave the other part of data in cpu reduction, if that is a small set.

Reference

[1] Optimizing Parallel Reduction in CUDA - Mark Harris.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published