Bluestein's algorithm implementation in VkFFT
-Added support for Bluestein's algorithm to cover arbitrary FFT sequence lengths (including large primes). Optimized performance by using zeropadding and convolution merging to minimize memory transfers
-Currently implemented coverage: full range of C2C (including multi-upload), single-upload R2C/C2R/R2R (2^12, 2^12, 2^12), depending on shared memory configuration
-New configuration options: keepShaderCode - prints executed kernels as they are launched; omitDimension[3] - disables specific FFT dimensions (C2C and R2R only); fixMaxRadixBluestein - controls the maximum prime radix used in Bluestein padding (can reduce memory usage if set higher)
-New benchmarks and precision tests to check the validity of the Bluestein implementation
-Bugfixes
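
For readers unfamiliar with the technique, here is a minimal CPU sketch of Bluestein's algorithm in plain Python. It is not VkFFT's GPU implementation. It only illustrates how a DFT of arbitrary length N (including primes) becomes a circular convolution of zero-padded sequences. The function names `fft_pow2`, `bluestein_dft` and `naive_dft` are hypothetical, and padding to a power of two is an assumption made for simplicity; any padded length of at least 2N-1 that the underlying FFT supports would work.

```python
import cmath


def fft_pow2(a, inverse=False):
    """Unnormalized recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft_pow2(a[0::2], inverse)
    odd = fft_pow2(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bluestein_dft(x):
    """Forward DFT of arbitrary length via Bluestein's chirp-z identity."""
    n = len(x)
    if n == 0:
        return []
    # Chirp w[k] = exp(-i*pi*k^2/n); k^2 is reduced mod 2n to keep the phase accurate.
    w = [cmath.exp(-1j * cmath.pi * ((k * k) % (2 * n)) / n) for k in range(n)]
    # Padded length: smallest power of two >= 2n - 1 (assumption of this sketch).
    m = 1
    while m < 2 * n - 1:
        m *= 2
    # Input multiplied by the chirp, then zero-padded to length m.
    a = [x[k] * w[k] for k in range(n)] + [0j] * (m - n)
    # Conjugate chirp, wrapped around so negative indices land at the end.
    b = [0j] * m
    for k in range(n):
        b[k] = w[k].conjugate()
        if k:
            b[m - k] = w[k].conjugate()
    # Circular convolution through the FFT, then the final chirp multiplication.
    fa = fft_pow2(a)
    fb = fft_pow2(b)
    conv = fft_pow2([p * q for p, q in zip(fa, fb)], inverse=True)
    return [w[k] * conv[k] / m for k in range(n)]


def naive_dft(x):
    """O(N^2) reference DFT used only to check the sketch."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

Comparing `bluestein_dft(x)` with `naive_dft(x)` for a prime length such as 13 is a quick sanity check. The three FFTs of the padded length are where the extra memory traffic comes from, which is presumably why the commit merges the convolution steps.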
DTolm committed Jul 25, 2021
1 parent 476b0db commit 5c8f160
Showing 2 changed files with 7 additions and 7 deletions.
12 changes: 6 additions & 6 deletions CMakeLists.txt
@@ -1,11 +1,11 @@
cmake_minimum_required(VERSION 3.10)
project(Vulkan_FFT)
-#set(CMAKE_CONFIGURATION_TYPES "Release" CACHE STRING "" FORCE)
-#set(CMAKE_BUILD_TYPE "Release" CACHE STRING "" FORCE)
+set(CMAKE_CONFIGURATION_TYPES "Release" CACHE STRING "" FORCE)
+set(CMAKE_BUILD_TYPE "Release" CACHE STRING "" FORCE)
set(VKFFT_BACKEND 0 CACHE STRING "0 - Vulkan, 1 - CUDA, 2 - HIP, 3 - OpenCL")

if(${VKFFT_BACKEND} EQUAL 1)
-option(build_VkFFT_cuFFT_benchmark "Build VkFFT cuFFT benchmark" OFF)
+option(build_VkFFT_cuFFT_benchmark "Build VkFFT cuFFT benchmark" ON)
else()
option(build_VkFFT_cuFFT_benchmark "Build VkFFT cuFFT benchmark" OFF)
endif()
@@ -16,7 +16,7 @@ else()
option(build_VkFFT_rocFFT_benchmark "Build VkFFT rocFFT benchmark" OFF)
endif()

-option(build_VkFFT_FFTW_precision "Build VkFFT FFTW precision comparison" ON)
+option(build_VkFFT_FFTW_precision "Build VkFFT FFTW precision comparison" OFF)
if (MSVC)
set_property(DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR} PROPERTY VS_STARTUP_PROJECT ${PROJECT_NAME})
add_definitions(-D_CRT_SECURE_NO_WARNINGS)
@@ -142,8 +142,8 @@ endif()

if(build_VkFFT_FFTW_precision)
add_definitions(-DUSE_FFTW)
-set(FFTW3_LIB_DIR "C:/Users/dtolm/Documents/FFTW")
-set(FFTW3_INCLUDE_DIR "C:/Users/dtolm/Documents/FFTW")
+set(FFTW3_LIB_DIR "/usr/lib/x86_64-linux-gnu/")
+set(FFTW3_INCLUDE_DIR "/usr/include/")
find_library(
FFTW_LIB
NAMES "libfftw3-3" "fftw3"
2 changes: 1 addition & 1 deletion README.md
@@ -15,7 +15,7 @@ VkFFT is an efficient GPU-accelerated multidimensional Fast Fourier Transform li
- Forward and inverse directions of FFT
- Support for big FFT dimension sizes. Current limits: C2C or even C2R/R2C - (2^32, 2^32, 2^32). Odd C2R/R2C - (2^12, 2^32, 2^32). R2R - (2^12, 2^12, 2^12). Depends on the amount of shared memory on device. (will be increased later).
- Radix-2/3/4/5/7/8/11/13 FFT. Sequences using radix 3, 5, 7, 11 and 13 have comparable performance to that of powers of 2.
-- Bluestein FFT algorithm for all other sequences. Full coverage of C2C range, single upload (2^12, 2^12, 2^12) for R2C/C2R/R2R. Optimized to have as few memory transfers as possible by using zeropadding and merged convolution support of VkFFT
+- Bluestein's FFT algorithm for all other sequences. Full coverage of C2C range, single upload (2^12, 2^12, 2^12) for R2C/C2R/R2R. Optimized to have as few memory transfers as possible by using zeropadding and merged convolution support of VkFFT
- Single, double and half precision support. Double precision uses CPU generated LUT tables. Half precision still does all computations in single and only uses half precision to store data.
- All transformations are performed in-place with no performance loss. Out-of-place transforms are supported by selecting different input/output buffers.
- No additional transposition uploads. Note: data can be reshuffled after the four step FFT algorithm with additional buffer (for big sequences). Doesn't matter for convolutions - they return to the input ordering (saves memory).