Bluestein's algorithm implementation in VkFFT

- Added support for Bluestein's algorithm to cover arbitrary FFT sizes (including big primes). Performance is optimized with zero-padding and convolution merging to minimize memory transfers.
- Currently implemented coverage: the full range of C2C (including multi-upload) and single-upload R2C/C2R/R2R (up to (2^12, 2^12, 2^12), depending on the shared memory configuration).
- New configuration options: keepShaderCode prints executed kernels as they are launched; omitDimension[3] disables specific FFT dimensions (C2C and R2R only); fixMaxRadixBluestein controls the maximum prime used in Bluestein padding (setting it higher can reduce memory usage).
- New benchmarks and precision tests to check the validity of the Bluestein implementation.
- Bugfixes.
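Bluestein's algorithm evaluates a length-N DFT as a circular convolution of some length M >= 2N - 1. M only has to be a size the native kernels already handle; the README lists radix-2/3/4/5/7/8/11/13. The pure-Python sketch below shows how capping the largest allowed prime, in the spirit of fixMaxRadixBluestein, affects the padded size. A higher cap admits more candidate sizes, so M can sit closer to 2N - 1 and use less memory. The helper names `needs_bluestein` and `bluestein_padded_size`, and the "smallest smooth M >= 2N - 1" search rule, are hypothetical illustrations. VkFFT's actual size selection may differ.

```python
# Prime radices with native kernels, per the README (radix 4 and 8 are powers of 2).
NATIVE_PRIMES = (2, 3, 5, 7, 11, 13)


def strip_factors(n, primes):
    """Divide out every factor of n that belongs to `primes` and return what is left."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n


def needs_bluestein(n):
    """True if n has a prime factor that the native radix kernels cannot handle."""
    return strip_factors(n, NATIVE_PRIMES) > 1


def bluestein_padded_size(n, max_prime=13):
    """Hypothetical rule: smallest m >= 2n - 1 whose prime factors are all <= max_prime."""
    if n < 1:
        raise ValueError("n must be positive")
    primes = [p for p in NATIVE_PRIMES if p <= max_prime]
    if not primes:
        raise ValueError("max_prime must be at least 2")
    m = 2 * n - 1  # minimum length that avoids circular wrap-around in the convolution
    while strip_factors(m, primes) != 1:
        m += 1
    return m


if __name__ == "__main__":
    for n in (17, 1009):
        print(n, needs_bluestein(n), bluestein_padded_size(n), bluestein_padded_size(n, max_prime=2))
```

Under these rules, N = 17 pads to 33 (3 * 11), or to 64 if only radix 2 is allowed. The prime N = 1009 pads to 2025 (3^4 * 5^2), compared with 2048.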
DTolm · Jul 25, 2021 · 5c8f160
1 parent 476b0db

Showing 2 changed files with 7 additions and 7 deletions.
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -1,11 +1,11 @@
 cmake_minimum_required(VERSION 3.10)
 project(Vulkan_FFT)
-#set(CMAKE_CONFIGURATION_TYPES "Release" CACHE STRING "" FORCE)
-#set(CMAKE_BUILD_TYPE "Release" CACHE STRING "" FORCE)
+set(CMAKE_CONFIGURATION_TYPES "Release" CACHE STRING "" FORCE)
+set(CMAKE_BUILD_TYPE "Release" CACHE STRING "" FORCE)
 set(VKFFT_BACKEND 0 CACHE STRING "0 - Vulkan, 1 - CUDA, 2 - HIP, 3 - OpenCL")
 
 if(${VKFFT_BACKEND} EQUAL 1)
-	option(build_VkFFT_cuFFT_benchmark "Build VkFFT cuFFT benchmark" OFF)
+	option(build_VkFFT_cuFFT_benchmark "Build VkFFT cuFFT benchmark" ON)
 else()
 	option(build_VkFFT_cuFFT_benchmark "Build VkFFT cuFFT benchmark" OFF)
 endif()
@@ -16,7 +16,7 @@ else()
 	option(build_VkFFT_rocFFT_benchmark "Build VkFFT rocFFT benchmark" OFF)
 endif()
 
-option(build_VkFFT_FFTW_precision "Build VkFFT FFTW precision comparison" ON)
+option(build_VkFFT_FFTW_precision "Build VkFFT FFTW precision comparison" OFF)
 if (MSVC)
 	set_property(DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR} PROPERTY VS_STARTUP_PROJECT ${PROJECT_NAME})
 	add_definitions(-D_CRT_SECURE_NO_WARNINGS)
@@ -142,8 +142,8 @@ endif()
 
 if(build_VkFFT_FFTW_precision)
 	add_definitions(-DUSE_FFTW)
-	set(FFTW3_LIB_DIR "C:/Users/dtolm/Documents/FFTW")
-	set(FFTW3_INCLUDE_DIR "C:/Users/dtolm/Documents/FFTW")
+	set(FFTW3_LIB_DIR "/usr/lib/x86_64-linux-gnu/")
+	set(FFTW3_INCLUDE_DIR "/usr/include/")
 	find_library(
 		FFTW_LIB
 		NAMES "libfftw3-3" "fftw3"

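This hunk makes several changes:

- It hard-codes a Release build.
- It turns the cuFFT benchmark on by default for the CUDA backend.
- It turns the FFTW precision comparison off by default.
- It points the FFTW paths at the Debian/Ubuntu multiarch locations.

Options declared with `option()`, or with `set(... CACHE ...)` without `FORCE`, can still be chosen at configure time. Here is a sketch, assuming a CUDA toolchain and an out-of-source `build` directory:

```shell
# Configure the CUDA backend (VKFFT_BACKEND=1) out of source.
# Re-enable the FFTW precision comparison that this commit turns off by default,
# and skip the cuFFT benchmark that it now builds by default.
mkdir -p build && cd build
cmake .. -DVKFFT_BACKEND=1 -Dbuild_VkFFT_FFTW_precision=ON -Dbuild_VkFFT_cuFFT_benchmark=OFF
cmake --build . --config Release
```

Some settings cannot be overridden this way. `CMAKE_BUILD_TYPE` is now set with `FORCE`, so a `-DCMAKE_BUILD_TYPE` value from the command line is overwritten. `FFTW3_LIB_DIR` and `FFTW3_INCLUDE_DIR` are plain `set()` variables, so `-D` values for them are shadowed. Those settings have to be edited in CMakeLists.txt.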
diff --git a/README.md b/README.md
@@ -15,7 +15,7 @@ VkFFT is an efficient GPU-accelerated multidimensional Fast Fourier Transform li
   - Forward and inverse directions of FFT
   - Support for big FFT dimension sizes. Current limits: C2C or even C2R/R2C - (2^32, 2^32, 2^32).  Odd C2R/R2C - (2^12, 2^32, 2^32). R2R - (2^12, 2^12, 2^12). Depends on the amount of shared memory on device. (will be increased later).
   - Radix-2/3/4/5/7/8/11/13 FFT. Sequences using radix 3, 5, 7, 11 and 13 have comparable performance to that of powers of 2.
-  - Bluestein FFT algorithm for all other sequences. Full coverage of C2C range, single upload (2^12, 2^12, 2^12) for R2C/C2R/R2R. Optimized to have as few memory transfers as possible by using zeropadding and merged convolution support of VkFFT
+  - Bluestein's FFT algorithm for all other sequences. Full coverage of C2C range, single upload (2^12, 2^12, 2^12) for R2C/C2R/R2R. Optimized to have as few memory transfers as possible by using zeropadding and merged convolution support of VkFFT
   - Single, double and half precision support. Double precision uses CPU generated LUT tables. Half precision still does all computations in single and only uses half precision to store data.
   - All transformations are performed in-place with no performance loss. Out-of-place transforms are supported by selecting different input/output buffers.
   - No additional transposition uploads. Note: data can be reshuffled after the four step FFT algorithm with additional buffer (for big sequences). Doesn't matter for convolutions - they return to the input ordering (saves memory).
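Bluestein's algorithm rewrites the DFT exponent with the identity nk = (n^2 + k^2 - (k - n)^2) / 2. With the chirp w_n = exp(-pi*i*n^2/N), this gives X_k = w_k * sum_n (x_n * w_n) * conj(w_{k-n}). That sum is a convolution, so it can be computed with zero-padded FFTs of any length M >= 2N - 1.

The pure-Python sketch below pads to a power of two and uses a textbook radix-2 FFT for brevity. It is not VkFFT's GPU implementation. Per the README, VkFFT merges the convolution into its FFT kernels, and the fixMaxRadixBluestein option suggests it can also pad to sizes with factors other than 2. All function names here are hypothetical.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 Cooley-Tukey forward FFT; len(x) must be a power of two."""
    n = len(x)
    a = list(x)
    # Reorder the input into bit-reversed index order.
    rev = 0
    for i in range(1, n):
        bit = n >> 1
        while rev & bit:
            rev ^= bit
            bit >>= 1
        rev |= bit
        if i < rev:
            a[i], a[rev] = a[rev], a[i]
    # Butterfly passes over sub-transforms of length 2, 4, ..., n.
    length = 2
    while length <= n:
        w_step = cmath.exp(-2j * cmath.pi / length)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= w_step
        length *= 2
    return a


def ifft_radix2(x):
    """Inverse FFT via the identity ifft(x) = conj(fft(conj(x))) / n."""
    n = len(x)
    return [v.conjugate() / n for v in fft_radix2([v.conjugate() for v in x])]


def bluestein_dft(x):
    """Length-N DFT for any N (e.g. a large prime) via Bluestein's chirp convolution."""
    n = len(x)
    if n == 0:
        return []
    # Any m >= 2n - 1 avoids wrap-around; this sketch uses the next power of two.
    m = 1
    while m < 2 * n - 1:
        m *= 2
    # Chirp w_k = exp(-i*pi*k^2/n); k^2 is reduced mod 2n since the chirp has period 2n in k^2.
    chirp = [cmath.exp(-1j * cmath.pi * ((k * k) % (2 * n)) / n) for k in range(n)]
    # a: input premultiplied by the chirp, then zero-padded to length m.
    a = [x[k] * chirp[k] for k in range(n)] + [0j] * (m - n)
    # b: conjugate chirp laid out circularly so b[d] and b[m - d] both hold conj(w_d).
    b = [0j] * m
    b[0] = chirp[0].conjugate()
    for k in range(1, n):
        b[k] = b[m - k] = chirp[k].conjugate()
    # Circular convolution of a and b through the convolution theorem.
    conv = ifft_radix2([p * q for p, q in zip(fft_radix2(a), fft_radix2(b))])
    # Post-multiply by the chirp to obtain X_k.
    return [chirp[k] * conv[k] for k in range(n)]


def naive_dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n)) for k in range(n)]


if __name__ == "__main__":
    signal = [complex(t % 5, -t) for t in range(13)]  # length 13 is prime: no radix-2 FFT applies
    err = max(abs(p - q) for p, q in zip(bluestein_dft(signal), naive_dft(signal)))
    print(f"max abs error vs direct DFT: {err:.2e}")
```

The three FFT-sized passes (forward FFT of the chirped input, forward FFT of the chirp, inverse FFT of the product) correspond to the extra memory traffic that the README says VkFFT reduces with zero-padding and merged convolutions. The FFT of `b` depends only on N, so a real implementation would typically precompute it once per plan.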