feat(gpu): add circulant matrix for one vs many poly product #2030
base: main
Conversation
Force-pushed from 5a2a836 to 022856b
```rust
    decoded_value
}

fn glwe_dot_product_with_clear<Scalar: UnsignedTorus + CastFrom<usize>>(
```
This test checks the poly product without noise.
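A noiseless check like this one needs a ground truth to compare against. Below is a hedged sketch (not the PR's code) of a schoolbook reference for the product of two polynomials mod X^N + 1 over Z_{2^64}, the kind of reference result such a test could use:

```python
# Hedged sketch: schoolbook negacyclic polynomial product mod X^N + 1,
# with coefficients reduced mod 2^64 (plain Python ints, no numpy needed).
def negacyclic_polymul(a, b, q=2**64):
    n = len(a)
    res = [0] * n
    for i in range(n):
        for j in range(n):
            # X^(i+j) wraps around to -X^(i+j-n) modulo X^N + 1
            if i + j < n:
                res[i + j] = (res[i + j] + a[i] * b[j]) % q
            else:
                res[i + j - n] = (res[i + j - n] - a[i] * b[j]) % q
    return res
```

For example, multiplying by X rotates the coefficients and negates the one that wraps: `negacyclic_polymul([0, 1, 0, 0], [5, 6, 7, 8])` gives `[2**64 - 8, 5, 6, 7]`.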
Force-pushed from 022856b to f177e42
```cuda
// to any matrix dimension
template <typename Torus, typename TorusVec>
__global__ void tgemm(int M, int N, int K, const Torus *A, const Torus *B,
                      int stride_B, Torus *C, int stride_C) {
```
I added a `stride_C` parameter, since I use this function to write the output matrix into a bigger buffer (a GLWE list) that has a larger stride than the width of `C`, which is `N`.
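To illustrate why the extra stride matters, here is a hedged Python model of the kernel's addressing (the real `tgemm` is a tiled CUDA kernel; only the indexing is mirrored here):

```python
# Hedged model: C is M x N but lives inside a wider row-major buffer, so
# element (i, j) is written at i * stride_C + j rather than i * N + j.
def tgemm_ref(M, N, K, A, B, stride_B, C, stride_C):
    for i in range(M):
        for j in range(N):
            acc = 0
            for k in range(K):
                acc += A[i * K + k] * B[k * stride_B + j]
            C[i * stride_C + j] = acc

# Writing a 2x2 product into a buffer whose rows are 3 wide (stride_C = 3):
A = [1, 2, 3, 4]
B = [5, 6, 7, 8]
C = [0] * 6
tgemm_ref(2, 2, 2, A, B, 2, C, 3)
# C == [19, 22, 0, 43, 50, 0]: the last slot of each row is untouched padding
```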
```cuda
// values into their new positions. The elements above the diagonal
// are multiplied by -1
template <typename Torus>
__global__ void polynomial_make_circulant(Torus *result, const Torus *poly,
```
Here is the Python equivalent of the algorithm, with `bi = blockIdx.x`, `bj = blockIdx.y`, `ti = threadIdx.x`, `tj = threadIdx.y`:

```python
import numpy as np

def make_circulant_transpose_cuda(v1):
    N = v1.shape[0]
    result = np.zeros((N * N,), dtype=np.uint64)
    BS = 4  # CIRCULANT_BLOCKTILE
    for bi in range(0, N // BS):
        for bj in range(0, N // BS):
            # shared buffer: the 2*BS - 1 input values this block needs
            buf = np.zeros((2 * BS - 1,), dtype=np.uint64)
            block_start = bi * BS * N + bj * BS
            for ti in range(BS):
                for tj in range(BS):
                    tid = ti * BS + tj
                    if tid < 2 * BS - 1:
                        read_idx_start = (bj - bi) * BS + tid - BS + 1
                        if read_idx_start < 0:
                            read_idx_start = N + read_idx_start
                        buf[tid] = v1[read_idx_start]
            # sync threads
            for ti in range(BS):
                for tj in range(BS):
                    val = int(buf[tj - ti + BS - 1])
                    if bi * BS + ti > bj * BS + tj:
                        val = (1 << 64) - val if val else 0  # negate mod 2^64
                    result[block_start + ti * N + tj] = val
    return result.reshape((N, N))
```
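As a sanity check on what the kernel builds, the matrix has a simple closed form (the helper names below are mine, not the PR's): element (i, j) is `poly[(j - i) mod N]`, negated when i > j, so a row vector times this matrix reproduces the negacyclic product of the two polynomials.

```python
# Hedged cross-check of the transposed negacyclic circulant, mod 2^64.
def circulant_transpose(poly, q=2**64):
    n = len(poly)
    return [[poly[(j - i) % n] * (1 if i <= j else -1) % q for j in range(n)]
            for i in range(n)]

def row_times_matrix(v, m, q=2**64):
    n = len(v)
    return [sum(v[i] * m[i][j] for i in range(n)) % q for j in range(n)]

m = circulant_transpose([1, 2, 3, 4])
# Multiplying by the basis vector for X (i.e. [0, 1, 0, 0]) selects row 1:
# X * (1 + 2X + 3X^2 + 4X^3) = -4 + X + 2X^2 + 3X^3  mod X^4 + 1
```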
```cuda
int32_t tid = threadIdx.x * CIRCULANT_BLOCKTILE + threadIdx.y;

if (tid < 2 * CIRCULANT_BLOCKTILE - 1) {
```
Only the first two rows of threads in the block read data, since we only need `2 * block_tile - 1` values for a block of `block_tile x block_tile` threads.
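A small hedged illustration of which threads load: for `CIRCULANT_BLOCKTILE = 4`, a 4x4 thread block needs only 2*4 - 1 = 7 distinct input values, so only threads with `tid < 7` read (all of thread row 0 plus the first three threads of row 1).

```python
# Which (ti, tj) pairs pass the tid < 2*BS - 1 guard for BS = 4 (sketch).
BS = 4
readers = [(ti, tj) for ti in range(BS) for tj in range(BS)
           if ti * BS + tj < 2 * BS - 1]
# 7 loading threads: row 0 entirely, then (1, 0), (1, 1), (1, 2)
```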
Force-pushed from 7100709 to 4694773
Force-pushed from 4694773 to f99d91b
```rust
) {
    let mut rng = rand::thread_rng();

    let poly_size = 2 << rng.gen_range(8usize..12);
```
Sometimes check with `n_polys = poly_size`, sometimes check with an arbitrary number of polys.
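For reference, the sizing expression draws a power of two; a sketch mirroring the Rust `2 << rng.gen_range(8usize..12)` (note the exclusive upper bound):

```python
import random

# poly_size is 2 << k for k in {8, 9, 10, 11}, i.e. one of 512..4096.
poly_size = 2 << random.randrange(8, 12)
```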
f99d91b
to
cec3d0e
Compare
To support the encrypted GLWE × clear matrix product, all polys of the GLWE are multiplied with the clear matrix. For each poly of the GLWE, this PR builds a circulant matrix, which is multiplied with the clear matrix to obtain the polynomial product. Sample N of this product contains the product of the original clear vector (the one that was encrypted) with the clear matrix.
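The core trick can be sketched in a few lines. This is a hedged illustration only: the exact encoding and which coefficient holds the result are my assumptions, not taken from the PR. With the clear column embedded in reverse order, coefficient N-1 of the negacyclic product of the two polynomials equals the dot product of the two vectors, so a full matrix product reduces to polynomial products.

```python
# Hedged sketch: only the i + j == N-1 terms land on coefficient N-1 of
# a*b mod X^N + 1, all with sign +1, so that coefficient is a dot product.
def coeff_last_of_negacyclic_product(a, b, q=2**64):
    n = len(a)
    return sum(a[i] * b[n - 1 - i] for i in range(n)) % q

v = [3, 1, 4, 1]          # vector hidden inside the GLWE (shown in the clear)
w = [2, 7, 1, 8]          # one column of the clear matrix
p_w = list(reversed(w))   # clear polynomial: w_j placed at degree N-1-j
dot = coeff_last_of_negacyclic_product(v, p_w)
# dot == <v, w>
```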
Now, with `make test_core_crypto_gpu`, there are two new tests: