- updated ROCm support to 5.0 and higher
- dropped support for CUDA 9 and maybe 10
- tested on CUDA 11.4 - 12.1
- changed the name of the main development branch to
develop
- CMake now requires version 3.19
- updated the way the CUDA libraries are found
- added a full example for the VEX module
- fixed some conflicts when using regtype::none and float or double arithmetic
- adjusted scaling on some tests to account for order of operations when using avx512
- included full support for the BLAS standard
- included some sparse support for row-compresses matrices
- using gemv, gemm, trsv, and trsm operations
- included ilu preconditioner
- included custom iterative solvers
- conjugate-gradient with single rhs and batched
- general minimum residual GMRES
- the above is supported for CPU, CUDA and ROCM
- added support for cholmod direct solver (double precision only)
- support for some LAPACK methods (CPU only)
- work will continue on the LAPACK front
- added support for extended registers
- sse3 using 128-bit registers
- avx using 256-bit registers
- avx512 using 512-bit registers
- support includes operation for real and complex numbers
- support includes operation for single and double precision