Releases: ChASE-library/ChASE
ChASE code structure fully revised
ChASE has undergone a complex restructuring process that included major changes in the hierarchical folder structure, a clearer division between parallel implementations, and a new test-driven development approach which introduced unit testing for many of its routines. In particular:
- A clear division between the ChASE algorithm and its implementations has been maintained
- The name of the base virtual class has changed from Chase to ChaseBase
- All implementations for sequential and parallel CPU/GPU architectures have been clearly separated and are based on the new ChaseBase class.
- All reliance on external numerical libraries (BLAS, LAPACK, etc.) has been separated and templated
- Utilities for the initialization of pure MPI, CUDA-aware MPI and NCCL grids have been redefined and clearly separated
- Implementations of the kernels have been grouped in a linalg folder which contains
  - matrix classes for shared-memory CPU and GPU architectures
  - matrix classes for distinct distribution of matrices and vectors
  - shared memory kernels
  - distributed CPU kernels
  - single GPU kernels
  - distributed GPU kernels
- Functionalities for mixed precision have been added
- Unit testing has been introduced for
  - the grid functionalities
  - matrix type functionalities
  - all kernels for the separate cases (CPU, GPU, distributed CPU, etc.)
- Examples have been generalized and redistribution of matrices has been abstracted
- A template for a Collaboration Agreement (CLA) has been added
Added CI pipeline for automatic building and testing
Created a CI pipeline and included unit and integration testing for the QR decomposition.
Bug fix: GPU-timing synchronization
A problem was observed with different NVTX ranges on CPU and GPU. The problem has been solved by explicitly synchronizing the CPU with the GPU.
ChASE v1.4.0. Major release.
Introduced a new distributed GPU build of ChASE entirely based on the NVIDIA NCCL library, which avoids explicit data movement between host and device memory and leads to much faster collective communications among the involved GPUs. This new release achieves a speedup between 1.5x and 3x with respect to the traditional distributed multi-GPU build. ChASE can now be compiled and executed with the following distinct parallel configurations:
- Distributed CPU only
- Distributed multi-GPU (traditionally based on host-device communication standards)
- Distributed multi-GPU (using the NVIDIA NCCL library)
ChASE v1.3.1: minor release
Updated the estimated bound for the condition number of the matrix of filtered vectors V. This estimate bounds the actual condition number of V from above, allowing for the dynamic selection of the Communication-Avoiding QR-decomposition (CAQR) variant within the ChASE library at run time.
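As a rough illustration of how an upper bound on the condition number can drive a run-time choice, the sketch below maps an estimate to a QR variant. The function name `select_qr_variant`, the variant labels, and both thresholds are assumptions made for this example and are not taken from the ChASE sources; the only general fact used is that CholeskyQR2 is considered reliable roughly up to a condition number of u^(-1/2), where u is the unit roundoff.

```python
import sys

# Unit roundoff for IEEE double precision (about 1.1e-16).
U = sys.float_info.epsilon / 2


def select_qr_variant(cond_estimate, u=U):
    """Pick a QR variant from an upper bound on cond(V).

    Thresholds and labels are illustrative assumptions, not ChASE's values.
    """
    if cond_estimate < 10.0:
        # Well-conditioned block: a single CholeskyQR pass is assumed enough.
        return "CholeskyQR1"
    if cond_estimate < u ** -0.5:
        # Moderately conditioned: repeat the pass once to restore orthogonality.
        return "CholeskyQR2"
    # Beyond roughly u^(-1/2) a plain Cholesky factorization may break down.
    return "shifted CholeskyQR2"
```

Because the estimate bounds the true condition number from above, a choice made this way errs on the side of the more stable variant.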
ChASE v1.3.0. Major release.
This release features a number of changes in the parallel implementation and the algorithm.
- The QR factorization, which was previously done redundantly on each MPI process, is now parallelized on a 1D sub-grid of the 2D MPI Cartesian grid.
- As a consequence of the additional parallelization, the number and structure of the workspace buffers have changed, greatly reducing the memory footprint of the entire library
- The postApplication function has been replaced, with the result that some of the communication is now hidden behind computation during the execution of the Rayleigh-Ritz kernel and the Residual kernel
- The parallel HouseholderQR algorithm has been replaced by the CholeskyQR algorithm (and its more stable variants). A mechanism to avoid failure of this algorithm has been introduced, based on numerical analysis results.
- A new parallel random generator has been added to reduce the time spent initializing the computation, especially for large scale problems.
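For readers unfamiliar with CholeskyQR, the following is a minimal serial sketch of the textbook algorithm and its two-pass variant (CholeskyQR2), written in plain Python on lists of rows. It is not ChASE's implementation: all function names are hypothetical, real symmetric arithmetic is assumed, and in a distributed setting the Gram matrix would typically be assembled from local contributions with a reduction rather than computed in one place.

```python
import math


def gram(V):
    """Return G = V^T V for a tall matrix V stored as a list of rows."""
    n = len(V[0])
    return [[sum(r[i] * r[j] for r in V) for j in range(n)] for i in range(n)]


def cholesky_upper(G):
    """Return upper-triangular R with G = R^T R; raise ValueError on breakdown."""
    n = len(G)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d = G[i][i] - sum(R[k][i] ** 2 for k in range(i))
        if d <= 0.0:
            # This is the failure mode that ill-conditioned inputs can trigger.
            raise ValueError("Gram matrix is not numerically positive definite")
        R[i][i] = math.sqrt(d)
        for j in range(i + 1, n):
            s = sum(R[k][i] * R[k][j] for k in range(i))
            R[i][j] = (G[i][j] - s) / R[i][i]
    return R


def solve_right(V, R):
    """Return Q = V R^{-1} by substitution applied to each row of V."""
    n = len(R)
    Q = []
    for row in V:
        q = [0.0] * n
        for j in range(n):
            s = sum(q[k] * R[k][j] for k in range(j))
            q[j] = (row[j] - s) / R[j][j]
        Q.append(q)
    return Q


def cholesky_qr(V):
    """One CholeskyQR pass: V = Q R with R from the Cholesky factor of V^T V."""
    R = cholesky_upper(gram(V))
    return solve_right(V, R), R


def cholesky_qr2(V):
    """Two passes; the second pass repairs the orthogonality lost in the first."""
    Q1, R1 = cholesky_qr(V)
    Q, R2 = cholesky_qr(Q1)
    n = len(R1)
    # Combined triangular factor R = R2 * R1, so that V = Q R still holds.
    R = [[sum(R2[i][k] * R1[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    return Q, R
```

The appeal of this family in a parallel setting is that the only tall-matrix operations are a matrix product and a triangular solve; the Cholesky factorization itself acts on a small square matrix.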
ChASE is now integrated into the ELSI library.
In this release:
- The C and Fortran interfaces have been improved
- The dependency on the NVTX tool has been removed
- The ELSI interface has been included
ChASE v1.2.0 Release
We release version 1.2.0 of ChASE with the following new features:
- The Fortran interface is now included explicitly in the ChASE code
- A new chase-mpi-properties interface for block distribution has been added, in which the distribution is provided by the user rather than using the built-in one
- Full compatibility with Quantum Espresso
Standardized LICENSE
Fixes
- BSD3.0 license standardization
Algorithmic improvements in the Chebyshev filter
Improvements
- Integrated the axpy operation into the call to HEMM when executing the three-term recurrence relation in the Chebyshev filter.
- For the GPU build, moved the shift of the matrix A in the three-term recurrence relation onto the accelerator.
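The idea behind folding the axpy into HEMM can be sketched as follows. A BLAS HEMM call computes C := alpha*A*B + beta*C, so the "minus previous iterate" term of the three-term recurrence can be carried by beta instead of a separate vector update. The code below is a simplified, unscaled Chebyshev recurrence in plain Python for real symmetric matrices; the names `hemm` and `chebyshev_filter` and the parameters `center` and `half_width` are assumptions made for this illustration, and ChASE's actual filter may differ in its scaling and details.

```python
def hemm(alpha, A, B, beta, C):
    """Return alpha * A * B + beta * C, mirroring the BLAS HEMM update."""
    n, m = len(A), len(B[0])
    return [
        [alpha * sum(A[i][k] * B[k][j] for k in range(n)) + beta * C[i][j]
         for j in range(m)]
        for i in range(n)
    ]


def chebyshev_filter(A, V, degree, center, half_width):
    """Apply T_degree((A - center*I) / half_width) to the block of vectors V."""
    n = len(A)
    # Shift the diagonal once up front: A_s = A - center * I.
    A_s = [[A[i][j] - (center if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    prev = [row[:] for row in V]  # T_0 applied to V is V itself
    if degree == 0:
        return prev
    zeros = [[0.0] * len(V[0]) for _ in range(n)]
    curr = hemm(1.0 / half_width, A_s, prev, 0.0, zeros)  # T_1 term
    for _ in range(degree - 1):
        # T_{k+1} = (2/e) (A - cI) T_k - T_{k-1}; the subtraction rides in beta.
        prev, curr = curr, hemm(2.0 / half_width, A_s, curr, -1.0, prev)
    return curr
```

For example, with A = diag(0.5, 2), center 0 and half-width 1, a degree-3 filter applied to a vector of ones returns T_3(0.5) = -1 and T_3(2) = 26: the component inside [-1, 1] stays bounded while the one outside is amplified.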