References and Contributors
***************************

The path that led to the current implementation of ChASE using modern ``C++``
design concepts started back in the summer of 2012, when an initial Matlab
prototype of the code was re-implemented in ``C`` and parallelized using
OpenMP 3.0. Soon after, a distributed version of the code was implemented
based on MPI and the `Elemental <https://github.com/elemental/Elemental>`__
library. The results of these implementations led to the first publications
on the performance of the library on sequences of eigenvalue problems
stemming from Materials Science applications. In the following three years, a
first basic ``C++`` implementation for shared-memory architectures and GPUs
was developed. Eventually, all these efforts led to the current implementation
in ``C++``, which is templated and separates the algorithm from the kernel
implementations through the use of an abstract class.

Approximately one year ago, the codebase underwent a major refactoring to
improve modularity and maintainability. This effort reorganized the code
structure, enhanced the separation of concerns between components (sequential
vs. parallel implementations, CPU vs. GPU backends, etc.), and established
clearer namespace hierarchies. These improvements have led to the current
well-structured codebase, which supports multiple implementation variants
(ChASECPU, ChASEGPU, pChASECPU, pChASEGPU) with a unified interface while
maintaining high performance and extensibility.

Main developers
===============

* Edoardo Di Napoli -- Algorithm design and development
* Xinzhe Wu -- Algorithm development, advanced parallel (MPI and GPU)
  implementation and optimization, developer documentation
* Clément Richefort -- Algorithm development, advanced parallel (MPI and GPU)
  implementation and optimization, Pseudo-Hermitian project support,
  integration of ChASE into the YAMBO code

Current Contributors
====================

Past Contributors
=================

* Davor Davidović -- Advanced parallel GPU implementation and optimization
* Nenad Mijić -- ARM-based implementation and optimization
* Xiao Zhang -- Integration of ChASE into the Jena BSE code
* Miriam Hinzen, Daniel Wortmann -- Integration of ChASE into the FLEUR code
* Sebastian Achilles -- Library benchmarking on parallel platforms,
  documentation
* Jan Winkelmann -- DoS algorithm development and advanced ``C++``
  implementation
* Paul Springer -- Advanced GPU implementation
* Marija Kranjcevic -- OpenMP ``C++`` implementation
* Josip Zubrinic -- Early GPU algorithm development and implementation
* Jens Rene Suckert -- Lanczos algorithm and GPU implementation
* Mario Berljafa -- Early ``C`` and ``MPI`` implementation using the
  Elemental library

How to Reference the Code
=========================

The main reference for ChASE is [1], while [2] provides some early results on
scalability and usage on sequences of eigenproblems generated by Materials
Science applications. Reference [3] presents the distributed hybrid CPU-GPU
implementation, and [4] describes the recent advances in the distributed
multi-GPU implementation using NCCL and algorithm optimizations.

* [1] J. Winkelmann, P. Springer, and E. Di Napoli. *ChASE: a Chebyshev
  Accelerated Subspace iteration Eigensolver for sequences of Hermitian
  eigenvalue problems.* ACM Transactions on Mathematical Software, **45**,
  No. 2, Art. 21 (2019).
  `DOI:10.1145/3313828 <https://doi.org/10.1145/3313828>`__ ,
  [`arXiv:1805.10121 <https://arxiv.org/abs/1805.10121>`__ ]
* [2] M. Berljafa, D. Wortmann, and E. Di Napoli. *An Optimized and Scalable
  Eigensolver for Sequences of Eigenvalue Problems.* Concurrency and
  Computation: Practice and Experience **27** (2015), pp. 905-922.
  `DOI:10.1002/cpe.3394 <https://doi.org/10.1002/cpe.3394>`__ ,
  [`arXiv:1404.4161 <https://arxiv.org/abs/1404.4161>`__ ]
* [3] X. Wu, D. Davidović, S. Achilles, and E. Di Napoli. *ChASE: a
  distributed hybrid CPU-GPU eigensolver for large-scale Hermitian eigenvalue
  problems.* Proceedings of the Platform for Advanced Scientific Computing
  Conference (PASC22).
  `DOI:10.1145/3539781.3539792 <https://doi.org/10.1145/3539781.3539792>`__ ,
  [`arXiv:2205.02491 <https://arxiv.org/abs/2205.02491>`__ ]
* [4] X. Wu and E. Di Napoli. *Advancing the distributed Multi-GPU ChASE
  library through algorithm optimization and NCCL library.* Proceedings of
  the SC'23 Workshops of the International Conference on High Performance
  Computing, Network, Storage, and Analysis (2023), pp. 1688-1696.
  `DOI:10.1145/3624062.3624249 <https://doi.org/10.1145/3624062.3624249>`__ ,
  [`arXiv:2309.15595 <https://arxiv.org/abs/2309.15595>`__ ]