References and Contributors
***************************

The path that led to the current implementation of ChASE using modern ``C++``
design concepts started back in the summer of 2012, when an initial Matlab
prototype of the code was re-implemented in ``C`` and parallelized using
OpenMP 3.0. Soon after, a distributed version of the code was implemented
based on MPI and the `Elemental <https://github.com/elemental/Elemental>`__
library. The results of these implementations led to the first publications
on the performance of the library on sequences of eigenvalue problems
stemming from Materials Science applications. In the following three years, a
first basic ``C++`` implementation for shared-memory architectures and GPUs
was developed. Eventually, all these efforts led to the current implementation
in ``C++``, which is templated and separates the algorithm from the kernel
implementations through the use of an abstract class.

Approximately one year ago, the codebase underwent a major refactoring to
improve modularity and maintainability. This effort reorganized the code
structure, enhanced the separation of concerns between components (sequential
vs. parallel implementations, CPU vs. GPU backends, etc.), and established
clearer namespace hierarchies. These improvements have led to the current
well-structured codebase, which supports multiple implementation variants
(ChASECPU, ChASEGPU, pChASECPU, pChASEGPU) with a unified interface while
maintaining high performance and extensibility.

Main developers
===============

* Edoardo Di Napoli -- Algorithm design and development
* Xinzhe Wu -- Algorithm development, advanced parallel (MPI and GPU)
  implementation and optimization, developer documentation
* Clément Richefort -- Algorithm development, advanced parallel (MPI and GPU)
  implementation and optimization, Pseudo-Hermitian project support,
  integration of ChASE into the YAMBO code

Current Contributors
====================

Past Contributors
=================

* Davor Davidović -- Advanced parallel GPU implementation and optimization
* Nenad Mijić -- ARM-based implementation and optimization
* Xiao Zhang -- Integration of ChASE into the Jena BSE code
* Miriam Hinzen, Daniel Wortmann -- Integration of ChASE into the FLEUR code
* Sebastian Achilles -- Library benchmarking on parallel platforms,
  documentation
* Jan Winkelmann -- DoS algorithm development and advanced ``C++``
  implementation
* Paul Springer -- Advanced GPU implementation
* Marija Kranjcevic -- OpenMP ``C++`` implementation
* Josip Zubrinic -- Early GPU algorithm development and implementation
* Jens Rene Suckert -- Lanczos algorithm and GPU implementation
* Mario Berljafa -- Early ``C`` and ``MPI`` implementation using the
  Elemental library

How to Reference the Code
=========================

The main reference for ChASE is [1], while [2] provides some early results on
scalability and usage on sequences of eigenproblems generated by Materials
Science applications. Reference [3] presents the distributed hybrid CPU-GPU
implementation, and [4] describes the recent advances in the distributed
multi-GPU implementation using NCCL and algorithm optimizations.

* [1] J. Winkelmann, P. Springer, and E. Di Napoli. *ChASE: a Chebyshev
  Accelerated Subspace iteration Eigensolver for sequences of Hermitian
  eigenvalue problems.* ACM Transactions on Mathematical Software, **45**,
  No. 2, Art. 21 (2019).
  `DOI:10.1145/3313828 <https://doi.org/10.1145/3313828>`__ ,
  [`arXiv:1805.10121 <https://arxiv.org/abs/1805.10121>`__ ]
* [2] M. Berljafa, D. Wortmann, and E. Di Napoli. *An Optimized and Scalable
  Eigensolver for Sequences of Eigenvalue Problems.* Concurrency and
  Computation: Practice and Experience **27** (2015), pp. 905-922.
  `DOI:10.1002/cpe.3394 <https://doi.org/10.1002/cpe.3394>`__ ,
  [`arXiv:1404.4161 <https://arxiv.org/abs/1404.4161>`__ ]
* [3] X. Wu, D. Davidović, S. Achilles, and E. Di Napoli. *ChASE: a
  distributed hybrid CPU-GPU eigensolver for large-scale Hermitian eigenvalue
  problems.* Proceedings of the Platform for Advanced Scientific Computing
  Conference (PASC22).
  `DOI:10.1145/3539781.3539792 <https://doi.org/10.1145/3539781.3539792>`__ ,
  [`arXiv:2205.02491 <https://arxiv.org/abs/2205.02491>`__ ]
* [4] X. Wu and E. Di Napoli. *Advancing the distributed Multi-GPU ChASE
  library through algorithm optimization and NCCL library.* Proceedings of
  the SC'23 Workshops of the International Conference on High Performance
  Computing, Network, Storage, and Analysis (2023), pp. 1688-1696.
  `DOI:10.1145/3624062.3624249 <https://doi.org/10.1145/3624062.3624249>`__ ,
  [`arXiv:2309.15595 <https://arxiv.org/abs/2309.15595>`__ ]