*********
Modules
*********

Overview
=========

The implementation of ChASE provides a stand-alone, high-performance parallel library based on our original design of the Chebyshev accelerated subspace iteration algorithm. The ChASE library promises portability to heterogeneous architectures and easy integration into existing codes. This goal is achieved by separating the implementation of the ChASE algorithm from the required numerical kernels via an interface based on pure C++ abstract classes. Classes derived from this interface handle the data distribution and the (parallel) execution of each kernel. The required numerical kernels are based on Basic Linear Algebra Subprograms (BLAS)-3 compatible kernels, such as a (parallel) matrix-matrix multiplication and QR factorization. This modern "stand-alone" strategy grants ChASE an unprecedented degree of flexibility that makes the integration of this library into most application codes quite simple, and lets ChASE use the available machine resources efficiently.

The UML class diagram below uses the implementation of the Chebyshev ``Filter``, whose kernel is a series of Hermitian Matrix-Matrix Products (*HEMM*), as an example to show how ChASE is implemented and ported to different architectures. This section gives the user an insight into how to set up their eigenproblems and solve them with ChASE on different computing architectures.

.. image:: /images/ChASE_UML.jpg
   :scale: 99 %
   :align: center

As shown in the diagram above, the entire implementation of ChASE takes place within the C++ namespace ``chase``.

Basic Classes
================

chase::Chase
--------------

The numerical kernels required by the ChASE algorithm are defined in the class ``chase::Chase``. All the functions are declared as virtual functions; implementations targeting different computing architectures must be provided separately. It includes the following functionalities:

* ``HEMM``: Hermitian Matrix-Matrix Multiplication
* ``QR``: QR factorization
* ``RR``: Rayleigh-Ritz projection and solution of the reduced problem
* ``Resd``: compute the eigenpair residuals
* ``Lanczos``: estimate the bounds of the user-interested spectrum with a Lanczos procedure
* ``LanczosDos``: estimate the spectral density (distribution of eigenvalues)
* ``Swap``: swap the two blocks of vectors used in the Chebyshev filter
* ``Locking``: lock the converged eigenpairs
* ``Shift``: shift the diagonal of the matrix ``A`` used in the three-term recurrence relation implemented in the Chebyshev filter
* etc.

.. note::
   For more details on the virtual kernels, please refer to :ref:`virtual_abstrac_numerical_kernels`. Different parallel implementations of these virtual kernels can also be found in :ref:`parallel_implementations`.

chase::Algorithm
------------------

The class ``chase::Algorithm`` is aware of the class ``chase::Chase``, and it defines the algorithmic implementation of ChASE in terms of the virtual kernels declared in ``chase::Chase``. It includes the following functionalities:

* Chebyshev filter
* calculation of the degree of the filter
* Lanczos solver to estimate the bounds of the spectrum
* locking of the converged Ritz pairs
* etc.

The function ``chase::Solve`` provides the implementation of the ChASE algorithm by assembling the algorithms and numerical kernels implemented in ``chase::Chase`` and ``chase::Algorithm``. This separation of algorithm and kernels is sketched below.

.. note::
   This class implements the ChASE algorithm purely in terms of the virtual functions; it cannot run in practice until implementations of these virtual functions are provided.

.. note::
   The details of this class are only provided in the developer documentation; please refer to :ref:`algorithmic-structure`.
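The separation described above can be sketched as follows. Every name and signature in this compile-only snippet (``KernelInterface``, ``chebyshev_filter``, the kernel arguments) is a simplified stand-in introduced for illustration, not the actual ``chase::Chase`` or ``chase::Algorithm`` API, and the filter coefficients are schematic.

.. code-block:: cpp

   // Illustrative stand-ins only: not the actual chase::Chase / chase::Algorithm API.
   #include <complex>
   #include <cstddef>

   using T = std::complex<double>;

   // Abstract kernel interface (analogue of chase::Chase): each kernel is a pure
   // virtual function, and a derived class supplies its (parallel) implementation.
   class KernelInterface {
   public:
     virtual ~KernelInterface() = default;
     virtual void Shift(T c) = 0;                                // shift the diagonal of A
     virtual void HEMM(std::size_t block, T alpha, T beta) = 0;  // W = alpha*A*V + beta*W
     virtual void QR(std::size_t locked) = 0;                    // QR of the filtered block
     virtual void RR(double* ritzv, std::size_t block) = 0;      // Rayleigh-Ritz step
     virtual void Swap(std::size_t i, std::size_t j) = 0;        // swap vector blocks
   };

   // Algorithm layer (in the spirit of chase::Algorithm inside chase::Solve): the
   // Chebyshev filter is written only against the abstract kernels, so it runs
   // unchanged on any architecture for which the kernels are implemented. The
   // coefficients are schematic; the real filter first maps the unwanted part of
   // the spectrum into [-1, 1].
   void chebyshev_filter(KernelInterface& k, std::size_t block, std::size_t degree,
                         T center, T halfwidth) {
     k.Shift(-center);                              // work with A - c*I
     k.HEMM(block, T(1.0) / halfwidth, T(0.0));     // degree-one term
     for (std::size_t i = 1; i < degree; ++i)
       k.HEMM(block, T(2.0) / halfwidth, T(-1.0));  // three-term recurrence step
     k.Shift(center);                               // undo the spectral shift
   }

Because ``chebyshev_filter`` only touches the abstract interface, porting it to a new architecture amounts to providing a new derived class; the algorithm itself never changes.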
chase::ChaseConfig
--------------------

The class ``chase::Algorithm`` is aware of the class ``chase::ChaseConfig``, which defines the functions used to set the different parameters of ChASE. Besides the standard parameters, such as the size of the matrix defining the eigenproblem and the number of wanted eigenvalues, the public functions of this class initialize all internal parameters and allow the experienced user to set the values of the parameters of the core functionalities (e.g. the Lanczos DoS). The aim is to influence the behavior of the library in special cases where the default values of the parameters lead to sub-optimal efficiency in terms of performance and/or accuracy.

.. note::
   For more details on all available functions, please refer to :ref:`configuration_object`.

chase::ChasePerfData
---------------------------

This class defines the performance data for the different algorithmic and numerical kernels of ChASE, e.g., the number of floating point operations of ChASE for a given matrix size and a required number of eigenpairs to be computed. The ``chase::ChasePerfData`` class collects and handles information related to the execution of the eigensolver. It collects information about

- Number of subspace iterations
- Number of filtered vectors
- Timings of each main algorithmic procedure (Lanczos, Filter, etc.)
- Number of FLOPs executed

The number of iterations and filtered vectors can be used to monitor the behavior of the algorithm as it attempts to converge all the desired eigenpairs. The timings and the number of FLOPs are used to measure performance, especially parallel performance. The timings are stored in a vector of objects derived from the class template ``std::chrono::duration``.

.. note::
   For more details on all available functions, please refer to :ref:`performance`.

chase::PerformanceDecoratorChase
-------------------------------------

This class is derived from ``chase::Chase``, which plays the role of an interface for the kernels used by the library. All members of the ``chase::Chase`` class are virtual functions. These functions are re-implemented in the ``chase::PerformanceDecoratorChase`` class. All derived members that provide an interface to computational kernels are re-implemented by *decorating* the original function with timers, which are members of the ``chase::ChasePerfData`` class. All derived members that provide an interface to input or output data are called without any specific decoration. In addition to the virtual members of the ``chase::Chase`` class, the ``chase::PerformanceDecoratorChase`` class also has among its public members a reference to an object of type ``chase::ChasePerfData``. When using ChASE to solve an eigenvalue problem, the members of ``chase::PerformanceDecoratorChase`` are called instead of the virtual function members of the ``chase::Chase`` class. In this way, all timers and counters are automatically collected and returned in the correct order.

.. note::
   For more details on all available functions, please refer to :ref:`performance`.
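The decorating mechanism can be sketched as follows. The snippet below is a reduced illustration of the pattern, assuming a hypothetical one-kernel interface ``Kernels`` and a hypothetical ``PerfData`` struct; it is not the actual ``chase::PerformanceDecoratorChase`` or ``chase::ChasePerfData`` code.

.. code-block:: cpp

   // Reduced, illustrative decorator: not the actual chase::PerformanceDecoratorChase.
   #include <chrono>
   #include <complex>
   #include <cstddef>

   using T = std::complex<double>;
   using duration_t = std::chrono::duration<double>;

   struct PerfData {                      // hypothetical analogue of chase::ChasePerfData
     duration_t hemm_time{};
     std::size_t hemm_calls = 0;
   };

   class Kernels {                        // hypothetical kernel interface, reduced to one kernel
   public:
     virtual ~Kernels() = default;
     virtual void HEMM(std::size_t block, T alpha, T beta) = 0;
   };

   // The decorator derives from the same interface, keeps a reference to the real
   // implementation and to the performance data, and wraps each computational
   // kernel with a timer before forwarding the call. The algorithm can use it
   // exactly as it would use the undecorated object.
   class PerformanceDecorator : public Kernels {
   public:
     PerformanceDecorator(Kernels& impl, PerfData& perf) : impl_(impl), perf_(perf) {}

     void HEMM(std::size_t block, T alpha, T beta) override {
       auto start = std::chrono::steady_clock::now();
       impl_.HEMM(block, alpha, beta);    // forward to the real kernel
       perf_.hemm_time += std::chrono::steady_clock::now() - start;
       ++perf_.hemm_calls;
     }

   private:
     Kernels& impl_;
     PerfData& perf_;
   };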
Override of Virtual Functions
================================

The concrete implementations of the numerical kernels used by ChASE reside in the namespace ``chase::mpi``. This namespace is defined inside the namespace ``chase`` and provides the parallel implementation of ChASE based on ``MPI`` (and ``CUDA``): the virtual numerical kernels of the abstraction are re-implemented targeting homogeneous and heterogeneous (multi-GPU) architectures.

chase::mpi::ChaseMpiMatrices
------------------------------

The class ``chase::mpi::ChaseMpiMatrices`` defines the allocation of the buffers for the matrices and vectors of the ChASE library, for both the non-MPI mode and the MPI mode.

.. note::
   For more details on all available functions, please refer to :ref:`ChaseMpiMatrices`.

chase::mpi::ChaseMpiProperties
--------------------------------

The class ``chase::mpi::ChaseMpiProperties`` defines the construction of the MPI environment and the data distribution scheme (both **Block Distribution** and **Block-Cyclic Distribution**) for ChASE. This class is aware of the class ``chase::mpi::ChaseMpiMatrices``: it allocates the required buffers, based on the configured MPI environment and data distribution, by using the different constructors of ``chase::mpi::ChaseMpiMatrices``.

.. note::
   For more details on all available functions, please refer to :ref:`ChaseMpiProperties`.

chase::mpi::ChaseMpi
----------------------

``chase::mpi::ChaseMpi`` is a class derived from ``chase::Chase``. It implements the virtual functions of the ``chase::Chase`` class, which define the essential numerical kernels of the ChASE algorithm. It is a class template with two template parameters: an implementation of ``chase::mpi::ChaseMpiDLAInterface`` and the scalar type to be used in the application. The numerical kernels defined in ``chase::Chase`` have been further decoupled into Dense Linear Algebra operations (DLAs). Different objects of ``chase::mpi::ChaseMpi`` can be created targeting different computing platforms by selecting different derived classes of ``chase::mpi::ChaseMpiDLAInterface``.

To be more precise, it is derived from the ``chase::Chase`` class, which plays the role of an interface for the kernels used by the library:

- All members of the ``chase::Chase`` class are virtual functions. These functions are re-implemented in the ``chase::mpi::ChaseMpi`` class.
- All member functions of ``chase::mpi::ChaseMpi`` that implement the virtual functions of the class ``chase::Chase`` are implemented using the *DLA* routines provided by the class ``chase::mpi::ChaseMpiDLAInterface``.
- The DLA functions in ``chase::mpi::ChaseMpiDLAInterface`` are also virtual functions, implemented differently for different computing architectures (sequential/parallel, CPU/GPU, shared-memory/distributed-memory, etc.). In the class ``chase::mpi::ChaseMpi``, calls to the DLA functions dispatch to the implementations provided by the selected derived class. Thus ``chase::mpi::ChaseMpi`` can be customized for various architectures; a minimal sketch of this dispatch is given after this subsection.
- The class ``chase::mpi::ChaseMpi`` is aware of the classes ``chase::mpi::ChaseMpiMatrices`` and ``chase::mpi::ChaseMpiProperties``.

  - For the shared-memory implementation, the constructor of ``chase::mpi::ChaseMpi`` takes an instance of ``chase::mpi::ChaseMpiMatrices`` as input.
  - For the distributed-memory implementation of ``chase::mpi::ChaseMpi``, the setup of the MPI environment and communication scheme, and the distribution of the data (matrix, vectors) across MPI ranks, follow the ``chase::mpi::ChaseMpiProperties`` class; the matrix can be distributed in either a **Block** or a **Block-Cyclic** scheme. The required buffers are allocated during the construction of an object of ``chase::mpi::ChaseMpiProperties``.

.. note::
   For more details on all available functions, please refer to :ref:`ChaseMpi`.
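The compile-time dispatch described above can be sketched as follows. The classes ``DLAInterface``, ``DLABlasLapack``, ``DLAMultiGPU``, and ``ChaseLike`` are hypothetical stand-ins for ``chase::mpi::ChaseMpiDLAInterface`` and its derived classes; the real signatures and template arguments differ.

.. code-block:: cpp

   // Hypothetical stand-ins for ChaseMpiDLAInterface and its derived classes.
   #include <complex>
   #include <cstddef>
   #include <memory>

   using T = std::complex<double>;

   class DLAInterface {                         // analogue of chase::mpi::ChaseMpiDLAInterface
   public:
     virtual ~DLAInterface() = default;
     virtual void gemm(std::size_t block, T alpha, T beta) = 0;
     virtual void qr(std::size_t locked) = 0;
   };

   class DLABlasLapack : public DLAInterface {  // CPU backend: would call BLAS/LAPACK
   public:
     void gemm(std::size_t, T, T) override { /* e.g. cblas_zgemm(...) */ }
     void qr(std::size_t) override { /* e.g. LAPACKE_zgeqrf(...) */ }
   };

   class DLAMultiGPU : public DLAInterface {    // GPU backend: would call cuBLAS/cuSOLVER
   public:
     void gemm(std::size_t, T, T) override { /* e.g. cublasZgemm(...) */ }
     void qr(std::size_t) override { /* e.g. cusolverDnZgeqrf(...) */ }
   };

   // Analogue of chase::mpi::ChaseMpi: the DLA backend is a template parameter,
   // so targeting another architecture is a compile-time choice; every kernel of
   // the abstract interface is realized by forwarding to the DLA object.
   template <class DLA>
   class ChaseLike {
   public:
     void HEMM(std::size_t block, T alpha, T beta) { dla_->gemm(block, alpha, beta); }
     void QR(std::size_t locked) { dla_->qr(locked); }
   private:
     std::unique_ptr<DLAInterface> dla_ = std::make_unique<DLA>();
   };

   // Usage: ChaseLike<DLABlasLapack> on CPUs, ChaseLike<DLAMultiGPU> on GPUs.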
DLAs for shared-memory architectures
--------------------------------------

The DLAs for shared-memory architectures with and without GPUs are implemented in the classes ``chase::mpi::ChaseMpiDLACudaSeq`` and ``chase::mpi::ChaseMpiDLABlaslapackSeq``, respectively.

.. toctree::
   :maxdepth: 3

   module/chasempidlaseq
   module/chasempidlacudaseq

DLAs for distributed-memory architectures
-------------------------------------------

The implementation of the DLAs for distributed-memory architectures is further decoupled into two layers:

- the first layer handles the collective communication between the different computing nodes;
- the second layer implements the local computation within each node:

  - for homogeneous systems with CPUs only, the local computation takes place on each individual MPI process, with potential multi-threaded parallelization, e.g., with OpenMP;
  - for heterogeneous systems with GPUs, part of the local computation takes place on each individual MPI process, while the more intensive computations are offloaded to the GPU bound to that MPI process.

The local computations with and without GPUs are implemented in the classes ``chase::mpi::ChaseMpiDLAMultiGPU`` and ``chase::mpi::ChaseMpiDLABlaslapack``, respectively. The collective communication layer is shared between the distributed-memory versions of ChASE with and without GPU support; it is implemented in the class ``chase::mpi::ChaseMpiDLA``. This class takes an instance of ``chase::mpi::ChaseMpiDLAInterface``, either ``chase::mpi::ChaseMpiDLABlaslapack`` or ``chase::mpi::ChaseMpiDLAMultiGPU``, as input. In this way, it can access the different implementations of the local computation kernels. A minimal sketch of this two-layer design is given at the end of this page.

.. note::
   When an instance of ``chase::mpi::ChaseMpi`` is constructed for distributed-memory systems, one of its template parameters should be either ``chase::mpi::ChaseMpiDLABlaslapack`` or ``chase::mpi::ChaseMpiDLAMultiGPU``. An instance of the class ``chase::mpi::ChaseMpiDLA`` is then created with the selected implementation of the local computation kernels. In this way, ChASE can be ported to different computing architectures.

.. toctree::
   :maxdepth: 3

   module/chasempidlaImpl
   module/chasempidlablaslapack
   module/chasempidlamultigpu
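To make the two-layer design concrete, the following compile-only sketch shows a communication layer that owns the MPI collectives and delegates the node-local work to a backend, in the spirit of ``chase::mpi::ChaseMpiDLA`` wrapping ``chase::mpi::ChaseMpiDLABlaslapack`` or ``chase::mpi::ChaseMpiDLAMultiGPU``. All class names, methods, and the reduction pattern shown here are illustrative assumptions, not the library's actual implementation.

.. code-block:: cpp

   // Illustrative structure only: not the actual ChaseMpiDLA / local DLA classes.
   #include <complex>
   #include <cstddef>
   #include <memory>
   #include <mpi.h>

   using T = std::complex<double>;

   // Local layer: in the spirit of ChaseMpiDLABlaslapack (CPU) or ChaseMpiDLAMultiGPU (GPU).
   class LocalDLA {
   public:
     virtual ~LocalDLA() = default;
     virtual void local_gemm(std::size_t block) = 0;  // BLAS on the CPU or cuBLAS on the GPU
   };

   // Communication layer: in the spirit of ChaseMpiDLA. It owns the MPI collectives
   // and delegates the node-local computation to whichever backend it was given.
   class CommDLA {
   public:
     CommDLA(std::unique_ptr<LocalDLA> local, MPI_Comm comm)
         : local_(std::move(local)), comm_(comm) {}

     // A distributed HEMM step: each rank multiplies its local block, then the
     // partial results are combined across ranks (the single reduction shown here
     // is a simplification of the communication along the processor grid).
     void distributed_hemm(std::size_t block, T* partial, int count) {
       local_->local_gemm(block);                     // node-local compute
       MPI_Allreduce(MPI_IN_PLACE, partial, count,    // combine partial results
                     MPI_C_DOUBLE_COMPLEX, MPI_SUM, comm_);
     }

   private:
     std::unique_ptr<LocalDLA> local_;
     MPI_Comm comm_;
   };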