3. How to use ChASE

3.1. Use ChASE as a standalone solver

ChASE provides multiple versions for both shared-memory and distributed-memory systems, with or without GPU support. This section shows how to use ChASE from scratch to solve your own problem on the preferred architecture.

In order to use ChASE, the first header file to include is ChASE-MPI/chase_mpi.hpp, which provides the common interface of the ChASE solver. This header provides multiple constructors of the class ChaseMpi, targeting different computing architectures.

Note

Despite its name, ChASE-MPI must be included in the current release of ChASE regardless of whether the shared-memory or the distributed-memory version is used.

Note

All versions of ChASE share the same interface for the solving step, parameter configuration and performance decoration.

3.1.1. Shared-Memory ChASE

3.1.1.1. Include headers

The shared-memory version of ChASE can be built with or without support for Nvidia GPUs. If GPU support is enabled, only one GPU card is used. The required headers are listed below (see the include sketch after this list).

  • For the CPU-only shared-memory version of ChASE, it is also necessary to include the header ChASE-MPI/impl/chase_mpidla_blaslapack_seq.hpp. This header provides the templated class ChaseMpiDLABlaslapackSeq, which implements the required dense linear algebra operations. Its template type determines the scalar type the user would like to work with.

  • For the shared-memory version of ChASE with GPU support, the header ChASE-MPI/impl/chase_mpidla_cuda_seq.hpp should be included instead. This header provides the templated class ChaseMpiDLACudaSeq, which implements the required dense linear algebra operations. Its template type determines the scalar type the user would like to work with.
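For instance, a minimal set of includes for the CPU-only shared-memory build could look as follows (for the GPU build, swap the second header for ChASE-MPI/impl/chase_mpidla_cuda_seq.hpp as described above):

#include "ChASE-MPI/chase_mpi.hpp"                        //common ChaseMpi interface
#include "ChASE-MPI/impl/chase_mpidla_blaslapack_seq.hpp" //CPU-only shared-memory kernels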

3.1.1.2. ChASE solver

A ChASE solver, which is in fact an instance of ChaseMpi, should be constructed before solving the eigenproblem. ChaseMpi is templated with two classes: template <typename> class MF and class T. T determines the scalar type that the user would like to work with. For shared-memory ChASE, template <typename> class MF should be either ChaseMpiDLACudaSeq or ChaseMpiDLABlaslapackSeq.

With the combination of the templates MF and T, different instances of ChaseMpi can be constructed targeting different architectures and scalar types.

For example, if the user wants ChASE to diagonalize a Hermitian matrix in double precision on a GPU, an instance of ChaseMpi is constructed as follows:

//N: global size of matrix to be diagonalized
//nev: number of eigenpairs to be computed
//nex: external searching space size
//buffer for storing eigenvectors
auto V = std::vector<std::complex<double>>(N * (nev + nex));
//buffer for storing computed Ritz values
auto Lambda = std::vector<double>(nev + nex);
//buffer for storing Hermitian matrix to be diagonalized
std::vector<std::complex<double>> H(N * N);

ChaseMpi<ChaseMpiDLACudaSeq, std::complex<double>> solver(N, nev, nex, V.data(), Lambda.data(), H.data());

For the details of APIs, please visit ChaseMpi.

3.1.2. Distributed-Memory ChASE

3.1.2.1. Include headers

The distributed-memory version of ChASE can be built with or without support for Nvidia GPUs. If GPU support is enabled, ChASE uses exactly one GPU per MPI rank. The required headers are listed below (see the include sketch after this list).

  • CPU version: it is also necessary to include the header ChASE-MPI/impl/chase_mpidla_blaslapack.hpp. This header provides the templated class ChaseMpiDLABlaslapack, which implements the required dense linear algebra operations. Its template type determines the scalar type the user would like to work with.

  • GPU version: the header ChASE-MPI/impl/chase_mpidla_mgpu.hpp should be included instead. This header provides the templated class ChaseMpiDLAMultiGPU, which implements the required dense linear algebra operations. Its template type determines the scalar type the user would like to work with.
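For instance, a minimal set of includes for the distributed-memory CPU build could look as follows (for the multi-GPU build, swap the second header for ChASE-MPI/impl/chase_mpidla_mgpu.hpp as described above):

#include "ChASE-MPI/chase_mpi.hpp"                    //common ChaseMpi interface
#include "ChASE-MPI/impl/chase_mpidla_blaslapack.hpp" //distributed-memory CPU kernels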

3.1.2.2. MPI working context

Unlike shared-memory ChASE, distributed-memory ChASE requires a working MPI communicator to be initialized. The class ChaseMpiProperties constructs a 2D MPI grid environment based on the user's configuration. Multiple constructors of this class are available:

  • a constructor for Block Distribution with user-customized 2D MPI grid

  • a constructor for Block Distribution with 2D MPI grid determined internally by ChASE (as square as possible)

  • a constructor for Block-Cyclic Distribution with user-customized 2D MPI grid

Note

Apart from setting up the 2D MPI grid, this class also allocates the temporary buffers required by ChASE and provides utilities that facilitate the communication.

An example of the constructor for Block Distribution with the 2D MPI grid determined internally by ChASE is given as follows:

auto props = new ChaseMpiProperties<std::complex<double>>(N, nev, nex, MPI_COMM_WORLD);

in which the input arguments are, respectively: the global matrix size, the number of eigenpairs to compute, the external searching space size, and the working MPI communicator. A 2D MPI grid that is as square as possible is constructed internally by ChASE.

An example of the constructor for Block Distribution with a user-customized 2D MPI grid is given as follows:

auto props = new ChaseMpiProperties<T>(N, nev, nex, m, n, dims[0], dims[1], (char*)"C", MPI_COMM_WORLD);

in which the input arguments are, respectively: the global matrix size, the number of eigenpairs to compute, the external searching space size, the row number of the local block of the matrix, the column number of the local block of the matrix, the row number of the 2D MPI grid, the column number of the 2D MPI grid, the grid major of the 2D MPI grid ('C' refers to column major), and the working MPI communicator.

An example of the constructor for Block-Cyclic Distribution with a user-customized 2D MPI grid is given as follows:

auto props = new ChaseMpiProperties<T>(N, NB, NB, nev, nex, dims[0], dims[1], (char*)"C", irsrc, icsrc, MPI_COMM_WORLD);

in which the input arguments are, respectively: the global matrix size, the block factors of the block-cyclic distribution for the 1st and 2nd dimensions of the 2D MPI grid, the number of eigenpairs to compute, the external searching space size, the row number of the 2D MPI grid, the column number of the 2D MPI grid, the grid major of the 2D MPI grid ('C' refers to column major), the process row/column over which the first row/column of the global matrix is distributed, and the working MPI communicator.

3.1.2.3. ChASE solver

As for the shared-memory version of ChASE, the class ChaseMpi also provides constructors for the distributed-memory versions:

  • a constructor with a pre-allocated buffer H for the Hermitian matrix and its leading dimension ldh

  • a constructor without a pre-allocated buffer for the Hermitian matrix: in this case the required buffer is allocated internally, and a memcpy operation is always required to copy the user-provided matrix into it. This version exists for historic reasons and may be removed in a future release.

  • Unlike the constructor of the shared-memory version of ChASE, the constructors of the distributed-memory versions take an instance of ChaseMpiProperties as input. This allows creating ChASE solvers with either Block Distribution or Block-Cyclic Distribution and a user-customized MPI configuration.

ChaseMpi is templated with two classes: template <typename> class MF and class T. T determines the scalar type that the user would like to work with. For distributed-memory ChASE, template <typename> class MF should be either ChaseMpiDLABlaslapack or ChaseMpiDLAMultiGPU.

With the combination of the templates MF and T, different instances of ChaseMpi can be constructed targeting different architectures and scalar types.

For example, if users want ChASE to diagonalize a Hermitian matrix in double precision that is distributed in a Block-Cyclic fashion onto multiple GPUs, an instance of ChaseMpi is constructed as follows:

//N: global size of matrix to be diagonalized
//nev: number of eigenpairs to be computed
//nex: external searching space size
//NB: block factor for block-cyclic distribution
//dims[0] x dims[1]: 2D MPI grid
//irsrc, icsrc: over which processor row/column the block-cyclic distribution starts from
//construct MPI context with block-cyclic distribution
auto props = new ChaseMpiProperties<std::complex<double>>(N, NB, NB, nev, nex, dims[0], dims[1], (char*)"C", irsrc, icsrc, MPI_COMM_WORLD);
//buffer for storing eigenvectors
auto V = std::vector<std::complex<double>>(props->get_m() * (nev + nex));
//buffer for storing computed Ritz values
auto Lambda = std::vector<double>(nev + nex);
auto ldh = props->get_m();
//buffer for storing the local block of the Hermitian matrix to be diagonalized
std::vector<std::complex<double>> H(ldh * props->get_n());

ChaseMpi<ChaseMpiDLAMultiGPU, std::complex<double>> solver(props, H.data(), ldh, V.data(), Lambda.data());

Another example, with Block Distribution and without a pre-allocated buffer for the Hermitian matrix, is as follows:

//N: global size of matrix to be diagonalized
//nev: number of eigenpairs to be computed
//nex: external searching space size
//construct MPI context with block distribution
auto props = new ChaseMpiProperties<std::complex<double>>(N, nev, nex, MPI_COMM_WORLD);
//buffer for storing eigenvectors
auto V = std::vector<std::complex<double>>(props->get_m() * (nev + nex));
//buffer for storing computed Ritz values
auto Lambda = std::vector<double>(nev + nex);

ChaseMpi<ChaseMpiDLAMultiGPU, std::complex<double>> solver(props, V.data(), Lambda.data());

For the details of APIs, please visit ChaseMpi.

3.1.3. Parameter Configuration

Before the solving step starts, selected parameters can be customized by the user. The example below shows how to configure the parameters of a constructed ChASE solver instance solver.

/*Setup configure for ChASE*/
auto& config = solver.GetConfig();
/*Tolerance for Eigenpair convergence*/
config.SetTol(1e-10);
/*Initial filtering degree*/
config.SetDeg(20);
/*Optimi(S)e degree*/
config.SetOpt(true);
/*Set max iteration steps*/
config.SetMaxIter(25);

Note

All versions of ChASE, regardless of the targeted computing architecture, share a uniform interface for the parameter configuration.

For more details about the APIs for parameter configuration, please visit Configuration Object. For the usage and recommended values of these parameters, please visit Parameters and Configurations.

3.1.4. Solve

Both the shared-memory and distributed-memory versions of ChASE share a uniform interface for the solving step.

Assume that an instance of ChaseMpi has been constructed with pre-allocated buffers for the Hermitian matrix and the desired eigenpairs.

3.1.4.1. An isolated problem

When an isolated problem is to be solved, three steps are required (see the sketch after this list):

  • update the pre-allocated buffer of the Hermitian matrix with the matrix to be diagonalized, e.g., through I/O, generation or redistribution

  • set the parameter approx_ to false: config.SetApprox(false)

  • solve the problem: chase::Solve(&solver).
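A minimal sketch of these three steps, reusing the solver instance and buffer H from the earlier examples (loadMatrix is a hypothetical placeholder for the application-specific update of H), could look as follows:

//1. update the pre-allocated buffer H with the matrix to be diagonalized
//   (loadMatrix is a hypothetical placeholder for the user's own I/O or generation routine)
loadMatrix(H.data(), N);
//2. no previous eigenvectors are available, so start from random initial vectors
auto& config = solver.GetConfig();
config.SetApprox(false);
//3. solve the problem
chase::Solve(&solver);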

3.1.4.2. A sequence of problems

When a sequence of eigenproblems is to be solved one by one, the steps are as follows (see the sketch after this list):

  • for the 1st problem

    • update the pre-allocated buffer of the Hermitian matrix with the matrix to be diagonalized, e.g., through I/O, generation or redistribution

    • set the parameter approx_ to false: config.SetApprox(false)

    • solve the problem: chase::Solve(&solver).

  • for the remaining problems (2nd, 3rd, …)

    • update the pre-allocated buffer of the Hermitian matrix with the new matrix to be diagonalized, e.g., through I/O, generation or redistribution

    • set the parameter approx_ to true: config.SetApprox(true)

    • solve the problem: chase::Solve(&solver).
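A minimal sketch of such a sequence, with a hypothetical updateMatrix standing in for the application-specific update of H and a hypothetical problem count nprobs, could look as follows:

auto& config = solver.GetConfig();
for (std::size_t i = 0; i < nprobs; ++i) {
  //update the pre-allocated buffer H with the i-th matrix of the sequence
  //(updateMatrix is a hypothetical placeholder for user I/O, generation or redistribution)
  updateMatrix(H.data(), N, i);
  //the 1st problem starts from random vectors; the following problems recycle
  //the eigenvectors of the previous solve as initial guess
  config.SetApprox(i != 0);
  chase::Solve(&solver);
}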

Note

  • When the parameter approx_ is set to false, the initial guess vectors are filled with random numbers following a normal distribution. ChASE generates these numbers internally and in parallel.

  • The buffer for the initial guess vectors should be allocated externally by the user.

  • For distributed-memory ChASE with GPUs, these random numbers are generated in parallel on GPUs.

3.1.5. Performance Decorator

The templated class PerformanceDecoratorChase<T>, which is derived from the class Chase<T>, records the performance of the different numerical kernels in ChASE. It is simple to use; the following example shows how to decorate a constructed ChASE solver instance solver:

PerformanceDecoratorChase<T> performanceDecorator(&solver);

The solving step should then be carried out through the instance performanceDecorator, rather than through solver itself:

chase::Solve(&performanceDecorator);

After the solving step, the recorded performance can be printed out as follows:

performanceDecorator.GetPerfData().print();

The output of this performance decorator is as follows:

| Size  | Iterations | Vecs   |  All       | Lanczos    | Filter     | QR         | RR         | Resid      |
|     1 |          5 |   7556 |      1.116 |   0.135028 |    0.87997 |  0.0164864 |  0.0494752 |  0.0310726 |

in which the columns represent, respectively:

  • the number of MPI processors in the working communicator,

  • the iteration number for convergence,

  • total number of matrix-vector product operations,

  • the total time (s),

  • the time cost of Lanczos, Filter, QR, RR and Residuals, respectively.

3.1.6. Extract the results

If an instance of ChaseMpi is constructed with user-provided buffers V and Lambda, these buffers are overwritten with the computed eigenvectors and eigenvalues, respectively.

To be more precise, the first nev columns of V and the first nev elements of Lambda would be the required eigenpairs.

The residuals of all computed eigenpairs can be obtained as follows:

Base<T>* resid = solver.GetResid();

in which Base<T> denotes the base scalar type of T, e.g., Base<double> is double and Base<std::complex<float>> is float.
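For instance, a short sketch that prints the converged eigenvalues together with their residuals, assuming the solver and Lambda buffers from the earlier double-complex examples and an included <iostream>, could look as follows:

Base<std::complex<double>>* resid = solver.GetResid();
for (std::size_t i = 0; i < nev; ++i) {
  //Lambda[i] holds the i-th computed eigenvalue, resid[i] its residual
  std::cout << "lambda[" << i << "] = " << Lambda[i]
            << ", residual = " << resid[i] << std::endl;
}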

3.1.7. I/O

ChASE itself does not provide parallel I/O functions to load a large matrix from a binary file. The reason is that, for the majority of applications of ChASE, the Hermitian matrix is expected to be already distributed by the application, so providing our own parallel I/O would add little value. This is also why ChASE supports both Block Distribution and Block-Cyclic Distribution, in order to adapt to the requirements of as many applications as possible.

However, for users who want to test ChASE as a standalone eigensolver, parallel I/O may be necessary. Hence, this section provides a hint on how to load a matrix in parallel from a local binary file using built-in functions of ChASE. This parallel I/O is not tuned for optimal performance, and we encourage users to develop their own based on mature parallel I/O libraries such as HDF5 and SIONlib.

The following example loads a matrix from a local file into the Block Distribution data layout.

template <typename T>
void readMatrix(T* H, /*The pointer to store the local part of matrix on each MPI rank*/
                std::string path_in, /*The path to load binary file of matrix*/
                std::size_t size, /*size = N * N, in which N is the size of matrix to be loaded*/
                std::size_t xoff, /*row offset of the local block within the global matrix*/
                std::size_t yoff, /*column offset of the local block within the global matrix*/
                std::size_t xlen, /*number of rows of the local block*/
                std::size_t ylen) /*number of columns of the local block*/
{
  std::size_t N = std::sqrt(size);
  std::ostringstream problem(std::ostringstream::ate);
  problem << path_in;

  std::cout << problem.str() << std::endl;
  std::ifstream input(problem.str().c_str(), std::ios::binary);
  if (!input.is_open()) {
    throw std::logic_error(std::string("error reading file: ") +
                           problem.str());
  }

  for (std::size_t y = 0; y < ylen; y++) {
    input.seekg(((xoff) + N * (yoff + y)) * sizeof(T));
    input.read(reinterpret_cast<char*>(H + xlen * y), xlen * sizeof(T));
  }
}

The parameters xoff, yoff, xlen and ylen can be obtained via the member function GetOff of the ChaseMpiProperties class as follows.

std::size_t xoff;
std::size_t yoff;
std::size_t xlen;
std::size_t ylen;

props.GetOff(&xoff, &yoff, &xlen, &ylen);
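A hypothetical call combining these offsets with the readMatrix helper above (assuming a local buffer H of size xlen * ylen and a binary matrix file matrix.bin of global size N) could look as follows:

std::vector<std::complex<double>> H(xlen * ylen); //local block in Block Distribution
readMatrix(H.data(), "matrix.bin", N * N, xoff, yoff, xlen, ylen);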

The following example loads a matrix from a local file into the Block-Cyclic Distribution data layout.

template <typename T>
void readMatrix(T* H, /*The pointer to store the local part of matrix on each MPI rank*/
                std::string path_in, /*The path to load binary file of matrix*/
                std::size_t size, /*size = N * N, in which N is the size of matrix to be loaded*/
                std::size_t m, /*number of rows of the local matrix*/
                std::size_t mblocks, /*number of row blocks of the local matrix*/
                std::size_t nblocks, /*number of column blocks of the local matrix*/
                std::size_t* r_offs, /*global offsets of the row blocks*/
                std::size_t* r_lens, /*lengths of the row blocks*/
                std::size_t* r_offs_l, /*local offsets of the row blocks*/
                std::size_t* c_offs, /*global offsets of the column blocks*/
                std::size_t* c_lens, /*lengths of the column blocks*/
                std::size_t* c_offs_l){ /*local offsets of the column blocks*/

  std::size_t N = std::sqrt(size);
  std::ostringstream problem(std::ostringstream::ate);
  problem << path_in;

  std::cout << problem.str() << std::endl;

  std::ifstream input(problem.str().c_str(), std::ios::binary);
  if (!input.is_open()) {
    throw std::logic_error(std::string("error reading file: ") +
                           problem.str());
  }

  for(std::size_t j = 0; j < nblocks; j++){
    for(std::size_t i = 0; i < mblocks; i++){
      for(std::size_t q = 0; q < c_lens[j]; q++){
          input.seekg(((q + c_offs[j]) * N + r_offs[i])* sizeof(T));
          input.read(reinterpret_cast<char*>(H + (q + c_offs_l[j]) * m + r_offs_l[i]), r_lens[i] * sizeof(T));
      }
    }
  }
}

The parameters m, mblocks, nblocks, r_offs, r_lens, r_offs_l, c_offs, c_lens and c_offs_l can be obtained via the member functions get_mblocks, get_nblocks, get_m, get_n and get_offs_lens of the ChaseMpiProperties class as follows.

/*local block number = mblocks x nblocks*/
std::size_t mblocks = props.get_mblocks();
std::size_t nblocks = props.get_nblocks();

/*local matrix size = m x n*/
std::size_t m = props.get_m();
std::size_t n = props.get_n();

/*global and local offset/length of each block of block-cyclic data*/
std::size_t *r_offs, *c_offs, *r_lens, *c_lens, *r_offs_l, *c_offs_l;

props.get_offs_lens(r_offs, r_lens, r_offs_l, c_offs, c_lens, c_offs_l);
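A hypothetical call combining this layout information with the block-cyclic readMatrix helper above (assuming a local buffer H of size m * n and a binary matrix file matrix.bin of global size N) could look as follows:

std::vector<std::complex<double>> H(m * n); //local block in Block-Cyclic Distribution
readMatrix(H.data(), "matrix.bin", N * N, m, mblocks, nblocks,
           r_offs, r_lens, r_offs_l, c_offs, c_lens, c_offs_l);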

3.2. Use ChASE from external applications

In order to embed the ChASE library in an application, ChASE can be linked following the instructions in this section, which gives guidelines for integrating the ChASE library into a given application software.

3.2.1. Compiling with CMake

The CMakeLists.txt below is an example of how to link against a ChASE installation using CMake. In this example, ChASE is linked to a source file named chase_app.cpp. Both the CMakeLists.txt and the chase_app.cpp file should be placed in the main directory of the application.

cmake_minimum_required(VERSION 3.8)

project(chase-app VERSION 0.0.1 LANGUAGES CXX)

#find installation of ChASE
find_package( chase REQUIRED CONFIG)

add_executable(${PROJECT_NAME})

# add the source file of application
target_sources(${PROJECT_NAME} PRIVATE chase_app.cpp)

# link to ChASE
target_link_libraries(${PROJECT_NAME} PUBLIC ChASE::chase_mpi)

# if users want to compile the application with multi-GPU version of ChASE
# the target should be linked to the both the library ChASE::chase_mpi
# and the library ChASE::chase_cuda
target_link_libraries(${PROJECT_NAME} PUBLIC ChASE::chase_cuda)

With CMake, the application software can be compiled by the following commands:

mkdir build && cd build
cmake .. -DCMAKE_PREFIX_PATH=${ChASEROOT}
make

The example 3_installation illustrates how to link ChASE with CMake, with or without GPU support.

Note

We highly recommend linking ChASE with CMake. The ChASE installation allows CMake to find and link it easily.

3.2.2. Compiling with Makefile

As an alternative to CMake, it is also possible to link ChASE through a plain Makefile. Below is a Makefile template for the example 3_installation.

ChASEROOT = /The/installation/path/of/ChASE/on/your/platform

CXX = mpicxx #or other mpi CXX compiler

CXXFLAGS = \
    -Wall -fopenmp -MMD

INCLUDE_DIR = ${ChASEROOT}/include #include the headers of ChASE

LIBS_BLASLAPACK = /your/BLAS/LAPACK/SCALAPACK/LIBRARIES

## Optional for multi-GPU version of ChASE ##
LIBS_CUDA = -lcublas -lcusolver -lcudart -lcurand ## link to the libraries of cuBLAS, cuSOLVER and CUDA runtime

## Optional for multi-GPU version of ChASE ##
LIBS_CHASE_CUDA = ${ChASEROOT}/lib64/libchase_cuda.a

chase-app: LIBS = ${LIBS_BLASLAPACK} #executable generated by chase-app.cpp

chase-app-gpu: LIBS = ${LIBS_BLASLAPACK} ${LIBS_CHASE_CUDA} ${LIBS_CUDA} #executable generated by chase-app-gpu.cpp

src = ${wildcard *.cpp}
exe = ${basename ${src}}

all: $(exe)

.SUFFIXES:

%: %.cpp
        ${CXX} ${CXXFLAGS} -I${INCLUDE_DIR} -o $@ $< ${LIBS}

clean:
        -rm -f $(exe) *.o

-include *.d

3.3. Interface to C/Fortran

3.3.1. General Description

ChASE provides interfaces to both C and Fortran.

The usage of both the C and Fortran interfaces is split into three steps:

  • Initialization: initialization of the context for ChASE, including the setup of the MPI 2D grid, communicators and allocation of buffers, etc.

  • Solving: solving the given problem with ChASE within the previously set up ChASE context

  • Finalization: cleaning up the ChASE context

Note

When a sequence of eigenproblems is to be solved, multiple solving steps can be called one after another following a single Initialization step. It is the user's responsibility to form each new eigenproblem by updating the buffer allocated for the Hermitian/Symmetric matrix.

Both the C and Fortran interfaces of ChASE provide three usage variants:

  • Sequential ChASE: using the implementation of ChASE for shared-memory architectures.

  • Distributed-memory ChASE with Block-Block distribution: using the implementation of ChASE for distributed-memory architectures, with Block-Block data layout.

  • Distributed-memory ChASE with Block-Cyclic distribution: using the implementation of ChASE for distributed-memory architectures, with Block-Cyclic data layout.

Warning

When CUDA is detected, these interfaces automatically use the available GPU(s).

Note

The naming logic of the interface functions is as follows:

  • The names of all functions for distributed-memory ChASE start with the prefix p, following the naming convention of ScaLAPACK.

  • For the Block and Block-Cyclic data layouts:

    • they share the same interface for the Solving and Finalization steps

    • but differ in the Initialization step: for the Block-Cyclic data layout, the related Initialization function ends with the suffix blockcyclic

  • The Fortran interfaces are implemented based on iso_c_binding, a standard intrinsic module which defines named constants, types, and procedures for interoperation with C. C and Fortran functions share the same names; additionally, unlike the Fortran routines, the C functions carry a trailing underscore _.

The interfaces of ChASE support different scalar types. For a concise presentation of the implemented functions we use the abbreviation <x> for the corresponding type prefix; Base<x> is defined in the table below. Unless otherwise specified, <x> has the following meanings:

| <x> | Type in C and Fortran                | Meaning                  | Base<x> in C and Fortran |
| s   | float and c_float                    | real single-precision    | float and c_float        |
| d   | double and c_double                  | real double-precision    | double and c_double      |
| c   | float _Complex and c_float_complex   | complex single-precision | float and c_float        |
| z   | double _Complex and c_double_complex | complex double-precision | double and c_double      |

3.3.2. Initialization Functions

3.3.2.1. <x>chase_init

<x>chase_init initializes the context for shared-memory ChASE. ChASE is initialized with the buffers h, v and ritzv, which should be allocated externally by the user. These buffers are reused when a sequence of eigenproblems is solved. A short usage sketch is given after the parameter table below.

The APIs for the C interfaces are as follows:

void schase_init_(int* n, int* nev, int* nex, float* h, float* v, float* ritzv, int* init)
void dchase_init_(int* n, int* nev, int* nex, double* h, double* v, double* ritzv, int* init)
void cchase_init_(int* n, int* nev, int* nex, float _Complex* h, float _Complex* v, float* ritzv, int* init)
void zchase_init_(int* n, int* nev, int* nex, double _Complex* h, double _Complex* v, double* ritzv, int* init)

The APIs for the Fortran interfaces are as follows:

SUBROUTINE schase_init(n, nev, nex, h, v, ritzv, init)
SUBROUTINE dchase_init(n, nev, nex, h, v, ritzv, init)
SUBROUTINE cchase_init(n, nev, nex, h, v, ritzv, init)
SUBROUTINE zchase_init(n, nev, nex, h, v, ritzv, init)

The interfaces of C and Fortran share the same parameters as follows:

| Param. | In/Out  | Meaning |
| n      | In      | global matrix size of the matrix to be diagonalized |
| nev    | In      | number of desired eigenpairs |
| nex    | In      | extra searching space size |
| h      | In      | pointer to the matrix to be diagonalized, of size n x n |
| v      | In, Out | (n x (nev+nex)) matrix; on input the initial guess eigenvectors, on output the first nev columns are overwritten by the desired eigenvectors |
| ritzv  | Out     | an array of size nev which contains the desired eigenvalues, of type Base<x> |
| init   | Out     | a flag indicating whether ChASE has been initialized; 1 if initialized |
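As an illustration, a short sketch of initializing the double-complex shared-memory interface from C (a snippet rather than a complete program; it assumes N, nev and nex are defined as int and <stdlib.h> is included) could look as follows:

int init;
//buffers allocated by the user and reused for a whole sequence of problems
double _Complex* H = (double _Complex*)malloc(sizeof(double _Complex) * N * N);
double _Complex* V = (double _Complex*)malloc(sizeof(double _Complex) * N * (nev + nex));
double* ritzv = (double*)malloc(sizeof(double) * (nev + nex));
zchase_init_(&N, &nev, &nex, H, V, ritzv, &init);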

3.3.2.2. p<x>chase_init

p<x>chase_init initializes the context for distributed-memory ChASE with Block Distribution. ChASE is initialized with the buffers h, v and ritzv, which should be allocated externally by the user. These buffers are reused when a sequence of eigenproblems is solved.

The APIs for the C interfaces are as follows:

void pschase_init_(int *nn, int *nev, int *nex, int *m, int *n, float *h, int *ldh,
                   float *v, float *ritzv, int *dim0, int *dim1, char *grid_major,
                   MPI_Comm *comm, int *init)
void pdchase_init_(int *nn, int *nev, int *nex, int *m, int *n, double *h, int *ldh,
                   double *v, double *ritzv, int *dim0, int *dim1, char *grid_major,
                   MPI_Comm *comm, int *init)
void pcchase_init_(int *nn, int *nev, int *nex, int *m, int *n, float _Complex *h, int *ldh,
                   float _Complex *v, float *ritzv, int *dim0, int *dim1, char *grid_major,
                   MPI_Comm *comm, int *init)
void pzchase_init_(int *nn, int *nev, int *nex, int *m, int *n, double _Complex *h, int *ldh,
                   double _Complex *v, double *ritzv, int *dim0, int *dim1, char *grid_major,
                   MPI_Comm *comm, int *init)

The APIs for the Fortran interfaces are as follows:

subroutine  pschase_init (nn, nev, nex, m, n, h, ldh, v, ritzv, dim0, dim1, grid_major, fcomm, init)
subroutine  pdchase_init (nn, nev, nex, m, n, h, ldh, v, ritzv, dim0, dim1, grid_major, fcomm, init)
subroutine  pcchase_init (nn, nev, nex, m, n, h, ldh, v, ritzv, dim0, dim1, grid_major, fcomm, init)
subroutine  pzchase_init (nn, nev, nex, m, n, h, ldh, v, ritzv, dim0, dim1, grid_major, fcomm, init)

The interfaces of C and Fortran share the same parameters as follows:

| Param.        | In/Out  | Meaning |
| nn            | In      | global matrix size of the matrix to be diagonalized |
| nev           | In      | number of desired eigenpairs |
| nex           | In      | extra searching space size |
| m             | In      | max row number of the local matrix h on each MPI process |
| n             | In      | max column number of the local matrix h on each MPI process |
| h             | In      | pointer to the matrix to be diagonalized; h is the block-block distributed part of the global matrix, of size m x n with leading dimension ldh |
| ldh           | In      | leading dimension of h on each MPI process |
| v             | In, Out | (m x (nev+nex)) matrix; on input the initial guess eigenvectors, on output the first nev columns are overwritten by the desired eigenvectors. v is only partially distributed within the column communicator and is redundant among different column communicators |
| ritzv         | Out     | an array of size nev which contains the desired eigenvalues, of type Base<x> |
| dim0          | In      | row number of the 2D MPI grid |
| dim1          | In      | column number of the 2D MPI grid |
| grid_major    | In      | major of the 2D MPI grid; row major: grid_major='R', column major: grid_major='C' |
| comm or fcomm | In      | the working MPI communicator; comm is the MPI-C communicator, fcomm the MPI-Fortran communicator |
| init          | Out     | a flag indicating whether ChASE has been initialized; 1 if initialized |

3.3.2.3. p<x>chase_init_blockcyclic

p<x>chase_init_blockcyclic initializes the context for the distributed-memory version of ChASE with Block-Cyclic Distribution. ChASE is initialized with the buffers h, v and ritzv, which should be allocated externally by the user. These buffers are reused when a sequence of eigenproblems is solved. A short usage sketch is given after the parameter table below.

The APIs for the C interfaces are as follows:

void pschase_init_blockcyclic_(int *nn, int *nev, int *nex, int *mbsize, int *nbsize,
                               float *h, int *ldh, float *v, float *ritzv,
                               int *dim0, int *dim1, char *grid_major, int *irsrc,
                               int *icsrc, MPI_Comm *comm, int *init)
void pdchase_init_blockcyclic_(int *nn, int *nev, int *nex, int *mbsize, int *nbsize,
                               double *h, int *ldh, double *v, double *ritzv,
                               int *dim0, int *dim1, char *grid_major, int *irsrc,
                               int *icsrc, MPI_Comm *comm, int *init)
void pcchase_init_blockcyclic_(int *nn, int *nev, int *nex, int *mbsize, int *nbsize,
                               float _Complex *h, int *ldh, float _Complex *v, float *ritzv,
                               int *dim0, int *dim1, char *grid_major, int *irsrc,
                               int *icsrc, MPI_Comm *comm, int *init)
void pzchase_init_blockcyclic_(int *nn, int *nev, int *nex, int *mbsize, int *nbsize,
                               double _Complex *h, int *ldh, double _Complex *v, double *ritzv,
                               int *dim0, int *dim1, char *grid_major, int *irsrc,
                               int *icsrc, MPI_Comm *comm, int *init)

The APIs for the Fortran interfaces are as follows:

subroutine  pschase_init_blockcyclic (nn, nev, nex, mbsize, nbsize, h, ldh, v, ritzv, dim0, dim1, grid_major, irsrc, icsrc, fcomm, init)
subroutine  pdchase_init_blockcyclic (nn, nev, nex, mbsize, nbsize, h, ldh, v, ritzv, dim0, dim1, grid_major, irsrc, icsrc, fcomm, init)
subroutine  pcchase_init_blockcyclic (nn, nev, nex, mbsize, nbsize, h, ldh, v, ritzv, dim0, dim1, grid_major, irsrc, icsrc, fcomm, init)
subroutine  pzchase_init_blockcyclic (nn, nev, nex, mbsize, nbsize, h, ldh, v, ritzv, dim0, dim1, grid_major, irsrc, icsrc, fcomm, init)

The interfaces of C and Fortran share the same parameters as follows:

| Param.        | In/Out  | Meaning |
| nn            | In      | global matrix size of the matrix to be diagonalized |
| nev           | In      | number of desired eigenpairs |
| nex           | In      | extra searching space size |
| mbsize        | In      | block size of the block-cyclic distribution for the rows of the global matrix |
| nbsize        | In      | block size of the block-cyclic distribution for the columns of the global matrix |
| h             | In      | pointer to the local part of the matrix to be diagonalized, distributed in a block-cyclic fashion, with leading dimension ldh |
| ldh           | In      | leading dimension of h on each MPI process |
| v             | In, Out | (m x (nev+nex)) matrix; on input the initial guess eigenvectors, on output the first nev columns are overwritten by the desired eigenvectors. v is only partially distributed within the column communicator and is redundant among different column communicators |
| ritzv         | Out     | an array of size nev which contains the desired eigenvalues, of type Base<x> |
| dim0          | In      | row number of the 2D MPI grid |
| dim1          | In      | column number of the 2D MPI grid |
| irsrc         | In      | process row over which the first row of the global matrix h is distributed |
| icsrc         | In      | process column over which the first column of the global matrix h is distributed |
| grid_major    | In      | major of the 2D MPI grid; row major: grid_major='R', column major: grid_major='C' |
| comm or fcomm | In      | the working MPI communicator; comm is the MPI-C communicator, fcomm the MPI-Fortran communicator |
| init          | Out     | a flag indicating whether ChASE has been initialized; 1 if initialized |
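As an illustration, a short C sketch of initializing the double-complex interface with Block-Cyclic Distribution (a snippet only; mbsize, nbsize, irsrc, icsrc, ldh, dims, comm and the buffers H, V, Lambda are assumed to be set up analogously to the Block Distribution example in Section 3.3.5.1) could look as follows:

//initialize ChASE with a block-cyclic layout of the global matrix
pzchase_init_blockcyclic_(&N, &nev, &nex, &mbsize, &nbsize, H, &ldh, V, Lambda,
                          &dims[0], &dims[1], (char*)"C", &irsrc, &icsrc,
                          &comm, &init);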

3.3.3. Solving Functions

3.3.3.1. <x>chase

<x>chase solves an eigenvalue problem with the given configuration of parameters on shared-memory architectures. When CUDA is enabled, it automatically uses one GPU card. A short usage sketch is given after the parameter table below.

The APIs for the C interfaces are as follows:

void schase_(int *deg, double *tol, char *mode, char *opt)
void dchase_(int *deg, double *tol, char *mode, char *opt)
void cchase_(int *deg, double *tol, char *mode, char *opt)
void zchase_(int *deg, double *tol, char *mode, char *opt)

The APIs for the Fortran interfaces are as follows:

subroutine  schase (deg, tol, mode, opt)
subroutine  dchase (deg, tol, mode, opt)
subroutine  cchase (deg, tol, mode, opt)
subroutine  zchase (deg, tol, mode, opt)

| Param. | In/Out | Meaning |
| deg    | In     | initial degree of the Chebyshev polynomial filter |
| tol    | In     | desired absolute tolerance of the computed eigenpairs |
| mode   | In     | for sequences of eigenproblems, whether to reuse the eigenpairs obtained from the previous problem; mode='A' reuses them, any other value does not |
| opt    | In     | whether to use the internal optimization of the Chebyshev polynomial degree; opt='S' enables it, any other value does not |
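As an illustration, a short C sketch of the solving call, matching the shared-memory initialization sketch above and using the parameter values also appearing in the example of Section 3.3.5.1, could look as follows:

int deg = 20;        //initial Chebyshev filter degree
double tol = 1e-10;  //absolute tolerance of the computed eigenpairs
char mode = 'R';     //start from random vectors (no previous eigenpairs to reuse)
char opt = 'S';      //let ChASE optimize the polynomial degree internally
zchase_(&deg, &tol, &mode, &opt);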

3.3.3.2. p<x>chase

p<x>chase solves an eigenvalue problem with the given configuration of parameters on distributed-memory architectures. When CUDA is enabled, it automatically uses multiple GPUs, with one GPU per MPI rank.

The APIs for the C interfaces are as follows:

void pschase_(int *deg, double *tol, char *mode, char *opt)
void pdchase_(int *deg, double *tol, char *mode, char *opt)
void pcchase_(int *deg, double *tol, char *mode, char *opt)
void pzchase_(int *deg, double *tol, char *mode, char *opt)

The APIs for the Fortran interfaces are as follows:

subroutine  pschase (deg, tol, mode, opt)
subroutine  pdchase (deg, tol, mode, opt)
subroutine  pcchase (deg, tol, mode, opt)
subroutine  pzchase (deg, tol, mode, opt)

The interfaces of C and Fortran share the same parameters as follows:

| Param. | In/Out | Meaning |
| deg    | In     | initial degree of the Chebyshev polynomial filter |
| tol    | In     | desired absolute tolerance of the computed eigenpairs |
| mode   | In     | for sequences of eigenproblems, whether to reuse the eigenpairs obtained from the previous problem; mode='A' reuses them, any other value does not |
| opt    | In     | whether to use the internal optimization of the Chebyshev polynomial degree; opt='S' enables it, any other value does not |

3.3.4. Finalization Functions

3.3.4.1. <x>chase_finalize

<x>chase_finalize cleans up the instances of shared-memory ChASE.

The APIs for the C interfaces are as follows:

void schase_finalize_(int *flag)
void dchase_finalize_(int *flag)
void cchase_finalize_(int *flag)
void zchase_finalize_(int *flag)

The APIs for the Fortran interfaces are as follows:

subroutine  schase_finalize (flag)
subroutine  dchase_finalize (flag)
subroutine  cchase_finalize (flag)
subroutine  zchase_finalize (flag)

The interfaces of C and Fortran share the same parameters as follows:

| Param. | In/Out | Meaning |
| flag   | Out    | a flag indicating whether ChASE has been cleaned up; flag=0 after cleanup |

3.3.4.2. p<x>chase_finalize

p<x>chase_finalize cleans up the instances of distributed-memory ChASE.

Note

The Block Distribution and Block-Cyclic Distribution versions of ChASE share a uniform interface for the finalization.

The APIs for the C interfaces are as follows:

void pschase_finalize_(int *flag)
void pdchase_finalize_(int *flag)
void pcchase_finalize_(int *flag)
void pzchase_finalize_(int *flag)

The APIs for the Fortran interfaces are as follows:

subroutine  pschase_finalize (flag)
subroutine  pdchase_finalize (flag)
subroutine  pcchase_finalize (flag)
subroutine  pzchase_finalize (flag)

The interfaces of C and Fortran share the same parameters as follows:

| Param. | In/Out | Meaning |
| flag   | Out    | a flag indicating whether ChASE has been cleaned up; flag=0 after cleanup |

3.3.5. Examples

Snippets of examples for both the C and Fortran interfaces are shown below. Complete examples for both interfaces, for shared-memory and distributed-memory architectures, are provided in ./examples/4_interface.

3.3.5.1. Example of C interface

...
...

void pzchase_init_(int* N, int* nev, int* nex, int* m, int* n,
                   double _Complex* H, int* ldh, double _Complex* V,
                   double* ritzv, int* dim0, int* dim1, char* grid_major,
                   MPI_Comm* comm, int* init);
void pzchase_finalize_(int* flag);
void pzchase_(int* deg, double* tol, char* mode, char* opt);

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0, init;

    int N = 1001; //global size of matrix
    int nev = 100; //number of eigenpairs to compute
    int nex = 40; //size of external searching space
    int m = 501; //number of rows of local matrix on each MPI rank
    int n = 501; //number of columns of local matrix on each MPI rank
    MPI_Comm comm = MPI_COMM_WORLD; //working MPI communicator
    int dims[2];
    dims[0] = 2; //row number of 2D MPI grid
    dims[1] = 2; //column number of 2D MPI grid
    //allocate buffer to store computed eigenvectors
    double _Complex* V = (double _Complex*)malloc(sizeof(double _Complex) * m * (nev + nex));
    //allocate buffer to store computed eigenvalues
    double* Lambda = (double*)malloc(sizeof(double) * (nev + nex));
    //allocate buffer to store local block of Hermitian matrix on each MPI rank
    double _Complex* H = (double _Complex*)malloc(sizeof(double _Complex) * m * n);

    // config
    int deg = 20;
    double tol = 1e-10;
    char mode = 'R';
    char opt = 'S';

    //Initialization of ChASE
    pzchase_init_(&N, &nev, &nex, &m, &n, H, &m, V, Lambda, &dims[0], &dims[1],
                  (char*)"C", &comm, &init);

    /*
        Generating or loading matrix into H
    */

    //solve 1st eigenproblem with defined configuration of parameters
    pzchase_(&deg, &tol, &mode, &opt);

    /*
        form a new eigenproblem by updating the buffer H
    */

    //Set the mode to 'A', which can recycle previous eigenvectors
    mode = 'A';

    //solve 2nd eigenproblem with updated parameters
    pzchase_(&deg, &tol, &mode, &opt);


    //finalize and clean up
    pzchase_finalize_(&init);

    MPI_Finalize();
}

3.3.5.2. Example of Fortran interface

PROGRAM main
use mpi
use chase_diag !use chase fortran interface module

integer ierr, init, comm
integer m, n
integer dims(2)
integer nn, nev, nex
real(8) :: tol
integer :: deg
character        :: mode, opt, major
complex(8),  allocatable :: h(:,:), v(:,:)
real(8), allocatable :: lambda(:)

call mpi_init(ierr)

nn = 1001 ! global size of matrix
nev = 100 ! number of eigenpairs to compute
nex = 40 ! size of external searching space

comm = MPI_COMM_WORLD ! working MPI communicator
! config
deg = 20
tol = 1e-10
mode = 'R'
opt = 'S'
major = 'C'

dims(1) = 2 ! row number of 2D MPI grid
dims(2) = 2 ! column number of 2D MPI grid

m = 501 ! number of rows of local matrix on each MPI rank
n = 501 ! number of columns of local matrix on each MPI rank

allocate(h(m, n)) ! allocate buffer to store local block of Hermitian matrix on each MPI rank
allocate(v(m, nev + nex)) ! allocate buffer to store computed eigenvectors
allocate(lambda(nev + nex)) ! allocate buffer to store computed eigenvalues

! Initialization of ChASE
call pzchase_init(nn, nev, nex, m, n, h, m, v, lambda, dims(1), dims(2), major, comm, init)

!
!      Generating or loading matrix into H
!

! solve 1st eigenproblem with defined configuration of parameters
call pzchase(deg, tol, mode, opt)

!
!      form a new eigenproblem by updating the buffer H
!
! Set the mode to 'A', which can recycle previous eigenvectors
mode = 'A'

! solve 2nd eigenproblem with updated parameters
call pzchase(deg, tol, mode, opt)

! finalize and clean up
call pzchase_finalize(init)

call mpi_finalize(ierr)


END PROGRAM