3.1.3.3. multi-GPUs in node

template<class T>
class chase::mpi::ChaseMpiDLAMultiGPU : public chase::mpi::ChaseMpiDLAInterface<T>

A derived class of ChaseMpiDLAInterface which implements the intra-node computation for the multi-GPU, MPI-based implementation of ChASE.

Public Functions

void preApplication(T *V, std::size_t locked, std::size_t block) override

  • This function initially sets up the operation for apply() in the filter.

  • It also copies C_ to the device buffer d_C.

void apply(T alpha, T beta, std::size_t offset, std::size_t block, std::size_t locked) override

  • This function performs the local GEMM computation for ChaseMpiDLA::apply().

  • It is implemented with cuBLAS’s cublasXgemm, where X stands for the type prefix (S, D, C, or Z).
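As a rough illustration of what a GEMM call computes, the following CPU sketch evaluates C <- alpha * A * B + beta * C on column-major data. It is not the ChASE or cuBLAS implementation: gemm_reference and its parameter names are hypothetical, the transpose options of a real GEMM are left out, and the mapping of apply()'s offset, block, and locked arguments onto the operands is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the basic GEMM operation:
//   C <- alpha * A * B + beta * C
// All matrices are stored column-major without padding.
// A is m x k, B is k x n, C is m x n.
// gemm_reference is a hypothetical helper, not part of ChASE or cuBLAS.
template <class T>
void gemm_reference(std::size_t m, std::size_t n, std::size_t k, T alpha,
                    const std::vector<T>& A, const std::vector<T>& B, T beta,
                    std::vector<T>& C)
{
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            T acc = T(0);
            // Inner product of row i of A with column j of B.
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i + p * m] * B[p + j * k];
            }
            C[i + j * m] = alpha * acc + beta * C[i + j * m];
        }
    }
}
```

On a GPU the triple loop is replaced by a single library call on device buffers; the sketch is only meant to pin down the arithmetic.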

bool postApplication(T *V, std::size_t block, std::size_t locked) override

  • This function copies a number of columns of d_C_ to C_.

  • This memory copy is required for ChaseMpi::Lanczos, in which axpy, etc., are performed on CPUs.

void shiftMatrix(T c, bool isunshift = false) override

This function shifts the diagonal of the global matrix.

  • The global matrix is already distributed across the GPUs, so the shift is applied to the local block of the global matrix on each GPU.

  • The operation is naturally parallel across all MPI processes and GPUs.
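The idea of shifting only the locally held part of the diagonal can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the ChASE kernel: shift_local_block, row_off, and col_off are hypothetical names, the block is assumed to be column-major, and its position in the global matrix is assumed to be known through its row and column offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Add the shift c to every diagonal entry of the global matrix that falls
// inside a locally stored block. The block is column-major with m rows and
// n columns, and its top-left entry sits at (row_off, col_off) in the
// global matrix. All names here are hypothetical.
template <class T>
void shift_local_block(std::vector<T>& block, std::size_t m, std::size_t n,
                       std::size_t row_off, std::size_t col_off, T c)
{
    assert(block.size() == m * n);
    for (std::size_t j = 0; j < n; ++j) {
        const std::size_t gcol = col_off + j;  // global column index
        // The diagonal entry of this column lies in global row gcol;
        // skip the column if that row is not stored in this block.
        if (gcol < row_off || gcol >= row_off + m) {
            continue;
        }
        const std::size_t i = gcol - row_off;  // local row of the diagonal entry
        block[i + j * m] += c;
    }
}
```

Each block touches only its own entries, so no communication is needed. In this sketch, calling the function again with -c undoes the shift.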

void applyVec(T *B, T *C) override

  • All operations required by this function have already been done in ChaseMpiDLA::applyVec().

  • This function does nothing in this class.

void axpy(std::size_t N, T *alpha, T *x, std::size_t incx, T *y, std::size_t incy) override

It is an interface to BLAS ?axpy.

void scal(std::size_t N, T *a, T *x, std::size_t incx) override

It is an interface to BLAS ?scal.

Base<T> nrm2(std::size_t n, T *x, std::size_t incx) override

It is an interface to BLAS ?nrm2.

T dot(std::size_t n, T *x, std::size_t incx, T *y, std::size_t incy) override

It is an interface to BLAS ?dot.
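For readers unfamiliar with the level-1 BLAS routines that axpy, scal, nrm2, and dot wrap, the following sketch shows their arithmetic for real types with positive strides. The *_ref functions are hypothetical stand-ins, not the ChASE wrappers: they take scalars by value rather than by pointer, they do not cover complex types (where the dot product may conjugate one operand), and nrm2_ref omits the scaling that library implementations use to avoid overflow.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// y <- alpha * x + y, the operation of BLAS ?axpy.
template <class T>
void axpy_ref(std::size_t n, T alpha, const T* x, std::size_t incx,
              T* y, std::size_t incy)
{
    assert(incx > 0 && incy > 0);
    for (std::size_t i = 0; i < n; ++i) {
        y[i * incy] += alpha * x[i * incx];
    }
}

// x <- a * x, the operation of BLAS ?scal.
template <class T>
void scal_ref(std::size_t n, T a, T* x, std::size_t incx)
{
    assert(incx > 0);
    for (std::size_t i = 0; i < n; ++i) {
        x[i * incx] *= a;
    }
}

// Euclidean norm of x, the result of BLAS ?nrm2 (naive, unscaled version).
template <class T>
T nrm2_ref(std::size_t n, const T* x, std::size_t incx)
{
    assert(incx > 0);
    T sum = T(0);
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i * incx] * x[i * incx];
    }
    return std::sqrt(sum);
}

// Sum of x[i] * y[i], the result of BLAS ?dot for real types.
template <class T>
T dot_ref(std::size_t n, const T* x, std::size_t incx,
          const T* y, std::size_t incy)
{
    assert(incx > 0 && incy > 0);
    T sum = T(0);
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i * incx] * y[i * incy];
    }
    return sum;
}
```

The stride arguments let the same routine walk a contiguous vector (stride 1) or, for example, a row of a column-major matrix (stride equal to the leading dimension).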