3.1.3.3. multi-GPUs in node
-
template<class T> class chase::mpi::ChaseMpiDLAMultiGPU : public chase::mpi::ChaseMpiDLAInterface<T>
A derived class of ChaseMpiDLAInterface which implements the inter-node computation for a multi-GPU, MPI-based implementation of ChASE.
Public Functions
-
void preApplication(T *V, std::size_t locked, std::size_t block) override
This function sets up the initial operation for apply() in the filter. It also copies C_ to the device buffer d_C.
-
void apply(T alpha, T beta, std::size_t offset, std::size_t block, std::size_t locked) override
This function performs the local GEMM computation for ChaseMpiDLA::apply(). It is implemented with cuBLAS's cublasXgemm.
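As a point of reference for what the local GEMM computes, here is a minimal CPU sketch of the update C = alpha * A * B + beta * C on column-major data. The name `gemm_reference`, the use of `std::vector`, and the absence of any transposition are assumptions made for illustration; they are not taken from ChASE, where the work is delegated to cuBLAS on the device.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not ChASE code) for the GEMM update
//   C = alpha * A * B + beta * C
// All matrices are column-major: A is m x k, B is k x n, C is m x n.
template <class T>
void gemm_reference(std::size_t m, std::size_t n, std::size_t k,
                    T alpha, const std::vector<T>& A,
                    const std::vector<T>& B, T beta, std::vector<T>& C)
{
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            // Dot product of row i of A with column j of B.
            T acc = T(0);
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i + p * m] * B[p + j * k];
            }
            C[i + j * m] = alpha * acc + beta * C[i + j * m];
        }
    }
}
```

The `X` in `cublasXgemm` stands for the type-specific cuBLAS routines (`cublasSgemm`, `cublasDgemm`, `cublasCgemm`, `cublasZgemm`), which evaluate the same kind of update on device memory.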
-
bool postApplication(T *V, std::size_t block, std::size_t locked) override
This function copies a number of columns of d_C_ to C_. This memory copy is required for ChaseMpi::Lanczos, in which axpy and similar operations are performed on CPUs.
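The column copy can be pictured with a small host-only sketch. It assumes a column-major buffer with `m` rows, so that a run of consecutive columns is one contiguous range. The function name `copy_columns` and its parameters are hypothetical, and `std::copy` stands in for the device-to-host transfer that a GPU implementation would perform.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-in (not ChASE code): copy `ncols` consecutive
// columns, starting at column `first_col`, from `src` to the same position
// in `dst`. Both buffers are column-major with `m` rows.
template <class T>
void copy_columns(const std::vector<T>& src, std::vector<T>& dst,
                  std::size_t m, std::size_t first_col, std::size_t ncols)
{
    // In column-major storage, consecutive columns form one contiguous block.
    auto begin = src.begin() + static_cast<std::ptrdiff_t>(first_col * m);
    auto end = begin + static_cast<std::ptrdiff_t>(ncols * m);
    std::copy(begin, end,
              dst.begin() + static_cast<std::ptrdiff_t>(first_col * m));
}
```

Because the selected columns are contiguous, a single transfer is enough and the remaining columns of the destination are left untouched.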
-
void shiftMatrix(T c, bool isunshift = false) override
This function shifts the diagonal of the global matrix. The global matrix is already distributed across the GPUs, so the shift is applied to the local block of the global matrix on each GPU.
The operation is naturally parallel across all MPI processes and also on each GPU.
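The idea of shifting only the locally owned part of the diagonal can be sketched on the CPU as follows. The sketch assumes a column-major local block of size m x n whose first row and first column sit at global offsets `row_off` and `col_off`, and it assumes that the shift adds `c` to each diagonal entry. The function name, the offsets, and the sign convention are all illustrative assumptions, not ChASE's actual interface.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch (not ChASE code): add `c` to every entry of the local
// block H that lies on the diagonal of the global matrix.
// H is column-major, m x n, and covers global rows [row_off, row_off + m)
// and global columns [col_off, col_off + n).
template <class T>
void shift_local_diagonal(std::vector<T>& H, std::size_t m, std::size_t n,
                          std::size_t row_off, std::size_t col_off, T c)
{
    for (std::size_t j = 0; j < n; ++j) {
        const std::size_t gcol = col_off + j;
        // The global diagonal entry of this column belongs to the block
        // only if its global row index falls inside the local row range.
        if (gcol < row_off || gcol >= row_off + m) {
            continue;
        }
        const std::size_t i = gcol - row_off;
        H[i + j * m] += c;
    }
}
```

Each block touches only its own entries, so no communication is needed, which is why the operation parallelizes across processes and devices without extra coordination.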
-
void applyVec(T *B, T *C) override
All operations required for this function have already been performed in ChaseMpiDLA::applyVec().
In this class the function does nothing.
-
void