3.1.2.3. Single GPU

template<class T> class chase::mpi::ChaseMpiDLACudaSeq : public chase::mpi::ChaseMpiDLAInterface<T>

A derived class of ChaseMpiDLAInterface that implements ChASE for shared-memory architectures, with selected computational tasks offloaded to a single GPU.

Public Functions

void preApplication(T *V, std::size_t locked, std::size_t block) override

This function performs the pre-application steps for the distributed HEMM in ChASE. These steps may vary between implementations targeting different architectures; they may include backing up buffers, copying data from CPU to GPU, and so on.

Parameters

V: a pointer to a matrix
locked: an integer indicating the number of locked (converged) eigenvectors
block: an integer indicating the number of non-locked (non-converged) eigenvectors

void apply(T alpha, T beta, std::size_t offset, std::size_t block, std::size_t locked) override

Performs \(V_2 \leftarrow \alpha V_1 H + \beta V_2\) and swaps \((V_1, V_2)\).

The first offset vectors of V1 and V2 are not part of the HEMM. The number of vectors of V1 and V2 involved in the HEMM is block. In MATLAB notation, this operation performs:

V2[:,start:end]<-alpha*V1[:,start:end]*H+beta*V2[:,start:end],

in which start=locked+offset and end=locked+offset+block.
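The column range and update above can be sketched on the CPU as follows. This is a minimal reference under stated assumptions, not the GPU code of this class: `ref_apply` is a hypothetical name, the matrices are real-valued and column-major, the range is treated as half-open (`start` up to but not including `start + block`), and the product written `V1*H` is read as H applied to each selected column, which is the dimensionally consistent reading for an n-by-n H and n-by-m V1.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU reference (assumption), not the ChASE GPU implementation.
// H is n x n; V1 and V2 are n x m; all matrices are column-major.
// Updates columns [start, start + block) of V2 with start = locked + offset,
// then swaps V1 and V2 so that V1 holds the updated vectors.
void ref_apply(const std::vector<double>& H, std::vector<double>& V1,
               std::vector<double>& V2, std::size_t n, std::size_t m,
               double alpha, double beta, std::size_t offset,
               std::size_t block, std::size_t locked) {
    const std::size_t start = locked + offset;
    assert(start + block <= m);  // the selected columns must exist
    for (std::size_t j = start; j < start + block; ++j) {
        for (std::size_t i = 0; i < n; ++i) {
            double acc = 0.0;  // (H * V1)(i, j)
            for (std::size_t k = 0; k < n; ++k) {
                acc += H[i + k * n] * V1[k + j * n];
            }
            V2[i + j * n] = alpha * acc + beta * V2[i + j * n];
        }
    }
    std::swap(V1, V2);
}
```

Columns outside the selected range are left untouched in both buffers before the swap, which matches the statement that the first offset vectors are not part of the HEMM.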

Parameters

alpha: the scalar multiplying V1*H in the HEMM operation.
beta: the scalar multiplying V2 in the HEMM operation.
offset: the number of vectors to skip before the HEMM starts.
block: the number of non-converged eigenvectors, i.e. the number of vectors in V1 and V2 on which the HEMM is performed.
locked: the number of converged eigenvectors.

bool postApplication(T *V, std::size_t block, std::size_t locked) override

Copies the rectangular matrix from buffer v1 to v2. In the distributed-memory implementation of ChASE, this operation copies a matrix that is distributed within each column communicator and redundant among different column communicators to a matrix that is redundantly distributed across all MPI processes. Then, in the next iteration of ChASE-MPI, this operation takes place in the row communicator…

Parameters

V: the target buffer
block: number of columns to copy from v1 to v2
locked: number of converged eigenvectors.

void shiftMatrix(T c, bool isunshift = false) override

This function shifts the diagonal of the global matrix by a constant value c.

Parameters

c: shift value
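As an illustration, a diagonal shift can be sketched as below. This assumes that "shift" means adding c to every diagonal entry of a column-major n-by-n matrix; `ref_shift_diagonal` is a hypothetical helper, and the `isunshift` flag is deliberately left out because its behavior is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU sketch (assumption): H <- H + c * I for a column-major
// n x n matrix stored in a flat vector.
template <class T>
void ref_shift_diagonal(std::vector<T>& H, std::size_t n, T c) {
    assert(H.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        H[i + i * n] += c;  // entry (i, i)
    }
}
```

Under this reading, applying the helper a second time with -c restores the original matrix up to rounding.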

void applyVec(T *B, T *C) override

Performs a general matrix-vector multiplication (GEMV) with alpha=1.0 and beta=0.0.

The operation is C=H*B.

Parameters

B: the vector to be multiplied by H.
C: the vector to store the product of H and B.
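The operation C=H*B can be written out as a plain loop. The sketch below is a CPU reference only, assuming a column-major n-by-n matrix H held in a flat vector; `ref_apply_vec` and the explicit `n` argument are hypothetical and not part of the interface documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU sketch (assumption): C = 1.0 * H * B + 0.0 * C,
// i.e. C is overwritten with the matrix-vector product.
template <class T>
void ref_apply_vec(const std::vector<T>& H, const std::vector<T>& B,
                   std::vector<T>& C, std::size_t n) {
    assert(H.size() == n * n && B.size() == n);
    C.assign(n, T{});  // beta = 0: previous contents of C are discarded
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t i = 0; i < n; ++i) {
            C[i] += H[i + k * n] * B[k];  // column-major access H(i, k)
        }
    }
}
```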

void axpy(std::size_t N, T *alpha, T *x, std::size_t incx, T *y, std::size_t incy) override

A BLAS-like function which performs a constant times a vector plus a vector.

Parameters

[in] N: number of elements in input vector(s).
[in] alpha: the scalar multiplying x in the AXPY operation.
[in] x: an array of type T, dimension ( 1 + ( N - 1 )*abs( incx ) ).
[in] incx: storage spacing between elements of x.
[in/out] y: an array of type T, dimension ( 1 + ( N - 1 )*abs( incy ) ).
[in] incy: storage spacing between elements of y.
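For readers unfamiliar with strided BLAS vectors, the AXPY update y <- alpha*x + y can be sketched as below. `ref_axpy` is a hypothetical CPU reference that mirrors the parameter list above, including the scalar passed by pointer; it is not the implementation used by this class.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CPU sketch (assumption): y[i * incy] += alpha * x[i * incx]
// for i = 0 .. N-1, with positive strides.
template <class T>
void ref_axpy(std::size_t N, const T* alpha, const T* x, std::size_t incx,
              T* y, std::size_t incy) {
    assert(incx > 0 && incy > 0);
    for (std::size_t i = 0; i < N; ++i) {
        y[i * incy] += (*alpha) * x[i * incx];
    }
}
```

Elements of y that fall between strides are not touched, which is why the array dimension is given as 1 + (N - 1)*abs(incy) rather than N.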

void scal(std::size_t N, T *a, T *x, std::size_t incx) override

A BLAS-like function which scales a vector by a constant.

Parameters

[in] N: number of elements in input vector(s).
[in] a: the scalar of type T multiplying the vector x.
[in/out] x: an array of type T, dimension ( 1 + ( N - 1 )*abs( incx ) ).
[in] incx: storage spacing between elements of x.
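The scaling x <- a*x with a stride can be sketched in the same way. `ref_scal` is a hypothetical CPU reference following the parameter list above, shown only to make the role of `incx` concrete.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CPU sketch (assumption): x[i * incx] *= a for i = 0 .. N-1.
template <class T>
void ref_scal(std::size_t N, const T* a, T* x, std::size_t incx) {
    assert(incx > 0);
    for (std::size_t i = 0; i < N; ++i) {
        x[i * incx] *= *a;
    }
}
```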

Base<T> nrm2(std::size_t n, T *x, std::size_t incx) override

A BLAS-like function which returns the Euclidean norm of a vector.

Return

the Euclidean norm of vector x.

Parameters

[in] n: number of elements in input vector(s).
[in] x: an array of type T, dimension ( 1 + ( n - 1 )*abs( incx ) ).
[in] incx: storage spacing between elements of x.
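The return type Base<T> suggests that the norm is real even when T is complex. A minimal sketch of that idea follows; `ref_nrm2` is a hypothetical helper, the real return type is derived from `std::abs` rather than from ChASE's `Base<T>`, and the sum is accumulated naively without the scaling that BLAS routines typically use to avoid overflow.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>

// Hypothetical CPU sketch (assumption): sqrt(sum_i |x[i * incx]|^2).
// Works for T = float/double and T = std::complex<float/double>.
template <class T>
auto ref_nrm2(std::size_t n, const T* x, std::size_t incx)
    -> decltype(std::abs(T{})) {
    assert(incx > 0);
    decltype(std::abs(T{})) sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const auto a = std::abs(x[i * incx]);  // modulus, real-valued
        sum += a * a;
    }
    return std::sqrt(sum);
}
```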

T dot(std::size_t n, T *x, std::size_t incx, T *y, std::size_t incy) override

A BLAS-like function which forms the dot product of two vectors.

Return

the dot product of vectors x and y.

Parameters

[in] n: number of elements in input vector(s).
[in] x: an array of type T, dimension ( 1 + ( n - 1 )*abs( incx ) ).
[in] incx: storage spacing between elements of x.
[in] y: an array of type T, dimension ( 1 + ( n - 1 )*abs( incy ) ).
[in] incy: storage spacing between elements of y.
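A strided dot product can be sketched as below. `ref_dot` is a hypothetical CPU reference; it multiplies the elements as they are, so for complex T it does not conjugate x. Whether the documented function conjugates its first argument is not stated here, so that choice is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CPU sketch (assumption): sum_i x[i * incx] * y[i * incy],
// with no conjugation applied to either operand.
template <class T>
T ref_dot(std::size_t n, const T* x, std::size_t incx, const T* y,
          std::size_t incy) {
    assert(incx > 0 && incy > 0);
    T sum = T{};
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i * incx] * y[i * incy];
    }
    return sum;
}
```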