1.5.1. Linear Algebra Kernels

1.5.1.1. chase::linalg::internal::cpu

namespace cpu

Functions

template<typename T> int cholQR1(std::size_t m, std::size_t n, T *V, int ldv, T *A = nullptr)

Performs Cholesky QR factorization (degree 1).

This function performs Cholesky QR factorization on the matrix V. It computes ( A = V^T V ) and then solves ( A X = V ).

Parameters:

m – The number of rows of matrix V.
n – The number of columns of matrix V.
V – The matrix on which the factorization is performed.
ldv – The leading dimension of V.
A – The output matrix that stores the result of the Cholesky factorization (optional, will be allocated if null).

Returns:

0 if successful, non-zero value otherwise.

template<typename T> int cholQR2(std::size_t m, std::size_t n, T *V, int ldv, T *A = nullptr)

Performs Cholesky QR factorization (degree 2).

This function performs Cholesky QR factorization on the matrix V. It applies two iterations of Cholesky QR factorization.

Parameters:

m – The number of rows of matrix V.
n – The number of columns of matrix V.
V – The matrix on which the factorization is performed.
ldv – The leading dimension of V.
A – The output matrix that stores the result of the Cholesky factorization (optional, will be allocated if null).

Returns:

0 if successful, non-zero value otherwise.

template<typename T> int shiftedcholQR2(std::size_t m, std::size_t n, T *V, int ldv, T *A = nullptr)

Performs Cholesky QR factorization with shifting (degree 2).

This function performs Cholesky QR factorization on the matrix V, with a shift applied to the matrix diagonal. It applies two iterations of Cholesky QR factorization with a diagonal shift.

Parameters:

m – The number of rows of matrix V.
n – The number of columns of matrix V.
V – The matrix on which the factorization is performed.
ldv – The leading dimension of V.
A – The output matrix that stores the result of the Cholesky factorization (optional, will be allocated if null).

Returns:

0 if successful, non-zero value otherwise.

template<typename T> void houseHoulderQR(std::size_t m, std::size_t n, T *V, std::size_t ldv)

Performs Householder QR factorization.

This function computes the QR factorization of matrix V using the Householder transformation.

Parameters:

m – The number of rows of matrix V.
n – The number of columns of matrix V.
V – The matrix on which the factorization is performed.
ldv – The leading dimension of V.

template<typename T> chase::Base<T> computeConditionNumber(std::size_t m, std::size_t n, T *V, std::size_t ldv)

template<typename T> void lanczos(std::size_t M, std::size_t numvec, chase::matrix::Matrix<T> *H, T *V, std::size_t ldv, Base<T> *upperb, Base<T> *ritzv, Base<T> *Tau, Base<T> *ritzV)

Lanczos algorithm for eigenvalue computation.

This function performs the Lanczos algorithm, which is used to estimate the upper bound of spectra of symmetric/Hermitian matrix. The algorithm is iteratively applied to the matrix H, where the input matrix H is a square matrix of size N x N. The Lanczos algorithm builds an orthonormal basis of the Krylov subspace, and the resulting tridiagonal matrix is diagonalized using the t_stemr function.

Template Parameters:

T – The data type for the matrix elements (e.g., float, double).

Parameters:

M – The number of Lanczos iterations.
numvec – The number of runs of Lanczos.
H – The input matrix for the Lanczos algorithm (of size N x N).
V – The input matrix used for storing vectors (of size N x numvec).
ldv – The leading dimension of V (number of rows).
upperb – A pointer to the upper bound of the eigenvalue spectrum.
ritzv – A pointer to store the Ritz eigenvalues.
Tau – A pointer to store the computed Tau values.
ritzV – A pointer to store the Ritz eigenvectors.

template<typename T> void lanczos(std::size_t M, chase::matrix::Matrix<T> *H, T *V, std::size_t ldv, Base<T> *upperb)

Lanczos algorithm for eigenvalue computation (simplified version).

This version of the Lanczos algorithm is a simplified version that computes only the upper bound of the eigenvalue spectrum and does not compute eigenvectors. It operates similarly to the full Lanczos algorithm but omits the eigenvector computation step.

Template Parameters:

T – The data type for the matrix elements (e.g., float, double).

Parameters:

M – The number of Lanczos iterations.
H – The input matrix for the Lanczos algorithm (of size N x N).
V – The input matrix used for storing vectors (of size N x 1).
ldv – The leading dimension of V (number of rows).
upperb – A pointer to the upper bound of the eigenvalue spectrum.

template<typename T> void lanczos(std::size_t M, std::size_t numvec, chase::matrix::QuasiHermitianMatrix<T> *H, T *V, std::size_t ldv, Base<T> *upperb, Base<T> *ritzv, Base<T> *Tau, Base<T> *ritzV)

Lanczos algorithm for eigenvalue computation of Quasi-Hermitian matrices.

This function performs the Lanczos algorithm, which is used to estimate the upper bound of spectra of the quasi-Hermitian matrix H, i.e., where H is pseudo-Hermitian and SH is Hermitian Positive Definite. The algorithm is iteratively applied to the matrix H, where the input matrix H is a square matrix of size N x N. The Lanczos algorithm builds an orthonormal basis of the Krylov subspace, and the resulting tridiagonal matrix is diagonalized using the t_stemr function. This pseudo-code of this implementation can be found in

Template Parameters:

T – The data type for the matrix elements (e.g., float, double).

Parameters:

M – The number of Lanczos iterations.
numvec – The number of runs of Lanczos.
N – The size of the input matrix H.
H – The input matrix for the Lanczos algorithm (of size N x N).
ldh – The leading dimension of H (number of rows).
V – The input matrix used for storing vectors (of size N x numvec).
ldv – The leading dimension of V (number of rows).
upperb – A pointer to the upper bound of the eigenvalue spectrum.
ritzv – A pointer to store the Ritz eigenvalues.
Tau – A pointer to store the computed Tau values.
ritzV – A pointer to store the Ritz eigenvectors.

template<typename T> void lanczos(std::size_t M, chase::matrix::QuasiHermitianMatrix<T> *H, T *V, std::size_t ldv, Base<T> *upperb)

Lanczos algorithm for eigenvalue computation of quasi-hermitian matrices (simplified version).

This version of the Lanczos algorithm is a simplified version that computes only the upper bound of the eigenvalue spectrum of quasi-hermitian matrices and does not computei eigenvectors. It operates similarly to the full Lanczos algorithm but omits the eigenvector computation step.

Template Parameters:

T – The data type for the matrix elements (e.g., float, double).

Parameters:

M – The number of Lanczos iterations.
H – The input quasi-hermitian matrix for the Lanczos algorithm (of size N x N).
V – The input matrix used for storing vectors (of size N x 1).
ldv – The leading dimension of V (number of rows).
upperb – A pointer to the upper bound of the eigenvalue spectrum.

template<typename T> void rayleighRitz(chase::matrix::Matrix<T> *H, std::size_t n, T *Q, std::size_t ldq, T *V, std::size_t ldv, Base<T> *ritzv, T *A = nullptr)

Perform the Rayleigh-Ritz procedure to compute eigenvalues and eigenvectors of a matrix.

The Rayleigh-Ritz method computes an approximation to the eigenvalues and eigenvectors of a matrix by projecting the matrix onto a subspace defined by a set of vectors (Q) and solving the eigenvalue problem for the reduced matrix. The computed Ritz values are stored in the ritzv array, and the resulting eigenvectors are stored in V.

The procedure performs the following steps:

Computes the matrix-vector multiplication: V = H * Q.
Computes A = V’ * Q, where V’ is the conjugate transpose of V.
Solves the eigenvalue problem for A using LAPACK’s heevd function, computing the Ritz values in ritzv.
Computes the final approximation to the eigenvectors by multiplying Q with the computed ritz vectors.

Template Parameters:

T – Data type for the matrix (e.g., float, double, etc.).

Parameters:

N – [in] The number of rows of the matrix H.
H – [in] The input matrix (N x N).
ldh – [in] The leading dimension of the matrix H.
n – [in] The number of vectors in Q (subspace size).
Q – [in] The input matrix of size (N x n), whose columns are the basis vectors for the subspace.
ldq – [in] The leading dimension of the matrix Q.
V – [out] The output matrix (N x n), which will store the result of the projection.
ldw – [in] The leading dimension of the matrix V.
ritzv – [out] The array of Ritz values, which contains the eigenvalue approximations.
A – [in] A temporary matrix used in intermediate calculations. If not provided, it is allocated internally.

template<typename T> void rayleighRitz(chase::matrix::QuasiHermitianMatrix<T> *H, std::size_t n, T *Q, std::size_t ldq, T *V, std::size_t ldv, Base<T> *ritzv, T *A = nullptr)

Perform the Rayleigh-Ritz procedure to compute eigenvalues and eigenvectors of a Pseudo-Hermitian matrix.

The Rayleigh-Ritz method computes an approximation to the eigenvalues and eigenvectors of a matrix by projecting the matrix onto a subspace defined by a set of vectors (Q) and solving the eigenvalue problem for the reduced matrix. The real parts of the computed Ritz values are stored in the ritzv array, and the resulting right eigenvectors are stored in V.

The procedure performs the following steps:

Computes the matrix-vector multiplication: V = H’ * Q = S * H * S * Q.
Computes A = V’ * Q, where V’ is the conjugate transpose of V.
Solves the eigenvalue problem for A using LAPACK’s geev function, computing the real part of Ritz values in ritzv.
Computes the final approximation to the eigenvectors by multiplying Q with the computed ritz vectors.

Template Parameters:

T – Data type for the matrix (e.g., float, double, etc.).

Parameters:

H – [in] The Quasi-Hermitian input matrix (N x N).
n – [in] The number of vectors in Q (subspace size).
Q – [in] The input matrix of size (N x n), whose columns are the basis vectors for the subspace.
ldq – [in] The leading dimension of the matrix Q.
V – [out] The output matrix (N x n), which will store the result of the projection.
ldv – [in] The leading dimension of the matrix V.
ritzv – [out] The array of Ritz values, which contains the eigenvalue approximations.
A – [in] A temporary matrix used in intermediate calculations. If not provided, it is allocated internally.

template<typename T> void rayleighRitz_v2(chase::matrix::QuasiHermitianMatrix<T> *H, std::size_t n, T *Q, std::size_t ldq, T *V, std::size_t ldv, Base<T> *ritzv, T *A = nullptr)

Perform the Rayleigh-Ritz procedure to compute eigenvalues and eigenvectors of a Quasi-Hermitian matrix.

The Rayleigh-Ritz method computes an approximation to the eigenvalues and eigenvectors of a matrix by projecting the matrix onto a subspace defined by a set of vectors (Q) and solving the eigenvalue problem for the reduced matrix. The real parts of the computed Ritz values are stored in the ritzv array, and the resulting right eigenvectors are stored in V.

The procedure performs the following steps:

Computes the matrix-vector multiplication: V = H’ * Q = S * H * S * Q.
Computes A = V’ * Q, where V’ is the conjugate transpose of V.
Solves the eigenvalue problem for A using LAPACK’s geev function, computing the real part of Ritz values in ritzv.
Computes the final approximation to the eigenvectors by multiplying Q with the computed ritz vectors.

Template Parameters:

T – Data type for the matrix (e.g., float, double, etc.).

Parameters:

H – [in] The Quasi-Hermitian input matrix (N x N).
n – [in] The number of vectors in Q (subspace size).
Q – [in] The input matrix of size (N x n), whose columns are the basis vectors for the subspace.
ldq – [in] The leading dimension of the matrix Q.
V – [out] The output matrix (N x n), which will store the result of the projection.
ldv – [in] The leading dimension of the matrix V.
ritzv – [out] The array of Ritz values, which contains the eigenvalue approximations.
A – [in] A temporary matrix used in intermediate calculations. If not provided, it is allocated internally.

template<typename T> void residuals(std::size_t N, T *H, std::size_t ldh, std::size_t eigen_nb, Base<T> *evals, T *evecs, std::size_t ldv, Base<T> *resids, T *V = nullptr)

Compute the residuals of eigenvectors for a given matrix and eigenvalues.

This function computes the residuals of the eigenvectors, which measure how well the eigenvectors satisfy the eigenvalue equation ( H \mathbf{v}_i = \lambda_i \mathbf{v}_i ). The residual for each eigenvector ( \mathbf{v}_i ) is defined as ( ||H \mathbf{v}_i - \lambda_i \mathbf{v}_i|| ), where ( \lambda_i ) is the corresponding eigenvalue. The computed residuals are stored in the resids array.

The function performs the following steps:

Computes the matrix-vector multiplication ( V = H \cdot E ), where E are the eigenvectors.
Subtracts the eigenvalue ( \lambda_i ) times the eigenvector from the result.
Computes the 2-norm of the residual for each eigenvector and stores it in the resids array.

Template Parameters:

T – Data type for the matrix and vectors (e.g., float, double, etc.).

Parameters:

N – [in] The number of rows and columns of the matrix H.
H – [in] The input matrix (N x N).
ldh – [in] The leading dimension of the matrix H.
eigen_nb – [in] The number of eigenvalues and eigenvectors.
evals – [in] The array of eigenvalues.
evecs – [in] The matrix of eigenvectors (N x eigen_nb), where each column is an eigenvector.
ldv – [in] The leading dimension of the matrix evecs.
resids – [out] The array that will store the computed residuals for each eigenvector.
V – [in] A temporary matrix used in intermediate calculations. If not provided, it is allocated internally.

template<typename T> bool checkSymmetryEasy(std::size_t N, T *H, std::size_t ldh)

Checks if a matrix is symmetric using a randomized approach.

This function checks the symmetry of a square matrix ( H ) by performing two matrix-vector multiplications:

It computes ( u = H \cdot v ), where ( v ) is a random vector.
It computes ( uT = H^T \cdot v ), where ( H^T ) is the transpose of ( H ). The matrix is considered symmetric if the vectors ( u ) and ( uT ) are the same, i.e., ( u = uT ).

This method is computationally efficient and uses random vectors to test symmetry with high probability. However, it is not a guarantee for exact symmetry due to numerical errors, but it can be a quick heuristic check.

Template Parameters:

T – Data type for the matrix (e.g., float, double).

Parameters:

N – [in] The size of the matrix (N x N).
H – [in] The matrix to be checked for symmetry (of size N x N).
ldh – [in] The leading dimension of the matrix H.

Returns:

true if the matrix is symmetric, false otherwise.

template<typename T> void symOrHermMatrix(char uplo, std::size_t N, T *H, std::size_t ldh)

Converts a matrix to its Hermitian or symmetric form based on the given uplo argument.

This function modifies the matrix ( H ) in-place such that it becomes symmetric or Hermitian, depending on the value of the uplo parameter.

If uplo is ‘U, the function converts the upper triangular part of the matrix to the Hermitian form, by setting the lower triangular part to the conjugate transpose of the upper part.
Ifuplois’L’`, the function converts the lower triangular part of the matrix to the Hermitian form, by setting the upper triangular part to the conjugate transpose of the lower part.

The function assumes that the matrix is square (N x N) and modifies the elements of the matrix in-place. The conjugation is done using the conjugate function.

Template Parameters:

T – Data type for the matrix (e.g., float, double, std::complex).

Parameters:

uplo – [in] A character indicating which part of the matrix to modify:
- ’Ufor the upper triangular part. -‘L’` for the lower triangular part.
N – [inout] The size of the matrix (N x N). The matrix is modified in-place.
H – [inout] The matrix to be modified. It is transformed into a symmetric or Hermitian matrix.
ldh – [in] The leading dimension of the matrix H.

template<typename T> void computeDiagonalAbsSum(std::size_t m, std::size_t n, T *A, std::size_t lda, Base<T> *sum)

Computes the sum of the absolute values of the diagonal elements of a matrix.

This function computes the sum of the absolute values of the diagonal elements of the given matrix ( A ). It iterates over the diagonal elements (i.e., elements where the row index equals the column index) and adds the absolute value of each diagonal element to the sum.

Template Parameters:

T – Data type for the matrix elements (e.g., float, double).

Parameters:

m – [in] The number of rows in the matrix ( A ).
n – [in] The number of columns in the matrix ( A ).
A – [in] The matrix of size ( m \times n ), where the diagonal elements are summed.
lda – [in] The leading dimension of the matrix ( A ), which is the number of elements between the start of one column and the start of the next column.
sum – [out] The resulting sum of the absolute values of the diagonal elements.

template<typename T> void shiftMatrixDiagonal(std::size_t m, std::size_t n, T *A, std::size_t lda, T shift)

Shifts the diagonal elements of a matrix by a given value.

This function adds a specified shift value to the diagonal elements of the given matrix ( A ). It modifies the matrix in place by adding the shift to each diagonal element (i.e., elements where the row index equals the column index).

Template Parameters:

T – Data type for the matrix elements (e.g., float, double).

Parameters:

m – [in] The number of rows in the matrix ( A ).
n – [in] The number of columns in the matrix ( A ).
A – [inout] The matrix of size ( m \times n ), whose diagonal elements are shifted.
lda – [in] The leading dimension of the matrix ( A ), which is the number of elements between the start of one column and the start of the next column.
shift – [in] The value to be added to each diagonal element.

template<typename T> void flipLowerHalfMatrixSign(std::size_t m, std::size_t n, T *A, std::size_t lda)

Flip the sign of the lower half part of the matrix.

This function toggles the sign of the lower half part of the matrix, i.e., the lower half part is multiplied by -1.0

Template Parameters:

T – Data type for the matrix elements (e.g., float, double).

Parameters:

m – [in] The number of rows in the matrix ( A ).
n – [in] The number of columns in the matrix ( A ).
A – [inout] The matrix of size ( m \times n ), whose diagonal elements are shifted.
lda – [in] The leading dimension of the matrix ( A ), which is the number of elements between the start of one column and the start of the next column.

template<typename T> void flipRightHalfMatrixSign(std::size_t m, std::size_t n, T *A, std::size_t lda)

Flip the sign of the right part of the matrix.

This function toggles the sign of the right part of the matrix, i.e., the right part is multiplied by -1.0

Template Parameters:

T – Data type for the matrix elements (e.g., float, double).

Parameters:

m – [in] The number of rows in the matrix ( A ).
n – [in] The number of columns in the matrix ( A ).
A – [inout] The matrix of size ( m \times n ), whose diagonal elements are shifted.
lda – [in] The leading dimension of the matrix ( A ), which is the number of elements between the start of one column and the start of the next column.

1.5.1.2. chase::linalg::internal::cuda

namespace cuda

Functions

template<typename T> void absTrace(chase::matrix::Matrix<T, chase::platform::GPU> &H, chase::Base<T> *absTrace, cudaStream_t *stream_ = nullptr)

Computes the absolute trace of a matrix on the GPU.

This function computes the absolute trace of a matrix on the GPU. The trace is calculated by summing the absolute values of the diagonal elements. It utilizes the absTrace_gpu function to launch the corresponding CUDA kernel based on the matrix type.

Note

This function uses SCOPED_NVTX_RANGE() to mark the execution range for profiling purposes using NVIDIA’s NVTX.

Template Parameters:

T – The type of elements in the matrix (e.g., float, double, complex types).

Parameters:

H – [in] The matrix whose absolute trace is to be computed. This matrix should reside in GPU memory.
absTrace – [out] The resulting absolute trace of the matrix.
stream_ – [in] The optional CUDA stream to be used for kernel execution. If not provided, the default stream is used.

template<typename T> int cholQR1(cublasHandle_t cublas_handle, cusolverDnHandle_t cusolver_handle, chase::matrix::Matrix<T, chase::platform::GPU> &V, T *workspace = nullptr, int lwork = 0, chase::matrix::Matrix<T, chase::platform::GPU> *A = nullptr)

Performs a Cholesky-based QR factorization with a degree of 1.

This function computes the Cholesky decomposition of the matrix ( A ), and uses it for a QR factorization of a given matrix ( V ) on a GPU. The process involves matrix operations using cuBLAS and cuSolver.

Template Parameters:

T – The data type of the matrix elements (e.g., float, double, or complex types).

Parameters:

cublas_handle – The cuBLAS handle for managing operations.
cusolver_handle – The cuSolver handle for solving linear systems and decompositions.
V – The input/output matrix ( V ) to be factored (on GPU).
workspace – Pointer to a workspace for cuSolver (default: nullptr).
lwork – The size of the workspace (default: 0).
A – Optional output matrix for the Cholesky factor (default: nullptr). If not provided, a new matrix is allocated.

Returns:

int An error code (0 indicates success, non-zero indicates failure).

template<typename T> int cholQR2(cublasHandle_t cublas_handle, cusolverDnHandle_t cusolver_handle, chase::matrix::Matrix<T, chase::platform::GPU> &V, T *workspace = nullptr, int lwork = 0, chase::matrix::Matrix<T, chase::platform::GPU> *A = nullptr)

Performs a Cholesky-based QR factorization with a degree of 2.

This function computes the Cholesky decomposition of the matrix ( A ), and uses it for a QR factorization of a given matrix ( V ) on a GPU. It involves several matrix operations using cuBLAS and cuSolver to perform the QR factorization.

Template Parameters:

T – The data type of the matrix elements (e.g., float, double, or complex types).

Parameters:

cublas_handle – The cuBLAS handle for managing operations.
cusolver_handle – The cuSolver handle for solving linear systems and decompositions.
V – The input/output matrix ( V ) to be factored (on GPU).
workspace – Pointer to a workspace for cuSolver (default: nullptr).
lwork – The size of the workspace (default: 0).
A – Optional output matrix for the Cholesky factor (default: nullptr). If not provided, a new matrix is allocated.

Returns:

int An error code (0 indicates success, non-zero indicates failure).

template<typename T> int shiftedcholQR2(cublasHandle_t cublas_handle, cusolverDnHandle_t cusolver_handle, chase::matrix::Matrix<T, chase::platform::GPU> &V, T *workspace = nullptr, int lwork = 0, chase::matrix::Matrix<T, chase::platform::GPU> *A = nullptr)

Performs a shifted Cholesky-based QR factorization with a degree of 2.

This function computes the shifted Cholesky decomposition for the matrix ( A ), and uses it to perform a QR factorization of the given matrix ( V ) on a GPU. The function performs additional operations with shifted matrix values to enhance stability.

Template Parameters:

T – The data type of the matrix elements (e.g., float, double, or complex types).

Parameters:

cublas_handle – The cuBLAS handle for managing operations.
cusolver_handle – The cuSolver handle for solving linear systems and decompositions.
V – The input/output matrix ( V ) to be factored (on GPU).
workspace – Pointer to a workspace for cuSolver (default: nullptr).
lwork – The size of the workspace (default: 0).
A – Optional output matrix for the Cholesky factor (default: nullptr). If not provided, a new matrix is allocated.

Returns:

int An error code (0 indicates success, non-zero indicates failure).

template<typename T> void houseHoulderQR(cusolverDnHandle_t cusolver_handle, chase::matrix::Matrix<T, chase::platform::GPU> &V, T *d_tau, int *devInfo, T *workspace = nullptr, int lwork = 0)

Performs the Householder QR decomposition on a matrix ( V ) using cuSolver and cuBLAS.

This function computes the QR decomposition of a matrix ( V ) using the Householder transformation. It performs the required operations using cuSolver and cuBLAS on the GPU.

Template Parameters:

T – The data type of the matrix elements (e.g., float, double, or complex types).

Parameters:

cusolver_handle – The cuSolver handle for solving linear systems and decompositions.
V – The input matrix ( V ) to be factored (on GPU).
d_tau – Pointer to the array storing Householder reflectors (on GPU).
devInfo – Pointer to an integer storing the result of the computation (on GPU).
workspace – Pointer to a workspace for cuSolver (default: nullptr).
lwork – The size of the workspace (default: 0).

template<typename T> void flipLowerHalfMatrixSign(chase::matrix::Matrix<T, chase::platform::GPU> *H, cudaStream_t *stream_ = nullptr)

Shifts the diagonal elements of a matrix by a specified value.

This function adds a scalar shift to the diagonal elements of the matrix H on the GPU. The operation is performed asynchronously using the provided CUDA stream, or the default stream if none is provided.

Note

The function modifies the matrix H in-place. The number of diagonal elements processed is determined by the minimum of the number of rows and columns of H.

Template Parameters:

T – The data type of the matrix elements (e.g., float, double).

Parameters:

H – The matrix on which the diagonal elements will be shifted. It is a matrix on the GPU.
shift – The value to be added to the diagonal elements of the matrix.
stream_ – Optional CUDA stream for asynchronous execution. If nullptr, the default stream is used.

template<typename T> void flipLowerHalfMatrixSign(std::size_t m, std::size_t n, T *H_data, std::size_t ldh, cudaStream_t *stream_ = nullptr)

template<typename T> void t_lacpy(char uplo, std::size_t m, std::size_t n, T *dA, std::size_t ldda, T *dB, std::size_t lddb, cudaStream_t *stream_ = nullptr)

Copies a matrix from one location to another on the device.

This function performs the copy of matrix A to matrix B based on the specified triangular part.

Template Parameters:

T – The data type of the matrix (e.g., float, double, complex types).

Parameters:

uplo – The triangular part of the matrix to copy:
- ’U’ for the upper triangular part
- ’L’ for the lower triangular part.
m – The number of rows of matrix A.
n – The number of columns of matrix A.
dA – Pointer to the input matrix A on the device.
ldda – The leading dimension of matrix A.
dB – Pointer to the output matrix B on the device.
lddb – The leading dimension of matrix B.
stream_ – Optional CUDA stream to execute the operation (default is nullptr).

template<typename T> void extractUpperTriangular(T *d_matrix, std::size_t ld, T *d_upperTriangular, std::size_t n, cudaStream_t *stream_ = nullptr)

Extracts the upper triangular part of a matrix and stores it in a separate matrix.

This function copies the upper triangular part of the input matrix d_matrix into d_upperTriangular.

Template Parameters:

T – The data type of the matrix (e.g., float, double, complex types).

Parameters:

d_matrix – Pointer to the input matrix on the device.
ld – The leading dimension of the input matrix.
d_upperTriangular – Pointer to the output matrix where the upper triangular part will be stored.
n – The number of rows/columns in the matrix.
stream_ – Optional CUDA stream to execute the operation (default is nullptr).

template<typename T> void unpackUpperTriangular(T *d_upperTriangular, std::size_t n, T *d_matrix, std::size_t ld, cudaStream_t *stream_ = nullptr)

Unpacks an upper triangular matrix from a compressed storage format into a full matrix.

This function reconstructs a full matrix from its upper triangular part stored in d_upperTriangular.

Template Parameters:

T – The data type of the matrix (e.g., float, double, complex types).

Parameters:

d_upperTriangular – Pointer to the input upper triangular matrix on the device.
n – The size of the matrix (number of rows/columns).
d_matrix – Pointer to the output full matrix on the device.
ld – The leading dimension of the output matrix.
stream_ – Optional CUDA stream to execute the operation (default is nullptr).

template<typename T> void lanczos(cublasHandle_t cublas_handle, std::size_t M, std::size_t numvec, chase::matrix::Matrix<T, chase::platform::GPU> *H, chase::matrix::Matrix<T, chase::platform::GPU> &V, chase::Base<T> *upperb, chase::Base<T> *ritzv, chase::Base<T> *Tau, chase::Base<T> *ritzV)

Performs the Lanczos algorithm for eigenvalue computation on a GPU using cuBLAS and CUDA.

This function performs the Lanczos algorithm on a matrix H (of size M), computing an orthonormal basis V and the tridiagonal matrix T for the eigenvalue problem. It also computes Ritz values and estimates of the upper bounds for the eigenvalues.

Template Parameters:

T – The data type of the matrix elements (e.g., float or double).

Parameters:

cublas_handle – The cuBLAS handle to perform linear algebra operations on the GPU.
M – The number of Lanczos iterations.
numvec – The number of runs of Lanczos.
H – The matrix of size M x M that is the input for the Lanczos algorithm.
V – The matrix of size M x numvec that holds the orthonormal basis vectors.
upperb – Output parameter that will hold the upper bound for the Ritz values.
ritzv – Output array to hold the Ritz values computed during the Lanczos algorithm.
Tau – Output array to store the Tau values from the eigenvalue decomposition.
ritzV – Output array for storing the Ritz vectors.

template<typename T> void lanczos(cublasHandle_t cublas_handle, std::size_t M, chase::matrix::Matrix<T, chase::platform::GPU> *H, chase::matrix::Matrix<T, chase::platform::GPU> &V, chase::Base<T> *upperb)

Performs the Lanczos algorithm on matrix H to compute the tridiagonal matrix and eigenvalues.

This version of the Lanczos algorithm is a simplified version that computes only the upper bound of the eigenvalue spectrum and does not compute eigenvectors. It operates similarly to the full Lanczos algorithm but omits the eigenvector computation step.

Note

The function modifies the input matrices H and V during the computation. It computes the tridiagonal matrix and updates the eigenvalues in ritzv, then stores the upper bound of the eigenvalues in the upperb pointer.

Template Parameters:

T – The data type of the matrix elements (e.g., float, double).

Parameters:

cublas_handle – cuBLAS handle used to perform matrix operations.
M – Number of Lanczos iterations.
H – The input matrix ( H ) (square matrix).
V – The input matrix ( V ) (Lanczos starting vector).
upperb – Pointer to store the upper bound of the computed eigenvalues.

template<typename T> void lanczos(cublasHandle_t cublas_handle, std::size_t M, std::size_t numvec, chase::matrix::QuasiHermitianMatrix<T, chase::platform::GPU> *H, chase::matrix::Matrix<T, chase::platform::GPU> &V, chase::Base<T> *upperb, chase::Base<T> *ritzv, chase::Base<T> *Tau, chase::Base<T> *ritzV)

Lanczos algorithm for eigenvalue computation of Quasi-Hermitian matrices.

This function performs the Lanczos algorithm, which is used to estimate the upper bound of spectra of the quasi-Hermitian matrix H, i.e., where H is pseudo-Hermitian and SH is Hermitian Positive Definite. The algorithm is iteratively applied to the matrix H, where the input matrix H is a square matrix of size N x N. The Lanczos algorithm builds an orthonormal basis of the Krylov subspace, and the resulting tridiagonal matrix is diagonalized using the t_stemr function. This pseudo-code of this implementation can be found in

Template Parameters:

T – The data type for the matrix elements (e.g., float, double).

Parameters:

cublas_handle – cuBLAS handle used to perform matrix operations.
M – The number of Lanczos iterations.
numvec – The number of runs of Lanczos.
N – The size of the input matrix H.
H – The input matrix for the Lanczos algorithm (of size N x N).
V – The input matrix used for storing vectors (of size N x numvec).
upperb – A pointer to the upper bound of the eigenvalue spectrum.
ritzv – A pointer to store the Ritz eigenvalues.
Tau – A pointer to store the computed Tau values.
ritzV – A pointer to store the Ritz eigenvectors.

template<typename T> void lanczos(cublasHandle_t cublas_handle, std::size_t M, chase::matrix::QuasiHermitianMatrix<T, chase::platform::GPU> *H, chase::matrix::Matrix<T, chase::platform::GPU> &V, chase::Base<T> *upperb)

Lanczos algorithm for eigenvalue computation of quasi-hermitian matrices (simplified version).

This version of the Lanczos algorithm is a simplified version that computes only the upper bound of the eigenvalue spectrum of quasi-hermitian matrices and does not computei eigenvectors. It operates similarly to the full Lanczos algorithm but omits the eigenvector computation step.

Template Parameters:

T – The data type for the matrix elements (e.g., float, double).

Parameters:

cublas_handle – cuBLAS handle used to perform matrix operations.
M – The number of Lanczos iterations.
H – The input quasi-hermitian matrix for the Lanczos algorithm (of size N x N).
V – The input matrix used for storing vectors (of size N x 1).
upperb – A pointer to the upper bound of the eigenvalue spectrum.

template<typename T> void init_random_vectors(T *v, std::size_t n, cudaStream_t *stream_ = nullptr)

Initializes a vector with random values using the CUDA random number generator.

This function initializes a vector v with random values sampled from a normal distribution. The random number generation is performed on the GPU using CUDA and curand, and it can be executed asynchronously with respect to the provided CUDA stream. If no stream is provided, the default stream is used.

Note

The function uses curandStatePhilox4_32_10_t for managing the random state and a fixed seed value. It allocates memory on the GPU for the random number generation state and frees it after the operation.

Template Parameters:

T – The data type of the vector elements (e.g., float, double).

Parameters:

v – Pointer to the vector on the GPU that will be populated with random values.
n – The number of elements in the vector v.
stream_ – Optional CUDA stream for asynchronous execution. If nullptr, the default stream is used.

template<typename T> void rayleighRitz(cublasHandle_t cublas_handle, cusolverDnHandle_t cusolver_handle, chase::matrix::Matrix<T, chase::platform::GPU> *H, chase::matrix::Matrix<T, chase::platform::GPU> &V1, chase::matrix::Matrix<T, chase::platform::GPU> &V2, chase::matrix::Matrix<chase::Base<T>, chase::platform::GPU> &ritzv, std::size_t offset, std::size_t subSize, int *devInfo, T *workspace = nullptr, int lwork_heevd = 0, chase::matrix::Matrix<T, chase::platform::GPU> *A = nullptr)

Perform the Rayleigh-Ritz procedure to compute eigenvalues and eigenvectors of a matrix.

The Rayleigh-Ritz method computes an approximation to the eigenvalues and eigenvectors of a matrix by projecting the matrix onto a subspace defined by a set of vectors (Q) and solving the eigenvalue problem for the reduced matrix. The computed Ritz values are stored in the ritzv array, and the resulting eigenvectors are stored in W.

The procedure performs the following steps:

Computes the matrix-vector multiplication: W = H * Q.
Computes A = W’ * Q, where W’ is the conjugate transpose of W.
Solves the eigenvalue problem for A using LAPACK’s heevd function, computing the Ritz values in ritzv.
Computes the final approximation to the eigenvectors by multiplying Q with the computed eigenvectors.

Template Parameters:

T – Data type for the matrix (e.g., float, double, etc.).

Parameters:

N – [in] The number of rows of the matrix H.
H – [in] The input matrix (N x N).
ldh – [in] The leading dimension of the matrix H.
n – [in] The number of vectors in Q (subspace size).
Q – [in] The input matrix of size (N x n), whose columns are the basis vectors for the subspace.
ldq – [in] The leading dimension of the matrix Q.
W – [out] The output matrix (N x n), which will store the result of the projection.
ldw – [in] The leading dimension of the matrix W.
ritzv – [out] The array of Ritz values, which contains the eigenvalue approximations.
A – [in] A temporary matrix used in intermediate calculations. If not provided, it is allocated internally.

template<typename T> void rayleighRitz(cublasHandle_t cublas_handle, cusolverDnHandle_t cusolver_handle, cusolverDnParams_t params, chase::matrix::QuasiHermitianMatrix<T, chase::platform::GPU> *H, chase::matrix::Matrix<T, chase::platform::GPU> &V1, chase::matrix::Matrix<T, chase::platform::GPU> &V2, chase::matrix::Matrix<chase::Base<T>, chase::platform::GPU> &ritzv, std::size_t offset, std::size_t subSize, int *devInfo, T *d_workspace = nullptr, int d_lwork = 0, T *h_workspace = nullptr, int h_lwork = 0, chase::matrix::Matrix<T, chase::platform::GPU> *A = nullptr)

Perform the Rayleigh-Ritz procedure to compute eigenvalues and eigenvectors of a Quasi-Hermitian matrix.

The Rayleigh-Ritz method computes an approximation to the eigenvalues and eigenvectors of a matrix by projecting the matrix onto a subspace defined by a set of vectors (Q) and solving the eigenvalue problem for the reduced matrix. The computed Ritz values are stored in the ritzv array, and the resulting eigenvectors are stored in W.

The procedure performs the following steps:

Computes the matrix-vector multiplication: W = H * Q.
Computes A = W’ * Q, where W’ is the conjugate transpose of W.
Solves the eigenvalue problem for A using LAPACK’s heevd function, computing the Ritz values in ritzv.
Computes the final approximation to the eigenvectors by multiplying Q with the computed eigenvectors.

Template Parameters:

T – Data type for the matrix (e.g., float, double, etc.).

Parameters:

N – [in] The number of rows of the matrix H.
H – [in] The input matrix (N x N).
ldh – [in] The leading dimension of the matrix H.
n – [in] The number of vectors in Q (subspace size).
Q – [in] The input matrix of size (N x n), whose columns are the basis vectors for the subspace.
ldq – [in] The leading dimension of the matrix Q.
W – [out] The output matrix (N x n), which will store the result of the projection.
ldw – [in] The leading dimension of the matrix W.
ritzv – [out] The array of Ritz values, which contains the eigenvalue approximations.
A – [in] A temporary matrix used in intermediate calculations. If not provided, it is allocated internally.

template<typename T> void rayleighRitz_v2(cublasHandle_t cublas_handle, cusolverDnHandle_t cusolver_handle, cusolverDnParams_t params, chase::matrix::QuasiHermitianMatrix<T, chase::platform::GPU> *H, chase::matrix::Matrix<T, chase::platform::GPU> &V1, chase::matrix::Matrix<T, chase::platform::GPU> &V2, chase::matrix::Matrix<chase::Base<T>, chase::platform::GPU> &ritzv, std::size_t offset, std::size_t subSize, int *devInfo, T *workspace = nullptr, int lwork = 0, chase::matrix::Matrix<T, chase::platform::GPU> *A = nullptr)

Perform the Rayleigh-Ritz procedure to compute eigenvalues and eigenvectors of a Quasi-Hermitian matrix.

The Rayleigh-Ritz method computes an approximation to the eigenvalues and eigenvectors of a matrix by projecting the matrix onto a subspace defined by a set of vectors (Q) and solving the eigenvalue problem for the reduced matrix. The computed Ritz values are stored in the ritzv array, and the resulting eigenvectors are stored in W.

The procedure performs the following steps:

Computes the matrix-vector multiplication: W = H * Q.
Computes A = W’ * Q, where W’ is the conjugate transpose of W.
Solves the eigenvalue problem for A using LAPACK’s heevd function, computing the Ritz values in ritzv.
Computes the final approximation to the eigenvectors by multiplying Q with the computed eigenvectors.

Template Parameters:

T – Data type for the matrix (e.g., float, double, etc.).

Parameters:

N – [in] The number of rows of the matrix H.
H – [in] The input matrix (N x N).
ldh – [in] The leading dimension of the matrix H.
n – [in] The number of vectors in Q (subspace size).
Q – [in] The input matrix of size (N x n), whose columns are the basis vectors for the subspace.
ldq – [in] The leading dimension of the matrix Q.
W – [out] The output matrix (N x n), which will store the result of the projection.
ldw – [in] The leading dimension of the matrix W.
ritzv – [out] The array of Ritz values, which contains the eigenvalue approximations.
A – [in] A temporary matrix used in intermediate calculations. If not provided, it is allocated internally.

template<typename T> void residuals(cublasHandle_t cublas_handle, chase::matrix::Matrix<T, chase::platform::GPU> *H, chase::matrix::Matrix<T, chase::platform::GPU> &V1, chase::Base<T> *d_ritzv, chase::Base<T> *d_resids, std::size_t offset, std::size_t subSize, chase::matrix::Matrix<T, chase::platform::GPU> *V2 = nullptr)

Compute the residuals of eigenvectors for a given matrix and eigenvalues.

This function computes the residuals of the eigenvectors, which measure how well the eigenvectors satisfy the eigenvalue equation ( H \mathbf{v}_i = \lambda_i \mathbf{v}_i ). The residual for each eigenvector ( \mathbf{v}_i ) is defined as ( ||H \mathbf{v}_i - \lambda_i \mathbf{v}_i|| ), where ( \lambda_i ) is the corresponding eigenvalue. The computed residuals are stored in the resids array.

The function performs the following steps:

Computes the matrix-vector multiplication ( V = H \cdot E ), where E are the eigenvectors.
Subtracts the eigenvalue ( \lambda_i ) times the eigenvector from the result.
Computes the 2-norm of the residual for each eigenvector and stores it in the resids array.

Template Parameters:

T – Data type for the matrix and vectors (e.g., float, double, etc.).

Parameters:

N – [in] The number of rows and columns of the matrix H.
H – [in] The input matrix (N x N).
ldh – [in] The leading dimension of the matrix H.
eigen_nb – [in] The number of eigenvalues and eigenvectors.
evals – [in] The array of eigenvalues.
evecs – [in] The matrix of eigenvectors (N x eigen_nb), where each column is an eigenvector.
ldv – [in] The leading dimension of the matrix evecs.
resids – [out] The array that will store the computed residuals for each eigenvector.
V – [in] A temporary matrix used in intermediate calculations. If not provided, it is allocated internally.

template<typename T> void shiftDiagonal(chase::matrix::Matrix<T, chase::platform::GPU> *H, chase::Base<T> shift, cudaStream_t *stream_ = nullptr)

Shifts the diagonal elements of a matrix by a specified value.

This function adds a scalar shift to the diagonal elements of the matrix H on the GPU. The operation is performed asynchronously using the provided CUDA stream, or the default stream if none is provided.

Note

The function modifies the matrix H in-place. The number of diagonal elements processed is determined by the minimum of the number of rows and columns of H.

Template Parameters:

T – The data type of the matrix elements (e.g., float, double).

Parameters:

H – The matrix on which the diagonal elements will be shifted. It is a matrix on the GPU.
shift – The value to be added to the diagonal elements of the matrix.
stream_ – Optional CUDA stream for asynchronous execution. If nullptr, the default stream is used.

template<typename T> void setDiagonal(chase::matrix::Matrix<T, chase::platform::GPU> *H, T value, cudaStream_t *stream_ = nullptr)

Set the diagonal elements of a matrix by a specified value.

This function replaces all the entries of the diagonal by a speciafied value. The operation is performed asynchronously using the provided CUDA stream, or the default stream if none is provided.

Note

The function modifies the matrix H in-place. The number of diagonal elements processed is determined by the minimum of the number of rows and columns of H.

Template Parameters:

T – The data type of the matrix elements (e.g., float, double).

Parameters:

H – The matrix on which the diagonal elements will be shifted. It is a matrix on the GPU.
value – The value for the diagonal entries of H.
stream_ – Optional CUDA stream for asynchronous execution. If nullptr, the default stream is used.

template<typename T> void scaleMatrixRows(chase::matrix::Matrix<T, chase::platform::GPU> *H, chase::Base<T> *values, cudaStream_t *stream_ = nullptr)

Scale the rows of a matrix by spciefied real values.

This function scales the rows of the matrix by the entries of values. The values should be already be in GPU memory. The operation is performed asynchronously using the provided CUDA stream, or the default stream if none is provided.

Note

The function modifies the matrix H in-place.

Template Parameters:

T – The data type of the matrix elements (e.g., float, double).

Parameters:

H – The matrix on which the diagonal elements will be shifted. It is a matrix on the GPU.
values – The real values for scaling the rows of H.
stream_ – Optional CUDA stream for asynchronous execution. If nullptr, the default stream is used.

template<typename T> void subtractInverseDiagonal(chase::matrix::Matrix<T, chase::platform::GPU> *H, chase::Base<T> coef, chase::Base<T> *new_diag, cudaStream_t *stream_ = nullptr)

Returns the inverse of a real coefficient subtracted by the real part of the diagonal of a matrix.

This function computes the inverse entries of a given real value subtracted by the real part of the diagonal of a matrix. The output vector, called new_diag, is a stored within the GPU memory. The operation is performed asynchronously using the provided CUDA stream, or the default stream if none is provided.

Note

The function modifies the matrix H in-place.

Template Parameters:

T – The data type of the matrix elements (e.g., float, double).

Parameters:

H – The matrix on which the diagonal elements will be shifted. It is a matrix on the GPU.
values – The real values for scaling the rows of H.
stream_ – Optional CUDA stream for asynchronous execution. If nullptr, the default stream is used.

template<typename T> void plusInverseDiagonal(chase::matrix::Matrix<T, chase::platform::GPU> *H, chase::Base<T> coef, chase::Base<T> *new_diag, cudaStream_t *stream_ = nullptr)

Returns the inverse of a real coefficient added to the real part of the diagonal of a matrix.

This function computes the inverse entries of a given real value added to the real part of the diagonal of a matrix. The output vector, called new_diag, is a stored within the GPU memory. The operation is performed asynchronously using the provided CUDA stream, or the default stream if none is provided.

Note

The function modifies the matrix H in-place.

Template Parameters:

T – The data type of the matrix elements (e.g., float, double).

Parameters:

H – The matrix on which the diagonal elements will be shifted. It is a matrix on the GPU.
values – The real values for scaling the rows of H.
stream_ – Optional CUDA stream for asynchronous execution. If nullptr, the default stream is used.

1.5.1.3. chase::linalg::internal::mpi

Note

The MPI kernel namespace contains functions for distributed CPU operations. These are internal implementation details. For user-facing documentation, refer to the implementation classes (Implementation Classes).

1.5.1.4. chase::linalg::internal::nccl

Note

The NCCL kernel namespace contains functions for distributed GPU operations using NCCL. These are internal implementation details. For user-facing documentation, refer to the implementation classes (Implementation Classes).

1.5.1.5. chase::linalg::internal::cuda_aware_mpi

Note

The CUDA-aware MPI kernel namespace contains functions for GPU operations with CUDA-aware MPI. These are internal implementation details. For user-facing documentation, refer to the implementation classes (Implementation Classes).