1.1.4. Performance Classes

1.1.4.1. chase::ChasePerfData

template<class T> class ChasePerfData

ChASE class for collecting data on FLOPs, timings, etc.

The ChasePerfData class collects and handles information about the execution of the eigensolver. It records:

Number of subspace iterations
Number of filtered vectors
Timings of each main algorithmic procedure (Lanczos, Filter, etc.)
Number of FLOPs executed

The number of iterations and filtered vectors can be used to monitor the behavior of the algorithm as it attempts to converge all the desired eigenpairs. The timings and the number of FLOPs are used to measure performance, especially parallel performance. The timings are stored in a vector of objects instantiated from the class template std::chrono::duration.
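As an illustration of the timing storage described above, the following is a minimal sketch of a vector of std::chrono::duration values indexed by the TimePtrs enumerators. The class name SectionTimer, its private members, and the choice of std::chrono::steady_clock are assumptions made for this example; the actual ChasePerfData implementation may differ.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the timing part of ChasePerfData.
// Only the enumerator names and the three method names come from this page.
class SectionTimer {
public:
    // Same enumerators as TimePtrs; each one indexes a slot in the vector.
    enum TimePtrs { All, InitVecs, Lanczos, Filter, Qr, Rr, Resids_Locking };

    SectionTimer()
        : timings_(7, std::chrono::duration<double>::zero()), starts_(7) {}

    // Remember when section t started.
    void start_clock(TimePtrs t) {
        assert(static_cast<std::size_t>(t) < timings_.size());
        starts_[t] = std::chrono::steady_clock::now();
    }

    // Add the time elapsed since start_clock(t) to the total of section t.
    void end_clock(TimePtrs t) {
        assert(static_cast<std::size_t>(t) < timings_.size());
        timings_[t] += std::chrono::steady_clock::now() - starts_[t];
    }

    // One accumulated duration, in seconds, per section.
    std::vector<std::chrono::duration<double>> get_timings() const {
        return timings_;
    }

private:
    std::vector<std::chrono::duration<double>> timings_;
    std::vector<std::chrono::steady_clock::time_point> starts_;
};
```

Because each section accumulates into its own slot, calling start_clock and end_clock repeatedly for the same enumerator (for example once per subspace iteration) yields the total time spent in that section.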

Public Types

enum TimePtrs

Values:

enumerator All

enumerator InitVecs

enumerator Lanczos

enumerator Filter

enumerator Qr

enumerator Rr

enumerator Resids_Locking

Public Functions

inline ChasePerfData(int matrix_type = 0)

inline void Reset()

inline std::size_t get_iter_count()

Returns the number of total subspace iterations executed by ChASE.

The S in ChASE stands for Subspace iteration. The main engine under the hood of ChASE is a loop enveloping all the main routines executed by the code. Because of this structure, ChASE is a truly iterative algorithm based on subspace filtering. Counting the number of times this loop is repeated gives a measure of the effectiveness of the algorithm; the count is usually a non-linear function of the spectral distribution. For example, when using the flag approximate_ = true to solve a sequence of eigenproblems, one can observe that the number of subspace iterations decreases as the sequence index increases.

Returns: The total number of subspace iterations.

inline std::size_t get_filtered_vecs()

Returns the cumulative number of times each column vector is filtered by one degree.

The most computationally expensive routine of ChASE is the Chebyshev filter. Within the filter, a matrix of vectors V is filtered with a varying degree each time a subspace iteration is executed. This counter returns the total number of times each vector in V goes through a filtering step. For instance, when the flag optim_ = false, this number roughly corresponds to rank(V) x degree x iter_count. When optim_ is set to true, the calculation is considerably more complicated. Roughly speaking, this counter is useful for monitoring the convergence ratio of the filtered vectors and, together with get_iter_count, conveys the effectiveness of the algorithm.

Returns: Cumulative number of filtered vectors.
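The rough estimate rank(V) x degree x iter_count for the optim_ = false case can be written as a one-line helper. The function and parameter names below are illustrative only and are not part of ChASE; the value reported by get_filtered_vecs may deviate from this estimate.

```cpp
#include <cassert>
#include <cstddef>

// Rough estimate of the filtered-vector count when every vector is
// filtered with the same fixed degree in every subspace iteration.
// Hypothetical helper; all names are chosen for this example.
std::size_t estimate_filtered_vecs(std::size_t num_vectors,
                                   std::size_t degree,
                                   std::size_t iter_count) {
    return num_vectors * degree * iter_count;
}
```

For example, 100 vectors filtered with degree 20 over 5 subspace iterations give an estimate of 100 x 20 x 5 = 10000 single-degree filtering steps.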

inline std::vector<std::chrono::duration<double>> get_timings()

inline std::size_t get_flops(std::size_t N = 0, std::size_t lanczosIter = 0, std::size_t numLanczos = 0)

Returns the total number of FLOPs executed by ChASE.

When measuring performance, it is fundamental to relate the number of operations a routine executes to the total time to solution. This counter returns the total number of operations executed by ChASE. It can be used to derive the performance of ChASE and compare it with the theoretical peak performance of the platform on which the code is executed.

Parameters:

N – Size of the eigenproblem matrix
lanczosIter – Number of Lanczos Iterations
numLanczos – Number of Lanczos Vectors

Returns:

The total number of operations executed by ChASE.
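A minimal sketch of how an operation count and a measured time can be turned into a sustained rate and compared with a platform's peak. Both helper functions are hypothetical and not part of ChASE. The sketch assumes the count is expressed in raw floating-point operations; if the counter reports a scaled unit such as GFLOP, the division by 1e9 should be dropped. The peak value must be supplied by the user.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Sustained rate in GFLOP/s from an operation count and an elapsed time.
// Assumes flops is a raw count of floating-point operations.
double gflops_per_second(std::size_t flops,
                         std::chrono::duration<double> elapsed) {
    assert(elapsed.count() > 0.0);
    return static_cast<double>(flops) / elapsed.count() / 1e9;
}

// Fraction of the theoretical peak reached by the measured rate.
// peak_gflops is a user-supplied number for the target platform.
double fraction_of_peak(double achieved_gflops, double peak_gflops) {
    assert(peak_gflops > 0.0);
    return achieved_gflops / peak_gflops;
}
```

In practice the operation count would come from a call such as get_flops and the elapsed time from the corresponding entry of get_timings.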

inline std::size_t get_filter_flops(std::size_t N)

Returns the total number of FLOPs of the Chebyshev filter.

Similar to get_flops, this counter returns the total number of operations executed by the Chebyshev filter alone. Since the filter executes, on average, 80% of the total FLOPs of ChASE, this counter is a good indicator of the performance of the entire algorithm. Because the filter executes almost exclusively BLAS-3 operations, this counter is useful for monitoring how close the filter comes to the peak performance of the platform on which ChASE is executed, which in turn helps fine-tune the use of computational resources.

Parameters: N – Size of the eigenproblem matrix
Returns: The total number of operations executed by the polynomial filter.

inline std::size_t get_lanczos_flops(std::size_t N, std::size_t lanczosIter, std::size_t numLanczos)

Returns the average number of FLOPs of the Lanczos Algorithm.

Similar to get_flops, this counter returns the average number of operations executed by the Lanczos algorithm alone. The total is approximated as the FLOPs of the GEMM operations plus an estimate for the stemr algorithm.

Parameters:

N – Size of the eigenproblem matrix
lanczosIter – Number of Lanczos Iterations
numLanczos – Number of Lanczos Vectors

Returns:

The average number of operations executed by the Lanczos algorithm.

inline void set_nprocs(int nProcs)

inline void add_iter_count(std::size_t add)

inline void add_iter_blocksize(std::size_t nevex)

inline void add_filtered_vecs(std::size_t add)

inline void start_clock(TimePtrs t)

inline void set_early_locked_residuals(std::vector<chase::Base<T>> early_locked_residuals)

inline void end_clock(TimePtrs t)

inline auto getTimePoint() -> TimePointType

inline void print(std::size_t N = 0, std::size_t lanczosIter = 0, std::size_t numLanczos = 0)

Print function outputting counters and timings for all routines.

By default (for N = 0), it prints, in order:

size of the eigenproblem
total number of subspace iterations executed
total number of filtered vectors
time-to-solution of the following 6 main sections of the ChASE algorithm:
1. Total time-to-solution
2. Estimates of the spectral bounds based on Lanczos
3. Chebyshev filter
4. QR decomposition
5. Rayleigh-Ritz procedure, including the solution of the reduced dense problem
6. Computation of the eigenpair residuals

When the parameter N is set to a value other than zero, the function also prints the total FLOPs and the filter FLOPs, in that order.

Parameters: N – Control parameter. By default equal to 0.

1.1.4.2. chase::PerformanceDecoratorChase

template<class T> class PerformanceDecoratorChase : public chase::ChaseBase<T>

A derived class used to extract performance and configuration data.

This class is derived from the Chase class, which serves as the interface to the kernels used by the library. All members of the Chase class are virtual functions, and they are re-implemented in the PerformanceDecoratorChase class. Members that provide an interface to computational kernels are re-implemented by decorating the original function with the time pointers that are members of the ChasePerfData class. Members that provide an interface to input or output data are called without any specific decoration. In addition to the virtual members of the Chase class, the PerformanceDecoratorChase class also has among its public members a reference to an object of type ChasePerfData. When Chase is used to solve an eigenvalue problem, the members of PerformanceDecoratorChase are called instead of the virtual functions of the Chase class. In this way, all timers and counters are updated automatically and in the correct order.
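The decoration described above follows the classic decorator pattern, which the heavily reduced sketch below illustrates. SolverBase, ToySolver, PerfData, TimingDecorator, and all of their members are hypothetical stand-ins invented for this example; they are not the ChASE class definitions and carry far fewer members than the real interface.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical reduced interface: one kernel and one data accessor.
struct SolverBase {
    virtual ~SolverBase() = default;
    virtual void Filter() = 0;             // computational kernel
    virtual std::size_t GetN() const = 0;  // plain data accessor
};

// Hypothetical container for the collected measurements.
struct PerfData {
    std::chrono::duration<double> filter_time{0.0};
    std::size_t filter_calls = 0;
};

// Toy implementation whose "kernel" only counts how often it ran.
class ToySolver : public SolverBase {
public:
    void Filter() override { ++work_; }
    std::size_t GetN() const override { return 4; }
    std::size_t work() const { return work_; }
private:
    std::size_t work_ = 0;
};

// Decorator: same interface, kernels wrapped in timers, accessors forwarded.
class TimingDecorator : public SolverBase {
public:
    explicit TimingDecorator(SolverBase* wrapped) : wrapped_(wrapped) {
        assert(wrapped_ != nullptr);
    }

    void Filter() override {
        auto start = std::chrono::steady_clock::now();
        wrapped_->Filter();  // run the undecorated kernel
        perf_.filter_time += std::chrono::steady_clock::now() - start;
        ++perf_.filter_calls;
    }

    // Input/output members are forwarded without decoration.
    std::size_t GetN() const override { return wrapped_->GetN(); }

    PerfData& GetPerfData() { return perf_; }

private:
    SolverBase* wrapped_;
    PerfData perf_;
};
```

Code written against the base interface can be handed a TimingDecorator in place of the wrapped solver without modification, so measurements are collected without touching the algorithm or the kernels.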