Monday, 3 November 2014

Finding eigenvalues of sparse matrices in C++

Just a quick note about the headaches I've been having when trying to find the eigenvalues of sparse matrices in my C++ simulation programs. Essentially, I just want to solve a problem of the type: \[\mathsf{H}\mathbf{x} = E\mathbf{x}\], where \(\mathsf{H}\) is a square matrix, \(\mathbf{x}\) is an eigenvector of the matrix and \(E\) is the corresponding eigenvalue. I'm handling quite large matrices (say, \(5000\times 5000\) elements) so a general eigenvalue solver takes an extremely long time to solve. In fact, the matrices I'm dealing with are tridiagonal (i.e., only the diagonal and the bands either side of it contain non-zero elements) and this, in theory at least, makes it possible to solve the problems much more rapidly. However, my matrices are not symmetrical, and this causes some difficulties! I've found three possible library approaches for solving this, none of which are ideal:

LAPACK

LAPACK provides (slow) solvers for general eigenvalue problems, and very fast solvers for symmetric tridiagonal systems. Unfortunately, there doesn't appear to be a solver for non-symmetric tridiagonal systems. LAPACK is a FORTRAN library, but since version 3.4, there are some convenient C/C++ bindings that can be used.

ARPACK

ARPACK seems to be the best/fastest sparse-matrix solver around, but using the raw API looks extremely painful (you need to manually iterate the solver). I did find a C++ wrapper (ARPACK++), but development seems to have been abandoned, the API is still quite arcane, and documentation is pretty sparse. There's an excellent Python wrapper as part of SciPy... now, if someone could do this in C++, I'd be very happy!

Armadillo

The Armadillo library has, in fact, started wrapping ARPACK functionality for C++, and the API is very user-friendly. The only problem is that the sparse-matrix algorithm converges first on the eigenvalues with largest magnitude... I want the smallest ones! It is possible to request the smallest magnitude solutions, but the algorithm fails to converge within Armadillo's hard-coded iteration limit for many cases. ARPACK itself provides some very handy ways around this problem, which haven't yet been fully implemented in the Armadillo wrapper. These are:
  • Increase the tolerance on precision of the solution. This has been implemented in Armadillo, but it's not an ideal solution since the eigenvalues are found to a lower precision.
  • Increase the maximum number of iterations. This essentially gives ARPACK more time to find the solution. It's not yet available in Armadillo, but it's a bit of a brute-force approach, because obviously it can take a very long time. I don't know the details of the algorithm, but I'd be concerned about the stability... potentially, I guess it could be possible to get stuck in an oscillation or to hit a divergent condition. In this case, increasing the maximum number of iterations wouldn't help!
  • Use a shift-and-invert algorithm. This approach preconditions the eigenvalue problem such that the eigenvalues are transformed \[E\to\frac{1}{E-\sigma}\], where \(\sigma\) is a "pivot" point near to the desired solutions. In my case, \(\sigma = 0\) is a decent choice, because I'm actually looking for the energies of electrons close to the bottom of an energy band. This approach transforms the problem so that the algorithm works in its "preferred" mode of finding the largest eigenvalues first.
I exchanged a couple of emails with the Armadillo developers a few months ago, with a view to getting the latter functionality into a new version. They were very helpful, and hopefully some day I will get round to implementing the feature in their wrapper and sending them a patch. Until then, I'm stuck with either using very slow general eigenvalue solvers, or very complex APIs!