One-sided Jacobi algorithm

Posted March 7, 2025


Outline of the Algorithm

The Jacobi algorithm for computing the singular value decomposition (SVD) of a general matrix \(A \in \mathbb{R}^{m \times n}\) was proposed by Hestenes. The algorithm proceeds as follows:

\[A^{(0)} = A, \quad A^{(k+1)} = A^{(k)} J^{(k)}, \quad A^{(\infty)} = U \Sigma, \]

for \(k = 0, 1, 2, \dots\), where \(J^{(k)}\) is the Jacobi rotation applied at the \(k\)-th step to orthogonalize two distinct columns of \(A^{(k)}\). Each \(J^{(k)}\) is determined by the pivot position \((p, q) = (p(k), q(k))\), which depends on the chosen pivot strategy, such as classical or cyclic.

Given a pivot \((p, q)\), we first form the \((p, q)\) submatrix of \((A^{(k)})^T A^{(k)}\):

\[(A^{(k)})^T A^{(k)} ([p, q], [p, q]) = \begin{bmatrix} a & c \\ c & b \end{bmatrix}, \]

where

\[\label{eq:abc} \tag{1} a = (a^{(k)}_p)^T a^{(k)}_p, \quad b = (a^{(k)}_q)^T a^{(k)}_q, \quad c = (a^{(k)}_p)^T a^{(k)}_q. \]

The remaining steps mirror the procedure for the two-sided Jacobi algorithm:

\[\xi = \frac{b - a}{2c}, \quad t = \frac{\text{sign}(\xi)}{|\xi| + \sqrt{1 + \xi^2}}, \quad cs = \frac{1}{\sqrt{1 + t^2}}, \quad sn = cs \times t, \]

and

\[J^{(k)}([p, q], [p, q]) = \begin{bmatrix} cs & sn \\ -sn & cs \end{bmatrix}, \]

with \(J^{(k)}\) agreeing with the identity outside rows and columns \(p\) and \(q\).
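As a quick numerical sanity check, the rotation parameters can be computed from \(a\), \(b\), \(c\) and verified to orthogonalize the pivot columns. This is an illustrative NumPy sketch (the function name is mine, not from a reference implementation):

```python
import numpy as np

def jacobi_rotation(a, b, c):
    """Rotation parameters from a = ||a_p||^2, b = ||a_q||^2,
    c = a_p^T a_q (assumed nonzero)."""
    xi = (b - a) / (2.0 * c)
    sgn = 1.0 if xi >= 0 else -1.0        # convention sign(0) = +1
    t = sgn / (abs(xi) + np.sqrt(1.0 + xi**2))
    cs = 1.0 / np.sqrt(1.0 + t**2)
    return cs, cs * t

# Rotating two sample columns by [[cs, sn], [-sn, cs]] makes them orthogonal.
rng = np.random.default_rng(0)
ap, aq = rng.standard_normal(5), rng.standard_normal(5)
cs, sn = jacobi_rotation(ap @ ap, aq @ aq, ap @ aq)
new_p, new_q = cs * ap - sn * aq, sn * ap + cs * aq
print(abs(new_p @ new_q))  # ~ 0
```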

Suppose the algorithm converges at step \(K\), yielding \(A^{(K+1)}\). The column norms of \(A^{(K+1)}\) are the singular values of \(A\), and \(U = A^{(K+1)} \Sigma^{-1}\) is the left singular vector matrix. The right singular vector matrix \(V\) can be obtained either by accumulating \(V = J^{(0)} \cdots J^{(K)}\) or by solving the matrix equation \(AV = A^{(K+1)}\) for \(V\). Thus, \(A = U \Sigma V^T\) constitutes the SVD of \(A\).
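Putting the pieces together, a minimal cyclic-pivot sketch of the whole algorithm might look as follows (illustrative only; the function name and default tolerances are my own choices, and rank-deficient inputs are not handled):

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Sketch of the one-sided Jacobi SVD with cyclic pivoting.

    Returns (U, s, V) with A ≈ U @ np.diag(s) @ V.T.
    Assumes A has full column rank (no zero columns at convergence).
    """
    A = np.array(A, dtype=float)
    n = A.shape[1]
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                a = A[:, p] @ A[:, p]
                b = A[:, q] @ A[:, q]
                c = A[:, p] @ A[:, q]
                if abs(c) <= tol * np.sqrt(a * b):
                    continue  # pivot columns already orthogonal enough
                converged = False
                xi = (b - a) / (2.0 * c)
                sgn = 1.0 if xi >= 0 else -1.0
                t = sgn / (abs(xi) + np.sqrt(1.0 + xi**2))
                cs = 1.0 / np.sqrt(1.0 + t**2)
                J = np.array([[cs, cs * t], [-cs * t, cs]])
                A[:, [p, q]] = A[:, [p, q]] @ J  # rotate the pivot columns
                V[:, [p, q]] = V[:, [p, q]] @ J  # accumulate J^(0) ... J^(K)
        if converged:
            break
    s = np.linalg.norm(A, axis=0)  # column norms = singular values
    return A / s, s, V             # U = A^(K+1) Sigma^{-1}

# Example usage on a small matrix.
M = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])
U, s, V = one_sided_jacobi_svd(M)
print(np.allclose((U * s) @ V.T, M))  # True
```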

Preconditioning

Drmač and Veselić proposed preconditioning the matrix to accelerate the convergence of the one-sided Jacobi algorithm. Their approach includes:

  1. Choosing between \(A\) and \(A^T\) for square matrices: Let \(A = DQ\), where \(D\) is diagonal and \(Q\) is orthogonal. Applying the one-sided Jacobi algorithm to \(A\) implicitly diagonalizes \(A^T A = Q^T D^2 Q\). However, applying it to \(A^T\) diagonalizes \(AA^T = D^2\), which is already diagonal. Therefore, one should choose \(A^{(0)}\) to maximize

    \[\| \text{diag}((A^{(0)})^T A^{(0)}) \|_2 \]

    or minimize the diagonal entropy of \((A^{(0)})^T A^{(0)}\).

  2. Using QR factorization with rank-revealing column pivoting: Set \(A^{(0)} = R\). This redistributes the mass of the diagonals of \(R^T R\), positively impacting convergence. Additionally, for a rectangular \(m \times n\) matrix, a rank-revealing algorithm can reduce the problem to \(n \times n\).

  3. Setting \(A^{(0)} = R^T\): This implicitly applies Rutishauser's LR diagonalization to \(R^T R \to R R^T\), which has a nontrivial diagonalizing effect.
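Steps 2 and 3 can be sketched with SciPy's column-pivoted QR; this is a minimal illustration of the setup, not Drmač and Veselić's actual code:

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(1)
# A rectangular matrix with badly scaled columns.
A = rng.standard_normal((6, 4)) * np.array([1e4, 1.0, 1e-2, 1e-4])

# Step 2: QR with column pivoting reduces the 6 x 4 problem to 4 x 4
# and sorts the diagonal of R in decreasing magnitude.
Q, R, piv = qr(A, mode="economic", pivoting=True)
print(np.allclose(Q @ R, A[:, piv]))  # True: QR of the column-permuted A

# Step 3: one-sided Jacobi would now be run on A0 = R (or on R^T,
# which implicitly performs one LR step R^T R -> R R^T).
A0 = R.T
```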

Superior Accuracy

While the one-sided Jacobi algorithm typically struggles to match the performance of bidiagonalization methods like QR iteration and divide-and-conquer, it offers distinct advantages in accuracy for certain matrices. Perturbation theory for bidiagonalization methods shows that:

\[\frac{|\sigma_i - \hat{\sigma}_i|}{\sigma_i} \le O(\varepsilon) \kappa(A), \]

where \(\sigma_i\) and \(\hat{\sigma}_i\) are the singular values of \(A\) and \(A + \Delta A\), respectively. Here, \(\Delta A\) is a small perturbation arising during the algorithm's application, satisfying

\[\| \Delta A \|_2 \le O(\varepsilon) \| A \|_2, \]

and \(\kappa(A) = \sigma_{\max}(A) / \sigma_{\min}(A)\) is the 2-norm condition number of \(A\). This inequality implies that small singular values may be highly inaccurate if \(\kappa(A)\) is large.

However, the one-sided Jacobi algorithm improves this bound. Let \(A = BD\), where \(B\) has columns of unit 2-norm and

\[D = \text{diag}(\| a_i \|_2). \]

Using the notation in \eqref{eq:abc}, Demmel and Veselić proved the bound:

\[\frac{|\sigma_i - \hat{\sigma}_i|}{\sigma_i} \le O(\varepsilon) \kappa(B), \]

provided the algorithm terminates only if

\[|c| \le \text{tol} \times \sqrt{ab}, \]

where \(\text{tol} > 0\) is a convergence parameter. For badly column-scaled matrices \(A\), we typically have \(\kappa(B) \ll \kappa(A)\).
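A small experiment (my own construction) shows how large the gap between \(\kappa(A)\) and \(\kappa(B)\) can be for a badly column-scaled matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((50, 5))
B /= np.linalg.norm(B, axis=0)                  # unit-norm columns: kappa(B) is modest
D = np.diag(10.0 ** np.arange(0.0, 10.0, 2.0))  # column scales 1, 1e2, ..., 1e8
A = B @ D

print(np.linalg.cond(B))  # small, O(1)
print(np.linalg.cond(A))  # inflated by the scaling, roughly 1e8
```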

Implementation

The one-sided Jacobi algorithm has been implemented in LAPACK as the routines xGESVJ (the plain one-sided Jacobi SVD) and xGEJSV (the preconditioned variant of Drmač and Veselić).

References