Posted March 8, 2025
The trace of a matrix \(A \in \mathbb{R}^{n \times n}\) is the sum of its diagonal entries:
\[\text{tr}(A) = \sum_{i=1}^n a_{ii} = \sum_{i=1}^n e_i^T A e_i,\]
where \(e_i\) is the \(i\)-th column of the identity matrix. Many scientific computing and machine learning applications require estimating the trace of a square linear operator \(A\) that is represented implicitly. For example, the matrix-vector product \(u \mapsto Au\) might represent the solution of a linear differential equation with initial condition \(u\). In such cases, we aim to avoid applying this procedure \(n\) times.
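To make the setup concrete, here is a minimal sketch in Python/NumPy of the exact computation above; `apply_A` is a hypothetical stand-in for an implicitly represented operator, and the loop costs exactly the \(n\) operator applications we would like to avoid.

```python
import numpy as np

def exact_trace(apply_A, n):
    """Exact trace via n matrix-vector products: tr(A) = sum_i e_i^T (A e_i)."""
    total = 0.0
    for i in range(n):
        e_i = np.zeros(n)
        e_i[i] = 1.0              # i-th column of the identity matrix
        total += apply_A(e_i)[i]  # e_i^T (A e_i) reads off the entry a_ii
    return total

# Toy usage: in practice, apply_A might wrap an expensive solver.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
apply_A = lambda u: A @ u
print(exact_trace(apply_A, 100), np.trace(A))  # the two values agree
```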
Randomized trace estimation is based on the insight that it is straightforward to construct a random variable whose expectation equals the trace of the input matrix.
Lemma. Let \(\omega \in \mathbb{C}^n\) be a random test vector that is isotropic, i.e., \(\mathbb{E}[\omega\omega^*] = I\). Then,
\[\mathbb{E}[X] = \text{tr}(A) \quad \text{where } X = \omega^*(A\omega).\]
Proof.
Since \(\mathbb{E}[\omega\omega^*] = I\), we have \(\mathbb{E}[\omega_i \overline{\omega_j}] = \delta_{ij}\), where \(\delta_{ij}\) is the Kronecker delta. Therefore,
\[\mathbb{E}[X] = \mathbb{E}\left[\sum_{i,j=1}^n \overline{\omega_i}\, a_{ij}\, \omega_j\right] = \sum_{i,j=1}^n a_{ij}\, \mathbb{E}[\omega_j \overline{\omega_i}] = \sum_{i,j=1}^n a_{ij}\, \delta_{ij} = \sum_{i=1}^n a_{ii} = \text{tr}(A),\]
which follows from the linearity of expectation. \(\quad \Box\)
In other words, the random variable \(X\) is an unbiased estimator of the trace. This idea dates back to Girard, who suggested drawing \(\omega\) from a uniform distribution over the \(\ell_2\) hypersphere of radius \(\sqrt{n}\). Later, Hutchinson proposed using Rademacher random vectors, whose entries are independent and equal to \(\pm 1\) with equal probability. Both distributions are isotropic; a small sketch of them appears below.
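As an illustration, here is a small sketch of one sample of \(X\) under these two classical choices of test vector (assumptions: Python/NumPy, real test vectors so that \(\omega^* = \omega^T\); the function names are mine, not the post's).

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere(n):
    """Girard: uniform on the l2 sphere of radius sqrt(n); E[ww^T] = I."""
    w = rng.standard_normal(n)
    return np.sqrt(n) * w / np.linalg.norm(w)

def rademacher(n):
    """Hutchinson: independent entries equal to +1 or -1; E[ww^T] = I."""
    return rng.choice([-1.0, 1.0], size=n)

def one_sample(apply_A, omega):
    """One draw of X = omega^T (A omega), an unbiased estimate of tr(A)."""
    return omega @ apply_A(omega)
```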
A single sample of \(X\) is rarely enough because its variance \(\mathbb{V}[X]\) tends to be large. The most common mechanism for reducing the variance is to average \(k\) independent copies of \(X\), a technique known as the Monte Carlo method. For \(k \in \mathbb{N}\), define
\[\overline{X}_k = \frac{1}{k} \sum_{i=1}^k X_i,\]
where \(X_i \sim X\) are independent and identically distributed (i.i.d.). By linearity, \(\overline{X}_k\) is also an unbiased estimator of the trace. More importantly,
\[\mathbb{E}[\overline{X}_k] = \text{tr}(A), \quad \mathbb{V}[\overline{X}_k] = \frac{\mathbb{V}[X]}{k}.\]
Hutch++. This method combines an exact trace computation on a randomized low-rank approximation of \(A\) with a Girard–Hutchinson estimate of the remaining trace, and it is very effective in reducing the variance of the Girard–Hutchinson estimator. The XTrace algorithm can be viewed as a symmetrized version of Hutch++. A sketch of both estimators follows.
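To close the loop, here is a minimal sketch (Python/NumPy; `apply_A`, `k`, and `m` are illustrative names, and the matvec-budget split in Hutch++ is simplified for exposition) of the averaged Girard–Hutchinson estimator \(\overline{X}_k\), followed by Hutch++ as just described: an exact trace over a randomized low-rank subspace plus a Girard–Hutchinson estimate on the deflated residual.

```python
import numpy as np

def girard_hutchinson(apply_A, n, k, rng):
    """Monte Carlo average of k i.i.d. samples w^T (A w); variance ~ V[X]/k."""
    total = 0.0
    for _ in range(k):
        w = rng.choice([-1.0, 1.0], size=n)  # Rademacher test vector
        total += w @ apply_A(w)
    return total / k

def hutchpp(apply_A, n, m, rng):
    """Hutch++ trace estimate using roughly m matvecs (illustrative sketch)."""
    k = m // 3
    S = rng.standard_normal((n, k))
    G = rng.standard_normal((n, k))
    AS = np.column_stack([apply_A(S[:, j]) for j in range(k)])
    Q, _ = np.linalg.qr(AS)                 # orthonormal basis for range(A S)
    AQ = np.column_stack([apply_A(Q[:, j]) for j in range(Q.shape[1])])
    t_lowrank = np.trace(Q.T @ AQ)          # exact trace over the captured subspace
    G_defl = G - Q @ (Q.T @ G)              # deflated test vectors: (I - QQ^T) G
    AG = np.column_stack([apply_A(G_defl[:, j]) for j in range(k)])
    t_residual = np.sum(G_defl * AG) / k    # Girard-Hutchinson on the residual
    return t_lowrank + t_residual

# Toy comparison at the same matvec budget.
rng = np.random.default_rng(2)
n = 500
A = rng.standard_normal((n, n)) / np.sqrt(n)
apply_A = lambda u: A @ u
print(np.trace(A), girard_hutchinson(apply_A, n, 99, rng), hutchpp(apply_A, n, 99, rng))
```

Because Hutch++ handles the dominant low-rank part of \(A\) exactly, the Girard–Hutchinson step only has to estimate the trace of a deflated residual, which is where its variance reduction comes from.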