Theory at the Institute and Beyond, October 2024


by Nikhil Srivastava (senior scientist, Simons Institute for the Theory of Computing)

Some results mark the end of a long quest, whereas others open up new worlds for exploration. Then there are works that change our perspective and make the familiar unfamiliar once more. I am delighted to tell you about some recent developments of all three kinds.

Quantum Gibbs samplers
It seemed like everyone and their mother was talking about Gibbs samplers during the Simons Institute’s program on Quantum Algorithms, Complexity, and Fault Tolerance. The subject also had its very own one-day workshop. It made me wonder whether this is what it was like in the 1980s, before the first TCS results about efficient sampling via rapid mixing of MCMC came out.

A classical Gibbs distribution is a special kind of probability distribution that appears in statistical mechanics, machine learning, and other areas. For a spin system with \(n\) particles, its probability mass function is

\[p(x)\propto \exp(-\beta H(x)), \qquad x\in \{-1,1\}^n,\]

where \(H(x)\) is a function prescribing the energy of a configuration \(x\), known as the Hamiltonian. Typically, the Hamiltonian is itself a sum of “local terms” that only depend on a constant number of spins each, modeling the locality of physical interactions. A good example to keep in mind is the Ising model. The reason for the exponential form of the Gibbs distribution is that it maximizes the entropy over all probability distributions having a prescribed average energy.
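
To make this concrete, here is a minimal sketch (my own illustration, not from any of the works discussed; the function names are hypothetical) that enumerates the Gibbs distribution of a tiny Ising chain by brute force.

```python
import itertools
import numpy as np

# Minimal sketch: the Gibbs distribution of a small 1D Ising chain,
# computed by enumerating all 2^n spin configurations.
# H(x) = -sum_i x_i * x_{i+1} over the edges of a path.

def ising_energy(x):
    """Energy of a configuration x in {-1,1}^n for a 1D Ising chain."""
    return -sum(x[i] * x[i + 1] for i in range(len(x) - 1))

def gibbs_distribution(n, beta):
    """Configurations and their Gibbs probabilities p(x) ∝ exp(-beta * H(x))."""
    configs = list(itertools.product([-1, 1], repeat=n))
    weights = np.array([np.exp(-beta * ising_energy(x)) for x in configs])
    return configs, weights / weights.sum()

configs, p = gibbs_distribution(n=4, beta=1.0)
print(max(zip(p, configs)))  # a lowest-energy (fully aligned) configuration
```

At large \(\beta\) (low temperature) the mass concentrates on the two aligned configurations, while at \(\beta = 0\) the distribution is uniform.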

There is a highly successful theory of sampling from classical Gibbs distributions. In particular, there is a canonical Markov chain for doing this known as the Glauber dynamics. The transitions of the Glauber dynamics are as follows: pick a particle at random, look at its neighbors, and resample the spin of the particle from the correct conditional distribution given its neighbors. The reason this dynamics is easy to implement is that the Gibbs distribution of a local Hamiltonian has a lot of conditional independence in it; in particular, the distribution of a particle’s spin is conditionally independent of the rest of the spins given the spins of its neighbors, which is known as the Markov property. Thus, locality of the Hamiltonian translates to locality of the distribution, which further yields locality of the dynamics. This locality makes it easy to check that the Glauber dynamics has the correct stationary distribution by checking the detailed balance condition \(p(i)p_{ij} = p(j)p_{ji}\) where \(P=(p_{ij})\) is the transition matrix of the Markov chain.
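
The Glauber update itself is only a few lines. Here is a sketch for the same Ising chain (again my own illustration; the conditional probability below follows directly from the form of \(p(x)\) above).

```python
import numpy as np

# Minimal sketch: one Glauber update for a 1D Ising chain. Pick a site at
# random and resample its spin from the conditional Gibbs distribution
# given its neighbors.

def glauber_step(x, beta, rng):
    n = len(x)
    i = rng.integers(n)
    # Local field from the neighbors of site i (free boundary conditions).
    h = (x[i - 1] if i > 0 else 0) + (x[i + 1] if i < n - 1 else 0)
    # P(spin at i is +1 | neighbors) = e^{beta*h} / (e^{beta*h} + e^{-beta*h}).
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * h))
    x = x.copy()
    x[i] = 1 if rng.random() < p_plus else -1
    return x

rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=8)
for _ in range(1000):
    x = glauber_step(x, beta=0.5, rng=rng)
```

Detailed balance holds because each step resamples a single spin exactly from its conditional distribution under \(p\).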

Proving rapid mixing — i.e., that \(||P^t x-p||_1\rightarrow 0\) rapidly from any initial distribution \(x\) — is typically much harder. At this point, there is a huge arsenal of methods for doing it, including functional inequalities, coupling arguments, and, most recently, the theory of spectral independence and stochastic localization. A conceptually satisfying feature of this theory is that it is able to show that the computational thresholds for rapid mixing of Glauber dynamics are related to certain physical thresholds intrinsic to the Hamiltonian.

Let us now understand the quantum analogue of the above setup.

In quantum mechanics, the role of probability distributions is played by trace-one positive semidefinite matrices, called density matrices or states. The quantum analogue of a Gibbs distribution is a Gibbs state, which for an \(n\)-qubit system is simply defined as the matrix exponential \(\exp(-\beta H)/Z\) where \(H\) is a Hamiltonian on \(n\) qubits and \(Z =\mathrm{tr}(\exp(-\beta H))\). The most natural Hamiltonians have some kind of locality in them. For a concrete example, keep in mind the transverse-field Ising model on \(n\) qubits, sometimes referred to as the “drosophila” of quantum spin systems, which has Hamiltonian

\[H = - \sum_{ij \in E} Z_i Z_j - J \sum_{i \in V} X_i,\]

where \(G = (V, E)\) is an undirected graph, \(Z_i\) and \(X_i\) are Pauli operators acting only on the \(i\)th qubit, and \(J\) is a parameter. Note that if all the Pauli operators were \(Z_i\)’s, \(H\) would be a diagonal matrix corresponding to the classical Ising model.
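
As a concrete illustration (my own, using NumPy/SciPy rather than anything from the papers discussed), one can build this Hamiltonian on a small path graph and form the Gibbs state by a dense matrix exponential; this is feasible only for a handful of qubits, which is exactly why efficient Gibbs samplers are interesting.

```python
import numpy as np
from scipy.linalg import expm
from functools import reduce

# Minimal sketch: the transverse-field Ising Hamiltonian on a path of n
# qubits, and its Gibbs state exp(-beta*H)/Z, built as dense matrices.

I = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def kron_at(op, i, n):
    """Place a single-qubit operator at site i of an n-qubit system."""
    return reduce(np.kron, [op if j == i else I for j in range(n)])

def tfim_hamiltonian(n, J):
    """H = -sum_i Z_i Z_{i+1} - J * sum_i X_i on a path with n qubits."""
    H = np.zeros((2**n, 2**n))
    for i in range(n - 1):
        H -= kron_at(Z, i, n) @ kron_at(Z, i + 1, n)
    for i in range(n):
        H -= J * kron_at(X, i, n)
    return H

def gibbs_state(H, beta):
    rho = expm(-beta * H)
    return rho / np.trace(rho)

H = tfim_hamiltonian(n=4, J=1.0)
rho = gibbs_state(H, beta=0.7)
print(np.trace(rho), np.linalg.eigvalsh(rho).min())  # trace 1, all eigenvalues > 0
```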

People care about quantum Gibbs states for at least two reasons. First, they are believed to occur widely in nature, yet it remains a mathematical mystery how nature prepares them efficiently. Second, the matrix exponential is a useful subroutine in quantum algorithms — e.g., for optimization.

The quantum generalization of a Markov chain is a quantum channel \(E\), a linear map that maps density matrices to density matrices and continues to have this property under the addition of ancilla qubits. We say that a channel mixes to a state \(\rho\) if

\[|| E^t (\sigma)-\rho||_1\rightarrow 0,\]

for all initial states \(\sigma\), where the time parameter \(t\) can be discrete or continuous and \(|| \cdot||_1\) refers to the trace norm. If all the states involved are diagonal, this setup recovers the classical theory of discrete- and continuous-time Markov chains. As in the classical case, the goal is to show that mixing happens rapidly, in time that is polynomial in the number of qubits.
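
To see the definition in action, here is a minimal single-qubit sketch (my own illustration): the depolarizing channel \(E(\rho) = (1-p)\rho + p\,I/2\) mixes to the maximally mixed state, and the trace-norm distance shrinks by a factor \(1-p\) per step.

```python
import numpy as np

# Minimal sketch: the single-qubit depolarizing channel mixes to the
# maximally mixed state I/2; the trace-norm distance decays geometrically.

def depolarize(rho, p):
    return (1 - p) * rho + p * np.eye(2) / 2

def trace_norm(A):
    # A is Hermitian here, so the trace norm is the sum of |eigenvalues|.
    return np.abs(np.linalg.eigvalsh(A)).sum()

sigma = np.array([[1.0, 0.0], [0.0, 0.0]])  # initial pure state |0><0|
target = np.eye(2) / 2                      # the Gibbs state at beta = 0
for t in range(5):
    print(t, trace_norm(sigma - target))    # 1.0, 0.7, 0.49, ...
    sigma = depolarize(sigma, p=0.3)
```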

As in the classical case, whether and how fast mixing happens is expected to depend on the Hamiltonian \(H\) and the inverse temperature \(\beta\). In particular, we do not expect to have efficiently implementable (in the sense of quantum circuits) rapidly mixing channels for all local Hamiltonians and all \(\beta=\mathrm{poly}(n)\), as this would imply that we can approximate the ground state (i.e., bottom eigenvector) of any local Hamiltonian \(H\) efficiently, which is a QMA-hard problem.

In the case \(\beta=0\), the Gibbs state is just the maximally mixed state \(I/2^n\) (the normalized identity matrix), and then the question of designing an efficient channel that mixes rapidly to the Gibbs state is the same as the question of constructing efficient quantum expanders, which are known to exist.

If the local terms in the Hamiltonian \(H\) commute, then several nice things happen because the exponential splits as a product of exponentials of local terms. In this setting, the Gibbs state has the analogue of the Markov property, and it is possible to define quantum Glauber dynamics analogously and to check that it makes only local transitions, touching a few qubits at a time. Proving mixing is more challenging than in the classical case, however, because the channel has to mix not just for diagonal initial states but for all initial states. Nevertheless, this has been done in an impressive body of work over the last 10 years, for instance in (Kastoryano-Brandão’14).

The situation is much more difficult when the local terms do not commute and \(\beta>0\), since the exponential does not split as a product. This is not merely a technical inconvenience: in the noncommuting case, the Markov property truly fails for the Gibbs state. It is conjectured that it holds approximately in certain regimes, but this conjecture remains unproven despite significant recent progress. Consequently, it is not clear how to define an efficiently implementable channel that converges to the Gibbs state, regardless of mixing time.
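
A two-qubit numerical check (my own illustration) makes the distinction vivid: the exponential factors when the terms commute and fails to factor when a transverse field is added.

```python
import numpy as np
from scipy.linalg import expm

# Minimal sketch: for commuting terms the Gibbs exponential splits into a
# product of local exponentials; for noncommuting terms (a ZZ coupling
# plus a transverse X field) it does not.

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
I = np.eye(2)

beta = 0.8
A = -np.kron(Z, Z)               # coupling term on the two qubits
B_commuting = -np.kron(Z, I)     # commutes with A
B_noncommuting = -np.kron(X, I)  # does not commute with A

for B in (B_commuting, B_noncommuting):
    split = expm(-beta * A) @ expm(-beta * B)
    exact = expm(-beta * (A + B))
    print(np.linalg.norm(exact - split))  # ~0 only in the commuting case
```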

Until last year, there was no example of an efficiently implementable channel that converges to the Gibbs state of a noncommuting local Hamiltonian with \(\beta>0\), even ignoring the mixing time. Chen, Kastoryano, and Gilyén gave the first example of such a channel in November 2023, inspired by a physically motivated dynamics from the 1970s called the Davies generator. Refinements of this result were given by several participants of the Institute’s Quantum program and their collaborators.

Still, there was not a single example of a noncommuting local Hamiltonian for which there was a rapidly mixing local channel when \(\beta>0\), and in fact, there was no efficient quantum algorithm for preparing such states other than in the case of 1D lattices. In the last week of April, two independent results gave solutions to this problem for small constant \(\beta\).

Rouzé, França, and Alhambra showed that the Chen et al. dynamics has a spectral gap in the regime of a small enough constant \(\beta\), when \(H\) is a local Hamiltonian. They proved this by a homotopy argument showing that the gap is preserved as \(\beta\) varies from 0 to some small value. The proof, interestingly, required relating the spectral gap of the quantum channel (a “dynamic” object) back to that of a local Hamiltonian on a larger space (a “static” object).

Quite dramatically, in concurrent work Bakshi, Liu, Moitra, and Tang gave a classical algorithm for preparing Gibbs states of local Hamiltonians in the regime of sufficiently small \(\beta\). Their proof is reminiscent of the classic sampling-to-counting reduction from the 1980s. Though they were initially interested in the algorithmic result, their proof showed something striking: that there is a \(\beta\) (depending only on the locality and not the number of qubits) up to which the Gibbs state has zero entanglement — i.e., is separable. See this recent Quanta Magazine article for more about this story.

There is still a lot to be done in this area. Can we relate computational and physical phase transitions, as in the classical case? Can we explain how nature efficiently prepares many-body Gibbs states? Can we use quantum Gibbs states to demonstrate quantum advantage? (The Quantum program also witnessed some exciting progress in the last direction.) It’s exciting to know that what will clearly be a whole research landscape is just coming into view.

A new proof of Friedman’s theorem
It is difficult to overstate the usefulness of expansion as a concept in theoretical computer science. A landmark result about spectral expansion of graphs is Friedman’s proof of Alon’s conjecture, which says that a random \(d\)-regular graph is essentially as good a spectral expander as possible — specifically, that its second adjacency eigenvalue is typically at most \(2\sqrt{d-1}+o_n(1)\), almost matching the Alon-Boppana lower bound of \(2\sqrt{d-1}-o_n(1)\). Friedman’s original proof via the trace method was 100-plus pages long and relied on very delicate combinatorics. The proof was significantly simplified by Bordenave but remained unforgiving in that each step of the calculation had to be done very precisely to get the result, which seemed far from inevitable.
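
It is easy to see the theorem numerically. Here is a small sketch (my own, using NetworkX; not code from any of the papers) that samples a random \(d\)-regular graph and compares its second adjacency eigenvalue with \(2\sqrt{d-1}\).

```python
import numpy as np
import networkx as nx

# Minimal sketch: sample a random d-regular graph and compare its
# second-largest adjacency eigenvalue with the 2*sqrt(d-1) threshold
# from Friedman's theorem.

d, n = 4, 1000
G = nx.random_regular_graph(d, n, seed=0)
A = nx.to_numpy_array(G)
eigs = np.sort(np.linalg.eigvalsh(A))[::-1]
print("lambda_2 =", eigs[1], "  vs  2*sqrt(d-1) =", 2 * np.sqrt(d - 1))
```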

Very recently, Chen, Garza-Vargas, Tropp, and van Handel gave a beautiful new proof of Friedman’s theorem. The proof is less than 10 pages long and uses mainly soft arguments, as well as some gentle combinatorics. It additionally gives several bonuses, such as a bound of \(2\sqrt{d-1}+1/\mathrm{poly}(n)\) instead of \(2\sqrt{d-1}+1/\log^c(n)\). The paper also gives a short proof of a more general result of Bordenave-Collins on strong asymptotic freeness of random permutations, which has many other applications (e.g., to random covers of graphs and surfaces). This is a dramatic simplification that will surely have many applications, given the many contexts in which Friedman’s theorem is relevant. If you were curious about random matrix theory but afraid of calculating traces (unlike the current generation of TCS graduate students, who seem to do it without breaking a sweat), this might be a good entry point for you into the subject.

A key new idea in the proof is borrowed from quantum query complexity, namely the “polynomial method.” There, the observation is that the acceptance probability of a quantum query algorithm is a multivariate polynomial in the input, whose degree is roughly the number of queries. It is then shown that a univariate symmetrization of this polynomial must vary rapidly and yet remain bounded over a certain interval for the algorithm to be correct. On the other hand, the “other” Markov inequality (same Markov, of the chains) states that a polynomial of degree \(q\) on [-1, 1] must satisfy

\[\sup_{x\in [-1,1]} |f'(x)| \le q^2 \sup_{x\in [-1,1]} |f(x)|.\]

This implies a lower bound on the degree and thereby the number of queries.
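
The inequality is tight for Chebyshev polynomials, which is easy to check numerically; the following sketch (my own illustration) does so for \(T_q\).

```python
import numpy as np
from numpy.polynomial import Chebyshev

# Minimal sketch: the Markov inequality sup|f'| <= q^2 * sup|f| on [-1,1]
# is attained by the degree-q Chebyshev polynomial T_q, which satisfies
# sup|T_q| = 1 and |T_q'(1)| = q^2.

q = 7
T_q = Chebyshev.basis(q)
xs = np.linspace(-1, 1, 10001)
sup_f = np.abs(T_q(xs)).max()
sup_df = np.abs(T_q.deriv()(xs)).max()
print(sup_f, sup_df, q**2)  # sup_df is (numerically) q^2 * sup_f
```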

In the setting of a random \(d\)-regular graph on \(N\) vertices, it turns out that the expected normalized traces of fixed-degree polynomials of the adjacency matrix \(A_N\) enjoy a similar uniformity. Specifically, for any polynomial \(h\) of degree \(q\), the quantity \((1/N)\,\mathbb{E}\,\mathrm{tr}(h(A_N))\) is a rational function of degree \(q\) in the parameter \(1/N\). Expanding this rational function as a power series in \(1/N\), the first few terms have relatively simple combinatorial expressions that can be controlled efficiently. The Markov inequality then implies that when the polynomial \(h\) is bounded on an appropriate interval, the corresponding rational function has small derivatives there, so the power series can be truncated with very good control on the error term. This yields a precise comparison between the expected trace and its \(N\rightarrow\infty\) limit for such bounded polynomial test functions \(h\), which is used to prove the result.

I had to read the proof several times to feel like it wasn’t magic. At a conceptual level, the new thing is that the proof exploits the uniformity of the random matrix model being considered — namely, that the sequence of matrices \(A_N\) converges in a certain sense. The proof shows that \(A_N\) is close to this limit instead of trying to show that it is “small” one value of \(N\) at a time. At a technical level, the shift from using low-degree test functions (i.e., computing \(\mathbb{E}\,\mathrm{tr}(A_N^p)\), an algebraic notion) to using bounded test functions \(h\) (an analytic notion) is crucial for this comparison to work.

Interestingly, the same idea played a key role in concurrent work by Chen et al. on efficient constructions of pseudorandom unitaries (see also the independent work of Metger-Poremba-Sinha-Yuen), which was another major focus of the Quantum program, with its own one-day workshop, though the focus there was more on computational pseudorandomness.

Breakthroughs on PCPs
The probabilistically checkable proof (PCP) theorem is one of the great achievements of theoretical computer science. It states that there is a way to efficiently rewrite formal mathematical proofs in a format that can be “spot-checked” — i.e., there is a randomized algorithm that reads a randomly chosen constant-sized portion of the rewritten proof and determines whether it is correct, with 99% accuracy. This theorem makes contact with an amazing number of topics, including hardness of approximation, property testing, error-correcting codes, blockchain, certificates of quantum entanglement, and more. It’s too bad that the homework assignments in my 500-person discrete math class are not encoded this way.

Two of the best paper awards at STOC this summer were for work on PCPs; interestingly, both papers can be viewed as answers to the following question: How much can we improve the parameters of a PCP if we let the alphabet size be large?

To make sense of this, let’s recall the statement of the PCP theorem. It says that there is a polynomial-time mapping from 3-SAT formulas \(\phi\) of size \(n\) to strings \(\psi\in\Sigma^*\) over an alphabet \(\Sigma\) with alphabet size \(|\Sigma|=R\), and a polynomial-time verifier which:

1. Uses \(b=O(\log(n))\) bits of randomness and queries \(q=O(1)\) symbols of \(\psi\).

2. Has completeness 1 — i.e., it accepts with probability 1 if \(\phi\) is satisfiable.

3. Has soundness 0.99 — i.e., it accepts with probability at most 0.99 if \(\phi\) is unsatisfiable.

The original proof yielded an alphabet size of \(R = 2\). Since then, there have been dozens of works constructing PCPs with different parameter settings motivated by various applications. Intuitively, it seems that increasing \(R\) should allow one to improve the other parameters since the verifier reads more bits of the proof. This intuition is formalized by the parallel repetition theorem of Raz, which says that it is possible to obtain soundness \(1/R^c\) with an alphabet of size \(R\), for some constant \(c < 1\).
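
As a back-of-the-envelope check on the exponents (my own bookkeeping, under a simplified decay assumption rather than the precise statement of Raz's theorem): suppose a basic two-query PCP has alphabet \(\Sigma_0\) and soundness \(s<1\), and suppose \(r\)-fold repetition drives the soundness down to \(s^{\delta r}\) for some constant \(\delta>0\). The repeated proof has alphabet size \(R=|\Sigma_0|^r\), so its soundness is

\[ s^{\delta r} = R^{\,\delta r\ln s/(r\ln|\Sigma_0|)} = R^{-c}, \qquad c=\frac{\delta\ln(1/s)}{\ln|\Sigma_0|}, \]

a constant independent of \(r\), matching the \(1/R^c\) form above.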

Minzer and Zheng settled a classic problem in the subject: What is the optimal trade-off between the soundness of a PCP and its alphabet size, fixing the number of queries to be 2? They show that there are PCPs with soundness \(1/R^{1-o(1)}\) and completeness 0.99, which is optimal. This implies sharp inapproximability results for some very natural problems, such as maximizing a quadratic form over \(\{-1,1\}^n\), which has an \(O(\log(n))\) approximation algorithm. The proof employs strikingly many concepts from across TCS: property testing, list decoding, small-set expansion, and more. I find it amazing that these bounds line up, given the totally different techniques that go into the proofs of the upper and lower bounds.

Guruswami, Lin, Ren, Sun, and Wu solved a long-standing folklore conjecture in parameterized complexity. One of the original payoffs of the PCP theorem was a proof that it is NP-hard to approximate the maximum clique. But what if one wants to approximate \(k\)-clique? There is an \(n^k\) brute-force algorithm for this problem. Can we rule out an \(f(k)\mathrm{poly}(n)\) algorithm? Lin showed in 2021 that assuming the exponential time hypothesis (ETH) — i.e., that 3-SAT does not have \(\exp(o(n))\)-time algorithms — no such algorithm exists. The Guruswami et al. paper gives a unified proof of Lin’s theorem and a slew of other such inapproximability results by proving the analogue of the PCP theorem for parameterized complexity theory.
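
For concreteness, the \(n^k\) brute-force algorithm is just an exhaustive check over \(k\)-subsets; here is a minimal sketch (my own illustration, with hypothetical names).

```python
import itertools

# Minimal sketch: the n^k brute-force algorithm for k-clique, checking
# every k-subset of the n vertices for pairwise adjacency.

def has_k_clique(n, edges, k):
    """edges: a set of frozensets {u, v} over vertices 0..n-1."""
    for subset in itertools.combinations(range(n), k):
        if all(frozenset((u, v)) in edges
               for u, v in itertools.combinations(subset, 2)):
            return True
    return False

# A 4-cycle has a 2-clique (an edge) but no 3-clique (triangle).
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
print(has_k_clique(4, edges, 2), has_k_clique(4, edges, 3))  # True False
```

There are roughly \(n^k\) subsets to check, which is polynomial for each fixed \(k\) but not of the form \(f(k)\mathrm{poly}(n)\).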

Remarkably, the technical core of the proof can be viewed as a new PCP in an extreme regime of parameters: for every \(k\), there are 2-query PCPs for 3-SAT that are computable in \(\exp(O(n/k))\) time and have length only \(g(k)\) symbols over an alphabet of size \(R = \exp(n/k)\) for some function \(g(k)\) — i.e., the number of bits of randomness used depends only on \(k\) and not on \(n\). The punch line is that if there were an \(f(k)\mathrm{poly}(R)\) algorithm for deciding such a PCP, it would yield a subexponential-time algorithm for 3-SAT by taking \(k\) superconstant in \(n\). This beautiful work paves a new conceptual path between research areas and opens the door to a more refined study of inapproximability for parameterized problems.
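
To unpack the punch line in symbols (my own bookkeeping): an \(f(k)\,\mathrm{poly}(R)\) algorithm for deciding such a PCP would run in time

\[ f(k)\cdot \mathrm{poly}(R) = f(k)\cdot \exp(O(n/k)), \]

and for any fixed \(f\) one can let \(k=k(n)\) grow slowly enough that \(f(k(n)) = \exp(o(n))\); since \(\exp(O(n/k(n))) = \exp(o(n))\) as well, the total running time, including the \(\exp(O(n/k))\) cost of constructing the PCP, is \(\exp(o(n))\), contradicting ETH.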
