This series of talks is part of the Algorithms and Uncertainty Boot Camp. Videos for each talk area will be available through the links above.
Problems are intractable when they “can be solved, but not fast enough for the solution to be usable” [Hopcroft, Motwani & Ullman, 2007]. NP-complete problems are commonly said to be intractable; however, the reality is more complex. All known algorithms for solving NP-complete problems require exponential time in the worst case; however, these algorithms nevertheless solve many problems of practical importance astoundingly quickly, and are hence relied upon in a broad range of applications. The propositional satisfiability problem (SAT) serves as a good example. The FCC's multi-billion dollar spectrum auctions operate by solving hundreds of thousands of SAT-encoded graph coloring problems, typically each involving tens of thousands of variables and millions of constraints. These instances can often be solved in seconds, even though the same solvers can be stymied by hand-crafted instances involving only hundreds of variables. Clearly, we could benefit from a more nuanced understanding of algorithm behaviour than is offered by asymptotic, worst-case analysis.
This tutorial is focused on the question most relevant to an end user: “How hard is it to solve a given family of problem instances, using the best available methods?” Formal, complexity-theoretic analysis of this question seems hopeless: the best available algorithms are highly complex (and, in some cases, only available in compiled form), and instance distributions representative of practical applications are heterogeneous and richly structured. For this reason, we turn to statistical, rather than combinatorial, analysis. The main claim of this tutorial is that rigorous statistical methods can characterize algorithm runtime with high levels of confidence. More specifically, this article surveys over a decade of research showing how to build empirical hardness models (EHMs) that, given a new problem instance, estimate the runtime of an algorithm in low-order polynomial time. We have shown that it is possible to build quite accurate models for different NP-complete problems (we have studied SAT, combinatorial auction winner determination, mixed integer programming, and the traveling salesman problem), distributions of problem instances (we have considered dozens), solvers (again, dozens). We have robustly found that even very succinct EHMs can achieve high accuracies, meaning that they describe simple relationships between instance characteristics and algorithm runtime. This makes our approach important even for theoretically inclined computer scientists who prefer proofs to experimental findings: EHMs can uncover new, simple relationships between instance characteristics and runtime, and thereby catalyze new theoretical work.
The tutorial will emphasize ways that EHMs contribute to our understanding of NP-complete problems; however, they are also useful ina variety of practical applications. Most straightforwardly, they can aid the distribution of problem instances across a cluster, or predict how long a run will take to complete. More interestingly, they can (1) be leveraged to automatically make benchmark distributions more challenging; (2) be used to combine a set of high-variance algorithms into an “algorithm portfolio” that outperforms its constituents; and (3) aid in the configuration (or “tuning”) of highly parameterized algorithms for good performance on given instance distributions. If time permits, I'll briefly discuss each of these applications. In particular, I'll explain how we leveraged ideas (2) and (3) to build the SAT solver portfolios that clear the FCC auctions.