The Simons Institute for the Theory of Computing will swing into high gear in August, with the start of its first two full-semester programs, “Real Analysis in Computer Science” and “Theoretical Foundations of Big Data Analysis.” The two programs exemplify the Institute’s dual mission: to address the central problems of theoretical computer science, and to expand the discipline’s boundaries by exploring connections with other areas of science.
The Real Analysis program hinges on the surprising effectiveness of continuous, analytic tools such as Fourier analysis for solving discrete problems in fields such as machine learning, hardness of approximation, and computational voting theory. This bridge between continuous and discrete approaches is founded on a seminal 1988 paper by one of the program’s organizers, Gil Kalai of Hebrew University, together with Jeff Kahn of Rutgers University and Nati Linial of Hebrew University, who will also be visiting the program. The paper used Fourier analysis to study properties of Boolean functions—functions that assign a value of 0 or 1 to each string of bits of a given length.
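For small cases, the Fourier expansion of a Boolean function can be tabulated directly. The sketch below (a toy illustration, not drawn from the paper itself) uses the common convention of writing bits as −1 and 1, under which every Boolean function is a unique sum of “parity” terms, one per subset of inputs; three-voter majority serves as a hypothetical example:

```python
from itertools import product, combinations

def fourier_coefficients(f, n):
    # Every f: {-1,1}^n -> {-1,1} expands as
    #   f(x) = sum over subsets S of f_hat(S) * prod_{i in S} x[i],
    # where f_hat(S) is the average of f(x) * prod_{i in S} x[i]
    # over all 2^n inputs. Tabulate each coefficient by brute force.
    coeffs = {}
    points = list(product([-1, 1], repeat=n))
    for size in range(n + 1):
        for S in combinations(range(n), size):
            total = 0
            for x in points:
                parity = 1
                for i in S:
                    parity *= x[i]
                total += f(x) * parity
            coeffs[S] = total / len(points)
    return coeffs

# Majority of three voters in the -1/1 convention
maj = lambda x: 1 if sum(x) > 0 else -1
print(fourier_coefficients(maj, 3))
```

For three-voter majority, only the three single-voter terms and the one three-voter term are nonzero, reflecting the fact that majority is close to a sum of individual votes.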
Boolean functions are at the heart of a host of computational problems—they can be used, for example, to represent how a given voting scheme chooses a winner, or to distinguish between examples and non-examples of a concept in machine learning. Using the Fourier representation of a Boolean function, researchers can study, for instance, which voters have the most influence in a particular voting scheme, or how stable the scheme’s outcomes are if a few votes get misrecorded. Computer scientists have begun using tools from real analysis to tackle such problems, and, over the past decade, to probe a class of problems that appear hard to solve not only exactly but even approximately.
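The notion of a voter’s influence has a concrete reading: it is the fraction of vote profiles in which flipping that voter’s ballot alone flips the outcome. A minimal brute-force sketch, using three-voter majority as a hypothetical voting scheme:

```python
from itertools import product

def majority(votes):
    # Boolean voting scheme: outcome is 1 if most of the votes are 1
    return 1 if sum(votes) >= 2 else 0

def influence(f, n, i):
    # Influence of voter i: fraction of the 2^n vote profiles where
    # flipping voter i's ballot alone changes the outcome of f
    count = 0
    for votes in product([0, 1], repeat=n):
        flipped = list(votes)
        flipped[i] ^= 1
        if f(votes) != f(tuple(flipped)):
            count += 1
    return count / 2 ** n

print([influence(majority, 3, i) for i in range(3)])  # [0.5, 0.5, 0.5]
```

Under three-voter majority, each voter has influence 0.5, since a ballot is pivotal exactly when the other two voters are split; under a “dictator” scheme that simply returns voter 0’s ballot, voter 0 has influence 1 and the others 0.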
While the Real Analysis program is firmly rooted in the core of theoretical computer science, the Big Data program aims to turn the power of theoretical computer science outward, to address a pressing societal need. As applications such as genome sequencing, environmental sensing, and Internet activity generate an ever-swelling flood of data, the task of extracting meaning from this data constitutes an unprecedented computational challenge—but also an unprecedented opportunity to discover new truths about society, health and science.
Computer scientists and statisticians have come up with a variety of tools for homing in on the most salient features of a data set, such as cluster analysis and dimension reduction techniques. So far, however, these tools have been developed and applied in a largely ad hoc fashion.
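Dimension reduction can be sketched with one standard technique, principal component analysis, chosen here purely for illustration; the synthetic data below is hypothetical. The idea is to project high-dimensional points onto the few directions that capture most of their variation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 200 points in 10 dimensions that mostly vary
# along two hidden directions, plus a little noise
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
data = hidden @ mixing + 0.05 * rng.normal(size=(200, 10))

# PCA via the singular value decomposition of the centered data:
# project onto the top-2 principal components
centered = data - data.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ vt[:2].T   # 200 x 2 compressed representation

# Fraction of total variance captured by the two components
explained = (singular_values[:2] ** 2).sum() / (singular_values ** 2).sum()
print(reduced.shape, round(explained, 3))
```

Because the data was built to vary along only two hidden directions, the two-dimensional projection retains nearly all of its variance—an instance of compressing data without losing its salient features.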
The goal of the Institute’s program is to develop the theoretical underpinnings of big data analysis—the common principles that underlie many different large data problems. The program will examine, for instance, what the right model should be for cloud computing, in which many different people are accessing and storing a shared, distributed dataset; how to compress data into a more compact form that can be more easily manipulated, without losing its salient features; and how to separate reliable conclusions from unreliable ones when studying data gathered under uncontrolled circumstances. The program will also feature a workshop on differential privacy, an approach to releasing large-scale statistical information about a dataset without compromising the privacy of its participants.
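Differential privacy is typically achieved by adding calibrated random noise to a query’s answer, so that the released statistic barely depends on any single participant. A minimal sketch of one standard approach, the Laplace mechanism; the survey counts and privacy budget below are hypothetical:

```python
import numpy as np

def private_count(true_count, epsilon, rng):
    # Laplace mechanism: a counting query has sensitivity 1
    # (adding or removing one person changes the count by at most 1),
    # so Laplace noise with scale 1/epsilon suffices for
    # epsilon-differential privacy.
    return true_count + rng.laplace(scale=1.0 / epsilon)

rng = np.random.default_rng(0)
# Release how many of 10,000 survey participants answered "yes",
# with a privacy budget of epsilon = 0.1
noisy = private_count(4213, epsilon=0.1, rng=rng)
print(round(noisy))  # close to 4213, typically off by a few tens
```

Smaller values of epsilon give stronger privacy at the cost of noisier answers, which is the central trade-off the workshop’s subject formalizes.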
The two programs, which will each welcome about 35 to 40 long-term visitors, will kick off with “boot camp” workshops to bring researchers up to speed on important developments in the two fields. Each program will hold three additional workshops sprinkled throughout the semester, as well as open lectures aimed at a wider scientific audience.