Theoretical Foundations of Big Data Analysis (Fall 2013)
22 August to 20 December, 2013
We live in an era of "Big Data": science, engineering and technology are producing increasingly large data streams, with petabyte and exabyte scales becoming increasingly common. In scientific fields such data arise in part because tests of standard theories increasingly focus on extreme physical conditions (cf., particle physics) and in part because science has become increasingly exploratory (cf., astronomy and genomics). In commerce, massive data arise because so much of human activity is now online, and because business models aim to provide services that are increasingly personalized.
The Big Data phenomenon presents opportunities and perils. On the optimistic side of the coin, massive data may amplify the inferential power of algorithms that have been shown to be successful on modest-sized data sets. The challenge is to develop the theoretical principles needed to scale inference and learning algorithms to massive, even arbitrary, scale. On the pessimistic side of the coin, massive data may amplify the error rates that are part and parcel of any inferential algorithm. The challenge is to control such errors even in the face of the heterogeneity and uncontrolled sampling processes underlying many massive data sets. Another major issue is that Big Data problems often come with time constraints, where a high-quality answer that is obtained slowly can be less useful than a medium-quality answer that is obtained quickly. Overall we have a problem in which the classical resources of the theory of computation—e.g., time, space and energy—trade off in complex ways with the data resource.
Various aspects of this general problem are being faced in the theory of computation, statistics and related disciplines—where topics such as dimension reduction, distributed optimization, Monte Carlo sampling, compressed sampling, low-rank matrix factorization, streaming and hardness of approximation are of clear relevance—but the general problem remains untackled. This program will bring together experts from these areas with the aim of laying the theoretical foundations of the emerging field of Big Data.
Organizers:
Michael Jordan (UC Berkeley; chair), Stephen Boyd (Stanford), Peter Bühlmann (ETH Zürich), Ravi Kannan (Microsoft Research), Michael Mahoney (Stanford), S. Muthu Muthukrishnan (Rutgers University and Microsoft Research).
Long-Term Participants (in addition to Organizers):
Ivona Bezáková (Rochester Institute of Technology), Peter Bickel (UC Berkeley), Josh Bloom (UC Berkeley), Sebastien Bubeck (Princeton), Aydın Buluç (Lawrence Berkeley Lab), Emmanuel Candes (Stanford), Amit Chakrabarti (Dartmouth), Jim Demmel (UC Berkeley), Petros Drineas (Rensselaer Polytechnic Institute), Noureddine El Karoui (UC Berkeley), Michael Friedlander (University of British Columbia), David Gleich (Purdue), Alex Gray (Georgia Tech), Valerie King (University of Victoria), Jian Li (Tsinghua University), Andrew McGregor (University of Massachusetts, Amherst), Jennifer Neville (Purdue), Robert Nowak (University of Wisconsin), Ely Porat (Bar-Ilan University), Yuval Rabani (Hebrew University, Jerusalem), Chris Ré (University of Wisconsin), Ben Recht (University of Wisconsin), Peter Richtarik (University of Edinburgh), Richard Samworth (University of Cambridge), Leonard Schulman (Caltech), Daniel Štefankovič (University of Rochester), Mario Szegedy (Rutgers University), David Tse (UC Berkeley), Joel Tropp (Caltech), Suresh Venkatasubramanian (University of Utah), Martin Wainwright (UC Berkeley), Bin Yu (UC Berkeley).
Research Fellows:
Leonid Barenboim (Ben Gurion University; joint with I-CORE, Israel), Xi Chen (CMU), Moritz Hardt (IBM Almaden), Martin Jaggi (Ecole Polytechnique), Mladen Kolar (CMU), Yi Li (University of Michigan; joint with MPI Saarbrücken), Han Liu (Princeton), Sang-Yun Oh (Stanford; LBNL postdoc), Eric Price (MIT), Or Sheffet (CMU), Nikhil Srivastava (Microsoft Research India postdoc), Justin Thaler (Harvard), Caroline Uhler (IST Austria).
During the semester there will be four workshops spanning the topics of the program, as well as a Boot Camp designed to acquaint participants with key material early in the semester. These are planned as follows:
- Boot Camp: September 3-6, 2013.
Organizer: Michael Jordan (UC Berkeley).
- Workshop 1: "Succinct Data Representations and Applications." September 16-20, 2013.
Organizers: Petros Drineas (Rensselaer Polytechnic Institute; chair), Francis Bach (INRIA & ENS Paris), Peter Bühlmann (ETH Zürich), Emmanuel Candes (Stanford), Piotr Indyk (MIT), Ravi Kannan (Microsoft Research), S. Muthu Muthukrishnan (Rutgers University and Microsoft Research), Robert Nowak (University of Wisconsin), Stephen Wright (University of Wisconsin).
- Workshop 2: "Parallel and Distributed Algorithms for Inference and Optimization." October 21-25, 2013.
Organizers: Michael Mahoney (Stanford; chair), Guy Blelloch (CMU), John Gilbert (UC Santa Barbara), Chris Ré (University of Wisconsin), Martin Wainwright (UC Berkeley).
- Workshop 3: "Unifying Theory and Experiment for Large-Scale Networks." November 18-21, 2013.
Organizers: Michael Kearns (University of Pennsylvania; chair), Deepak Agarwal (LinkedIn), Edo Airoldi (Harvard), Ashish Goel (Stanford), Matt Jackson (Stanford), Jennifer Neville (Purdue).
- Workshop 4: "Big Data and Differential Privacy." December 11-14, 2013.
Organizers: Kunal Talwar (Microsoft Research; chair), Avrim Blum (CMU), Kamalika Chaudhuri (UC San Diego), Cynthia Dwork (Microsoft Research), Michael Jordan (UC Berkeley).
Those interested in participating in this program should send email to the organizers at this address.