Fall 2018

Foundations of Data Science

Aug. 15 – Dec. 14, 2018
Data arising from experimental, observational, and simulational processes in the natural and social sciences, as well as in industrial applications and other domains, have created enormous opportunities for understanding the world we live in.  The pursuit of such understanding requires the development of systems and techniques for processing and analyzing data, falling under the general term "Data Science."  Data Science is a blend of old and new.  Much of the "old" involves ideas and techniques that have been developed in existing methodological and application domains, and much of the "new" is being developed in response to new technologies that create enormous quantities of data.

This program will bring together researchers working on algorithmic, mathematical and statistical aspects of modern Data Science, with the aim of identifying a set of core techniques and principles that form a foundation for the subject.  While the foundations of Data Science lie at the intersection between computer science, statistics and applied mathematics, each of those disciplines in turn developed in response to particular long-standing problems.  Building a foundation for modern Data Science requires rethinking not only how those three research areas interact with data, implementations and applications, but also how each of the areas interacts with the others.  For example, differing applications in computer science and scientific computing have led to different formalizations of appropriate models, questions to consider, computational environments (such as single machine vs. distributed data centers vs. supercomputers), and so on.  Similarly, business, internet and social media applications tend to have certain design requirements and to generate certain types of questions, and these tend to be very different from those that arise in scientific and medical applications.  Alongside these differences, the areas also share many similarities.  Developing the theoretical foundations of Data Science requires paying appropriate attention to the questions and issues of domain scientists who generate and use the data, and to the computational environments and platforms supporting this work.
Our emphasis will be on such topics as dimensionality reduction; randomized numerical linear algebra; optimization; probability in high dimensions; sparse recovery; statistics, including inference and causality; and streaming and sublinear algorithms, as well as a variety of application areas that can benefit from these fields and other techniques for processing massive data sets.  Each of these related areas has received attention from a diverse set of research communities, and an important goal for us will be to explore and strengthen connections between methods and problems in these areas, to discover new perspectives on old problems, and to foster interactions between different research communities that address similar problems from quite different perspectives.

This program is supported in part by the Kavli Foundation.


Organizers:
David Woodruff (Carnegie Mellon University; chair), Kenneth Clarkson (IBM Research), Ravindran Kannan (Microsoft Research India), Michael Mahoney (International Computer Science Institute and UC Berkeley), Andrea Montanari (Stanford University), Santosh Vempala (Georgia Tech), Rachel Ward (University of Texas, Austin)

Long-Term Participants:
Anil Ananthaswamy (Journalist), Ery Arias-Castro (UC San Diego), Laura Balzano (University of Michigan), Peter Bartlett (UC Berkeley), Shai Ben-David (University of Waterloo), Vladimir Braverman (Johns Hopkins University), Amit Chakrabarti (Dartmouth College), Ken Clarkson (IBM Almaden), Artur Czumaj (University of Warwick), Anirban Dasgupta (IIT Gandhinagar), Sanjoy Dasgupta (UC San Diego), Ilias Diakonikolas (University of Southern California), Maryam Fazel (University of Washington), Anupam Gupta (Carnegie Mellon University), Mohammad Hajiaghayi (University of Maryland), Moritz Hardt (UC Berkeley), Adel Javanmard (USC), T.S. Jayram (IBM Almaden), Jiantao Jiao (UC Berkeley), Mike Jordan (UC Berkeley), Brendan Juba (Washington University in St. Louis), Ravi Kannan (Microsoft Research India), Michael Kapralov (Ecole Polytechnique Federale de Lausanne), Robi Krauthgamer (Weizmann Institute of Science), Lin Lin (UC Berkeley), Gabor Lugosi (Pompeu Fabra University), Michael Mahoney (International Computer Science Institute and UC Berkeley), Dustin Mixon (Ohio State University), Andrea Montanari (Stanford University), Elchanan Mossel (Massachusetts Institute of Technology), Sayan Mukherjee (Duke University), Boaz Nadler (Weizmann Institute of Science), Deanna Needell (University of California, Los Angeles), Rasmus Pagh (IT University of Copenhagen), Jeff Phillips (University of Utah), Eric Price (University of Texas at Austin), Sofya Raskhodnikova (Boston University), Ben Recht (UC Berkeley), Fred Roosta (University of Queensland), Barna Saha (University of Massachusetts Amherst), Sujay Sanghavi (University of Texas at Austin), Michael Saunders (Stanford University), Nikhil Srivastava (UC Berkeley), Madeleine Udell (Cornell University), Santosh Vempala (Georgia Institute of Technology), Martin Wainwright (UC Berkeley), Bei Wang (University of Utah), Rachel Ward (University of Texas at Austin), David Woodruff (Carnegie Mellon University), Bin Yu (UC Berkeley)

Research Fellows:
Michal Derezinski (UC Santa Cruz), Jelena Diakonikolas (Boston University), Gautam Kamath (Massachusetts Institute of Technology), Rajiv Khanna (University of Texas at Austin), Jerry Li (Massachusetts Institute of Technology), Marco Mondelli (Stanford University), Yan Shuo Tan (University of Michigan)

Visiting Graduate Students and Postdocs:
Ainesh Bakshi (Carnegie Mellon University), Soheil Behnezhad (University of Maryland), Hadley Black (University of California, Santa Cruz), Mahsa Derakhshan (University of Maryland), Charlie Dickens (University of Warwick), Simon Du (Carnegie Mellon University), Alireza Farhadi (University of Maryland), Mohammad Amin Ghiasi (University of Maryland), Amir Gholaminejad (UC Berkeley), Wooseok Ha (UC Berkeley), Rajesh Jayaram (Carnegie Mellon University), John Kallaugher (University of Texas at Austin), Francois Lanusse (UC Berkeley), Jason Li (Carnegie Mellon University), Yian Ma (UC Berkeley), Sourabh Palande (University of Utah), Hamed Saleh (University of Maryland), Saeed Seddighin (University of Maryland), Fei Shi (Carnegie Mellon University), Zhao Song (University of Texas at Austin), Anastasia Voloshinov (USC), Ruosong Wang (Carnegie Mellon University), Xiaoxia Wu (University of Texas at Austin), Hongyang Zhang (Carnegie Mellon University)

To subscribe to our announcements email list for this program, send an email to sympa [at] lists [dot] simons [dot] berkeley [dot] edu.


Aug. 27 – Aug. 31, 2018


Ken Clarkson (IBM Almaden), Ravi Kannan (Microsoft Research India), Michael Mahoney (International Computer Science Institute and UC Berkeley), Andrea Montanari (Stanford University), Santosh Vempala (Georgia Institute of Technology), Rachel Ward (University of Texas at Austin), David Woodruff (IBM Almaden)
Sep. 24 – Sep. 28, 2018


Petros Drineas (Purdue University; chair), Ken Clarkson (IBM Almaden), Prateek Jain (Microsoft Research India), Michael Mahoney (International Computer Science Institute and UC Berkeley)
Oct. 29 – Nov. 2, 2018


Andrea Montanari (Stanford University; chair), Emmanuel Candès (Stanford University), Ilias Diakonikolas (University of Southern California), Santosh Vempala (Georgia Institute of Technology)
Nov. 27 – Nov. 30, 2018


Robert Krauthgamer (Weizmann Institute; chair), Artur Czumaj (University of Warwick), Aarti Singh (Carnegie Mellon University), Rachel Ward (University of Texas at Austin)

Those interested in participating in this program should send an email to the organizers at datascience2018 [at] lists [dot] simons [dot] berkeley [dot] edu.

Program image by Luisa Lee