Playlist: 21 videos

Statistics in the Big Data Era

0:42:10
Jingyi Jessica Li (UCLA)
https://simons.berkeley.edu/talks/enhancing-statistical-rigor-genomic-data-science

"The rapid development of genomics technologies has propelled fast advances in genomics data science. While new computational algorithms have been continuously developed to address cutting-edge biomedical questions, a critical but largely overlooked aspect is the statistical rigor. In this talk, I will introduce our recent work that aims to enhance the statistical rigor by addressing three issues: 1. large-scale feature screening (i.e., enrichment and differential analysis of high-throughput data) relying on ill-posed p-values; 2. double-dipping (i.e., statistical inference on biasedly altered data); 3. gaps between black-box generative models and statistical inference."
0:40:30
Sharmodeep Bhattacharyya (Oregon State University)
https://simons.berkeley.edu/talks/sketched-wasserstein-distance

Statistical analysis of networks generated from exchangeable network models has been extensively studied in the literature. One primary property of exchangeable network models is the conditional independence of edge formation. In this work, we extend the framework of network formation to include dependent edges, with emphasis on generating networks with all five properties of sparsity, small-world, community structure, power-law degree distribution, and transitivity (or high triangle count). We propose a class of models, called Transitive Inhomogeneous Erdős-Rényi (TIER) models, which we show have all five properties. We also perform inferential tasks, such as parameter estimation, community detection, and change-point detection, using networks generated from TIER models, and we validate our results with simulation studies. If time permits, we will talk about some recent developments on estimating the number of communities using Bethe Hessian matrices.
0:30:55
Ben Brown (Lawrence Berkeley National Laboratory)
https://simons.berkeley.edu/talks/tbd-425
1:00:20
Panel featuring Peter Bickel (UC Berkeley), Peter Bühlmann (ETH), Jianqing Fan (Princeton), Jon McAuliffe (Voleon/UC Berkeley), Deb Nolan (UC Berkeley), Tao Shi (Citadel), Bin Yu (UC Berkeley); Liza Levina (University of Michigan; moderator).
https://simons.berkeley.edu/talks/discussion-panel-statistics-big-data-era
0:35:40
Purnamrita Sarkar (University of Texas at Austin)
https://simons.berkeley.edu/talks/bootstrapping-error-ojas-algorithm

We consider the problem of quantifying uncertainty for the estimation error of the leading eigenvector from Oja's algorithm for streaming principal component analysis, where the data are generated IID from some unknown distribution. By combining classical tools from the U-statistics literature with recent results on high-dimensional central limit theorems for quadratic forms of random vectors and concentration of matrix products, we establish a weighted χ² approximation result for the sin² error between the population eigenvector and the output of Oja’s algorithm. Since estimating the covariance matrix associated with the approximating distribution requires knowledge of unknown model parameters, we propose a multiplier bootstrap algorithm that may be updated in an online manner. We establish conditions under which the bootstrap distribution is close to the corresponding sampling distribution with high probability, thereby establishing the bootstrap as a consistent inferential method in an appropriate asymptotic regime.
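
For readers unfamiliar with the setting, the following is a minimal sketch of the textbook form of Oja's update for the leading eigenvector, together with the sin² error mentioned in the abstract. It is not the speaker's implementation and does not include the multiplier bootstrap. The function names, the constant step size, and the synthetic diagonal covariance are all assumptions made for illustration.

```python
import numpy as np


def oja_leading_eigenvector(stream, dim, step_size, seed=0):
    """Textbook Oja update: one pass over the stream, one sample at a time.

    `step_size` is a constant learning rate chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(dim)
    w /= np.linalg.norm(w)
    for x in stream:
        # Move w toward x * (x . w), a rank-one estimate of (covariance @ w).
        w = w + step_size * x * (x @ w)
        # Renormalize so w stays on the unit sphere.
        w /= np.linalg.norm(w)
    return w


def sin2_error(u, v):
    """sin^2 of the angle between u and v, invariant to sign and scale."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    return 1.0 - float(u @ v) ** 2


if __name__ == "__main__":
    # Hypothetical synthetic data: covariance diag(4, 1, 1, 1, 1),
    # so the population leading eigenvector is the first basis vector.
    rng = np.random.default_rng(1)
    scales = np.sqrt(np.array([4.0, 1.0, 1.0, 1.0, 1.0]))
    samples = rng.standard_normal((5000, 5)) * scales
    w_hat = oja_leading_eigenvector(samples, dim=5, step_size=0.005)
    print("sin^2 error:", sin2_error(w_hat, np.eye(5)[0]))
```

The sin² error is used rather than the Euclidean distance because an eigenvector is only defined up to sign, and sin² is unchanged when either vector is negated.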
0:41:45
Richard Samworth (University of Cambridge)
https://simons.berkeley.edu/talks/optimal-nonparametric-testing-missing-completely-random-and-its-connections-compatibility

Given a set of incomplete observations, we study the nonparametric problem of testing whether data are Missing Completely At Random (MCAR). Our first contribution is to characterise precisely the set of alternatives that can be distinguished from the MCAR null hypothesis. This reveals interesting and novel links to the theory of Fréchet classes (in particular, compatible distributions) and linear programming, and we leverage tools developed in these fields to propose MCAR tests that are consistent against all detectable alternatives. Moreover, we define a natural measure of ease of detectability (an incompatibility index), and exploit ideas from max-flow min-cut theory to prove that our tests achieve the optimal minimax separation rate according to this measure in certain cases.
0:40:55
Lihua Lei (Stanford University)
https://simons.berkeley.edu/talks/cyclic-permutation-test-nonstandard-exact-test-linear-models

We propose the Cyclic Permutation Test (CPT) to test general linear hypotheses for linear models. This test is non-randomized and valid in finite samples with exact Type I error α for an arbitrary fixed design matrix and arbitrary exchangeable errors, whenever 1/α is an integer and n/p≥1/α−1. The test applies the marginal rank test to 1/α linear statistics of the outcome vector, where the coefficient vectors are determined by solving a linear system such that the joint distribution of the linear statistics is invariant with respect to a nonstandard cyclic permutation group under the null hypothesis. The power can be further enhanced by solving a secondary non-linear traveling salesman problem, for which the genetic algorithm can find a reasonably good solution. Extensive simulation studies show that the CPT has comparable power to existing tests. When testing for a single contrast of coefficients, an exact confidence interval can be obtained by inverting the test. Furthermore, we provide a selective yet extensive literature review of the century-long efforts on this problem from 1908 to 2018, highlighting the novelty of our test. This is joint work with Peter Bickel.
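
As a small worked illustration of the applicability condition stated in the abstract (1/α is an integer and n/p ≥ 1/α − 1), the sketch below checks whether a given sample size n, number of coefficients p, and level α satisfy it. It does not implement the CPT itself. The function name is hypothetical, and α is passed as a string or `Fraction` so that 1/α can be tested for integrality exactly.

```python
from fractions import Fraction


def cpt_condition_holds(n, p, alpha):
    """Check the condition quoted in the abstract: 1/alpha is an integer
    and n/p >= 1/alpha - 1.

    `alpha` should be a string such as "0.05" or a Fraction; a float
    would be converted to its inexact binary value.
    """
    alpha = Fraction(alpha)
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    inverse = 1 / alpha
    if inverse.denominator != 1:
        # 1/alpha is not an integer, so the stated condition fails.
        return False
    return Fraction(n, p) >= inverse - 1


# With alpha = 0.05 we have 1/alpha = 20, so the condition requires n/p >= 19.
print(cpt_condition_holds(n=190, p=10, alpha="0.05"))
print(cpt_condition_holds(n=180, p=10, alpha="0.05"))
```

For example, at α = 0.05 a model with p = 10 coefficients needs at least n = 190 observations to meet the condition.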
0:48:20
Yazhen Wang (University of Wisconsin, Madison)
https://simons.berkeley.edu/talks/statistics-quantum-computational-supremacy

Quantum computation and quantum information have attracted considerable attention on multiple frontiers of scientific fields ranging from physics to chemistry and engineering, as well as from computer science to mathematics and statistics. It has been theoretically proved that quantum algorithms may provide significant speedups over classical algorithms, but technological hurdles must be overcome to construct large-scale quantum computers that can execute them. Quantum (computational) supremacy refers to any major milestone achievement in quantum computing in the quest to outperform classical computers on some tough computational tasks. This talk will present recent quantum supremacy studies for solving hard statistical sampling problems and show that statistics plays a major role in these studies.
0:43:25
David Donoho (Stanford University)
https://simons.berkeley.edu/talks/tbd-426

Peter Bickel has been a world-beating researcher at a world-beating department in a world-beating research university for roughly sixty years.

I myself have cited in my own published work contributions that Bickel has made across each of the decades 1960-, 1970-, 1980-, 1990-, 2000- and 2010-; and I hope to soon cite his most recently appeared (2021) work; this makes 7 decades of Peter's explicit influence on my own intellectual development. While I wouldn't be able to do justice to the massive oeuvre he has produced, I will try to discuss some of his contributions and the consequences in the research areas where I do have some expertise; these include robustness, high-dimensional linear models, compressed sensing, and high-dimensional covariance estimation.
0:08:45
Peter Bickel & Haiyan Huang (UC Berkeley)
https://simons.berkeley.edu/talks/poster-awards-closing-remarks