Monday, May 2nd, 2016
Model a network as an edge-weighted graph, where the (unknown) weight $w_e$ of edge $e$ indicates the frequency of observed interactions; that is, over time $t$ we observe a Poisson($t w_e$) number of interactions across edge $e$. How should we estimate a given statistic of the underlying network? This leads to wide-ranging and challenging problems.
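To make the observation model concrete, here is a minimal simulation sketch; the small graph, the rate values, and the naive plug-in estimator are all assumptions for illustration, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-node network with unknown edge rates w_e (values assumed).
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
w_true = dict(zip(edges, [0.5, 2.0, 0.1, 1.0]))

t = 50.0  # length of the observation window

# Over time t, edge e produces a Poisson(t * w_e) count of interactions.
counts = {e: rng.poisson(t * w_true[e]) for e in edges}

# Naive plug-in estimator: w_hat_e = count_e / t. Any statistic of the
# network (here, the total weight) can then be estimated from it.
w_hat = {e: counts[e] / t for e in edges}
print(w_hat, sum(w_hat.values()))
```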
Approximate Message Passing (AMP) is a class of algorithms, motivated by statistical physics, that can be applied to a broad family of problems on dense matrices. Remarkably, AMP can be analyzed exactly in the large-size limit. I will describe the general approach and illustrate it through applications to specific problems: the hidden clique problem, sparse principal component analysis, and non-negative principal component analysis.
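As a concrete sketch of the iteration, here is AMP for the rank-one spiked Wigner model with a $\pm 1$ signal, a standard textbook instance assumed here for illustration; the model, the tanh denoiser, and the parameters are not from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam, n_iter = 2000, 1.5, 20  # illustrative sizes; lam > 1 is above threshold

# Spiked Wigner model: Y = (lam/n) x x^T + W / sqrt(n), with x_i = +/-1.
x = rng.choice([-1.0, 1.0], size=n)
W = rng.normal(size=(n, n)); W = (W + W.T) / np.sqrt(2)
Y = (lam / n) * np.outer(x, x) + W / np.sqrt(n)

# AMP with tanh denoiser and Onsager correction (a sketch of the standard
# iteration for this model, not code from the talk).
m = 0.1 * rng.normal(size=n)  # weakly informative initialization
m_old = np.zeros(n)
b = 0.0
for _ in range(n_iter):
    s = Y @ m - b * m_old           # effective observation with memory term
    m_old, m = m, np.tanh(lam * s)  # Bayes denoiser for a +/-1 prior
    b = lam * np.mean(1.0 - m**2)   # Onsager term: average denoiser derivative
print("overlap:", abs(m @ x) / (np.linalg.norm(m) * np.linalg.norm(x)))
```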
I will discuss some recent breakthroughs in the probability literature on the problem of counting sparse graphs with prescribed properties, such as containing more than a given number of triangles. For dense graphs, such counting is possible using Szemerédi's regularity lemma, but this tool is not available for sparse graphs. A new technique, called nonlinear large deviation theory, was introduced two years ago and has led to the resolution of several open questions of this type. I will talk about the central aspects of this theory, some applications, the inadequacies of the theory in its current form, and possibilities for further development.
Tuesday, May 3rd, 2016
Given a noisy measurement of a product of two matrices, the matrix factorization problem is to recover the original matrices. It arises in many applications, such as dictionary learning, blind matrix calibration, sparse principal component analysis, blind source separation, low-rank matrix completion, robust principal component analysis, and factor analysis. We use the tools of statistical mechanics, namely the cavity and replica methods, to analyze the achievability and tractability of these inference problems in the setting of Bayes-optimal inference. Joint work with Lenka Zdeborová and Thibault Lesieur.
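For concreteness, here is the measurement model with a simple truncated-SVD baseline; the dimensions, rank, and noise level are assumptions for illustration, and the baseline is not the Bayes-optimal cavity/replica analysis from the talk.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, r, noise = 200, 300, 3, 0.1  # illustrative dimensions and noise level

# Noisy measurement of a product of two matrices: Y = U V + noise.
U = rng.normal(size=(n, r))
V = rng.normal(size=(r, m))
Y = U @ V + noise * rng.normal(size=(n, m))

# A simple (non-Bayes-optimal) baseline: truncated SVD estimates the product
# U V up to the noise level; note that U and V themselves are identifiable
# only up to an invertible r x r transformation.
Uh, sh, Vh = np.linalg.svd(Y, full_matrices=False)
Y_hat = (Uh[:, :r] * sh[:r]) @ Vh[:r, :]
print("relative error:", np.linalg.norm(Y_hat - U @ V) / np.linalg.norm(U @ V))
```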
Over the past decade, a rich interaction has grown up between statistical physics and statistical inference. In particular, physics often predicts phase transitions where, if a data set becomes too sparse or too noisy, it suddenly becomes impossible to find its underlying structure, or even to distinguish it from a “null model” with no structure at all. For instance, in the community detection problem in networks, there is a transition below which the vertices cannot be labeled better than chance, and the network cannot be distinguished from an Erdős–Rényi random graph. Similarly, in the clustering problem in Gaussian mixture models, it becomes impossible to find the clusters, or even to tell whether clusters exist, i.e., to distinguish the data from a single Gaussian.
Many of the physics predictions have been made rigorous, but there is still a very lively interchange between the fields, both about these transitions and about asymptotically optimal algorithms. In particular, while efficient message-passing and spectral algorithms are known that succeed down to the so-called Kesten-Stigum bound, in a number of problems we believe that it is information-theoretically possible (but perhaps not computationally feasible) to go further. I will give a friendly introduction to this field, and present some new results proving this conjecture.
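For concreteness, in the symmetric stochastic block model with two groups and within-/between-group average degrees $c_{in}$ and $c_{out}$, the Kesten-Stigum bound takes the well-known form $(c_{in} - c_{out})^2 > 2(c_{in} + c_{out})$: above this threshold, efficient spectral and message-passing algorithms detect the communities.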
This is joint work with Jess Banks, Joe Neeman, Praneeth Netrapalli, Roman Vershynin, and Jiaming Xu.
This talk will outline a recent set of ideas on using spatially coupled ensembles to deduce properties of the underlying non-coupled ensemble. One application is a complete proof of the replica-symmetric formula for the conditional entropy of regular low-density parity-check (LDPC) code ensembles on arbitrary binary-input memoryless channels, as well as a proof of the Maxwell area construction for such systems. Applications to other problems and phase transitions will be discussed, time permitting.
Wednesday, May 4th, 2016
Community detection consists in identifying groups of similar items within a population. In the context of online social networks, it is a useful primitive for recommending contacts or news items to users. We will consider a particular generative probabilistic model for the observations, namely the so-called stochastic block model, and prove that the non-backtracking operator provides a significant improvement when used for spectral clustering.
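A toy illustration of non-backtracking spectral clustering on a small two-group stochastic block model; the parameters and the sign-aggregation step below are illustrative choices, not the method's analysis from the talk.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two-group stochastic block model, above the detectability threshold
# (cin - cout)^2 > 2 (cin + cout); parameters assumed for illustration.
n, cin, cout = 200, 8.0, 2.0
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], cin / n, cout / n)
A = np.triu(rng.random((n, n)) < P, k=1)
A = A | A.T

# Non-backtracking operator B on directed edges:
# B[(u -> v), (v -> w)] = 1 whenever w != u.
edges = list(zip(*np.nonzero(A)))
idx = {e: k for k, e in enumerate(edges)}
B = np.zeros((len(edges), len(edges)))
for (u, v), k in idx.items():
    for w in np.nonzero(A[v])[0]:
        if w != u:
            B[k, idx[(v, w)]] = 1.0

# Spectral clustering: take the eigenvector of the second-largest real
# eigenvalue and aggregate its entries over each node's incoming edges.
vals, vecs = np.linalg.eig(B)
v2 = vecs[:, np.argsort(-vals.real)[1]].real
score = np.zeros(n)
for (u, v), k in idx.items():
    score[v] += v2[k]
pred = (score > 0).astype(int)
print("agreement:", max(np.mean(pred == labels), np.mean(pred != labels)))
```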
Thursday, May 5th, 2016
Bootstrap percolation on a graph $G$ with infection threshold $r$ is a deterministic infection process which evolves in rounds. In each round, every vertex is in exactly one of two possible states, infected or uninfected, and every uninfected vertex $v$ becomes infected if it has at least $r$ infected neighbours; otherwise it remains uninfected. Once a vertex has become infected, it remains infected forever. In this talk we shall consider the case when the graph $G$ is a random graph (e.g., the binomial random graph $G(n,p)$, random hypergraphs, or inhomogeneous random graphs) and show that there exists a critical density of initially infected vertices, below which only a few additional vertices become infected and above which almost every vertex becomes infected.
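A direct simulation of the process on $G(n,p)$; the parameters and the sweep of initial densities below are illustrative assumptions chosen to straddle the critical density.

```python
import numpy as np

rng = np.random.default_rng(4)

def bootstrap(A, infected, r):
    """Run the r-neighbour bootstrap process to completion (deterministic)."""
    infected = infected.copy()
    while True:
        # An uninfected vertex becomes infected once it has >= r infected
        # neighbours; infected vertices stay infected forever.
        newly = (~infected) & (A @ infected >= r)
        if not newly.any():
            return infected
        infected |= newly

# Binomial random graph G(n, p); parameters assumed for illustration.
n, p, r = 1000, 0.005, 2
A = np.triu(rng.random((n, n)) < p, k=1)
A = (A | A.T).astype(int)
for frac in (0.005, 0.05, 0.20):  # sweep the initial density of infected vertices
    start = rng.random(n) < frac
    final = bootstrap(A, start, r)
    print(f"initially infected {frac:.3f} -> finally infected {final.mean():.3f}")
```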
We consider the following activation process in undirected graphs: a vertex is active if it either belongs to a set of initially activated vertices or at some point has at least $r$ active neighbors. A contagious set is a set whose activation results in the entire graph becoming active. Given a graph $G$, let $m(G,r)$ be the minimal size of a contagious set. This process (commonly referred to as bootstrap percolation) has been studied in several fields, including statistical physics, computer science, and combinatorics.
We study this process in the binomial random graph $G := G(n,p)$. Assuming $r > 1$ is a fixed constant, we determine the asymptotic value of $m(G,r)$ (holding with high probability). Our proof is constructive in the sense that it yields a polynomial-time algorithm for finding nearly optimal contagious sets.
We also determine the asymptotic threshold probability for $G(n,p)$ to satisfy $m(G,r) = r$.
Based on joint work with Uriel Feige (Weizmann) and Michael Krivelevich (Tel Aviv).
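As a baseline for the quantity $m(G,r)$ discussed above, here is a naive greedy upper bound; this heuristic and its parameters are for illustration only and are not the nearly optimal algorithm from the talk.

```python
import numpy as np

rng = np.random.default_rng(6)

def activate(A, seeds, r):
    """Closure of the r-neighbour activation process started from `seeds`."""
    active = np.zeros(A.shape[0], dtype=bool)
    active[seeds] = True
    while True:
        newly = (~active) & (A @ active >= r)
        if not newly.any():
            return active
        active |= newly

def greedy_contagious_set(A, r):
    """Greedy upper bound on m(G, r): repeatedly add the vertex whose
    inclusion activates the most vertices."""
    seeds = []
    while not activate(A, seeds, r).all():
        gains = [activate(A, seeds + [v], r).sum() for v in range(A.shape[0])]
        seeds.append(int(np.argmax(gains)))
    return seeds

n, p, r = 60, 0.15, 2  # parameters assumed for illustration
A = np.triu(rng.random((n, n)) < p, k=1)
A = (A | A.T).astype(int)
print("greedy contagious set size:", len(greedy_contagious_set(A, r)))
```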
Determining the chromatic number of random graphs is one of the longest-standing challenges in probabilistic combinatorics. For the Erdős–Rényi model, the single most intensely studied model in the random graphs literature, the question dates back to the seminal 1960 paper that started the theory of random graphs. Apart from that, the model that has received the most attention is certainly the random regular graph. We provide an almost complete solution to the chromatic number problem for the random $d$-regular graph on $n$ vertices, where $d$ remains fixed as $n$ tends to infinity. For every integer $k$ exceeding a certain constant $k_0$, we identify a number $d(k)$ such that the random $d$-regular graph is $k$-colorable with probability tending to 1 as $n$ tends to infinity if $d < d(k)$, and non-$k$-colorable with probability tending to 1 as $n$ tends to infinity if $d > d(k)$.
This is joint work with Amin Coja-Oghlan and Charilaos Efthymiou.
Janson's well-known inequality gives Poisson-like upper bounds for the lower tail probability $\Pr(X \le (1-\epsilon)\mathbb{E}X)$ when $X$ is a sum of dependent indicator random variables of a special form. In joint work with Svante Janson, we show that, for large deviations, this inequality is optimal whenever $X$ is approximately Poisson, i.e., when the dependencies are weak. For subgraph counts in random graphs, this yields, for example, new lower tail estimates, extending earlier work (for the special case $\epsilon = 1$) of Janson, Łuczak and Ruciński. If time permits, we shall discuss a correlation-inequality-based ingredient of the proof, which is closely related to the question of calculating conditional expectations and variances.
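For reference, one standard form of the inequality in question (as stated, e.g., in the Janson-Łuczak-Ruciński book) is $\Pr(X \le (1-\epsilon)\mathbb{E}X) \le \exp(-\epsilon^2 (\mathbb{E}X)^2 / (2\bar\Delta))$, where $\bar\Delta = \sum_{i \sim j} \mathbb{E}[I_i I_j]$ and the sum runs over all ordered pairs of dependent indicators, including $i = j$.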
Friday, May 6th, 2016
We introduce the factorization of certain randomly generated matrices as a probabilistic model for unsupervised feature learning. Using the replica method, we obtain an exact closed formula for the minimum mean-squared error and the sample complexity for reconstruction of the features from data. An alternative way to derive this formula is via the state evolution of the associated approximate message passing algorithm. Analyzing the fixed points of this formula, we identify a gap between computationally tractable and information-theoretically optimal inference, associated with a first-order phase transition. The model can serve for benchmarking general-purpose feature learning algorithms, and its analysis provides insight into the theoretical understanding of unsupervised learning from data.
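To illustrate the state-evolution technique mentioned above, here is the scalar recursion for a simpler, standard model (the rank-one spiked Wigner model with a $\pm 1$ signal); the model and parameters are assumptions for illustration and are not the matrix factorization model of the talk.

```python
import numpy as np

def state_evolution(lam, n_iter=50, n_mc=200_000, seed=5):
    """Scalar state evolution for the rank-one spiked Wigner model with a
    +/-1 signal: gamma' = lam^2 * E[tanh(gamma + sqrt(gamma) Z)], Z ~ N(0,1).
    The fixed point gives the predicted asymptotic overlap q* = gamma*/lam^2."""
    Z = np.random.default_rng(seed).normal(size=n_mc)
    gamma = 1e-3  # start from a vanishingly small overlap
    for _ in range(n_iter):
        gamma = lam**2 * np.mean(np.tanh(gamma + np.sqrt(gamma) * Z))
        gamma = max(gamma, 0.0)  # guard against Monte Carlo noise near zero
    return gamma / lam**2

# Below lam = 1 the only fixed point is zero overlap; above it a positive
# fixed point appears, marking the phase transition.
for lam in (0.8, 1.2, 2.0):
    print(f"lam = {lam}: predicted overlap ~ {state_evolution(lam):.3f}")
```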
The injective norm of a $k$-tensor is a notoriously difficult quantity to approximate. For example, given a random 4-tensor, spectral techniques yield the best known $\sqrt{n}$-approximation to the injective norm. In this talk, I will describe how the degree-$n^{\epsilon}$ SoS SDP hierarchy can be used to improve this approximation to $n^{1/2 - \delta}$ for some $\delta > 0$.
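For orientation, recall the quantity being approximated: for a symmetric 4-tensor $T$, the injective norm is $\|T\|_{inj} = \max_{\|x\|_2 = 1} |\sum_{i,j,k,l} T_{ijkl}\, x_i x_j x_k x_l|$, i.e., the maximum of the associated degree-4 polynomial over the unit sphere.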