Monday, May 2nd, 2016

9:30 am – 10:15 am
Ideas from statistical physics provide a detailed description of phase transitions and properties for a wide range of random constraint satisfaction problems. Increasingly, many of these heuristics have also been established mathematically. I will discuss new results in the condensation regime, where these models undergo a one-step replica symmetry breaking transition.
 
This is joint work with Nike Sun and Yumeng Zhang.
 
10:30 am – 11:15 am

Model a network as an edge-weighted graph, where the (unknown) weight $w_e$ of edge $e$ indicates the frequency of observed interactions; that is, over time $t$ we observe a Poisson($t w_e$) number of interactions across edge $e$. How should we estimate some given statistic of the underlying network? This leads to wide-ranging and challenging problems.
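
As a toy illustration of this estimation setup, here is a minimal sketch in Python; the graph, weights, observation time, and plug-in estimator below are all hypothetical choices, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical network with unknown edge weights w_e.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
w = {e: rng.uniform(0.1, 2.0) for e in edges}

t = 50.0  # observation time
# Over time t we observe a Poisson(t * w_e) number of interactions on edge e.
counts = {e: rng.poisson(t * w[e]) for e in edges}

# Maximum-likelihood estimate of each weight is the empirical rate.
w_hat = {e: counts[e] / t for e in edges}

# Plug-in estimate of a simple network statistic: the total edge weight.
print(f"true total weight: {sum(w.values()):.3f}")
print(f"estimated total weight: {sum(w_hat.values()):.3f}")
```

For richer statistics the naive plug-in approach need not remain efficient or accurate, which is where the challenging problems mentioned above arise.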

 
11:45 am – 12:30 pm
For a large class of random constraint satisfaction problems, heuristic methods from statistical physics yield detailed predictions on phase transitions. I will survey some of these heuristics, and discuss approaches for making them rigorous. This talk is based on joint works with Jian Ding, Allan Sly, and Yumeng Zhang.
2:00 pm – 2:45 pm

Approximate Message Passing (AMP) is a class of algorithms, motivated by statistical physics, that can be applied to a broad family of problems on dense matrices. Remarkably, AMP can be analyzed exactly in the large size limit. I will describe the general approach and illustrate it through applications to specific problems: the hidden clique problem, sparse principal component analysis, and non-negative principal component analysis.
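
To fix ideas, here is a minimal sketch of an AMP iteration for a symmetric spiked matrix; the rank-one signal model and the tanh denoiser are illustrative assumptions, not the talk's exact setting:

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam = 2000, 2.0

# Illustrative spiked model: rank-one signal plus symmetric Gaussian noise.
v = rng.choice([-1.0, 1.0], size=n)
W = rng.normal(size=(n, n))
W = (W + W.T) / np.sqrt(2 * n)
A = (lam / n) * np.outer(v, v) + W

f = np.tanh                                # denoiser (assumed choice)
f_prime = lambda x: 1.0 - np.tanh(x) ** 2

x_prev, x = np.zeros(n), rng.normal(size=n)
for _ in range(30):
    b = f_prime(x).mean()                  # Onsager correction term
    x, x_prev = A @ f(x) - b * f(x_prev), x

overlap = abs(f(x) @ v) / (np.linalg.norm(f(x)) * np.sqrt(n))
print(f"overlap with planted vector: {overlap:.3f}")
```

The Onsager term is what distinguishes AMP from naive power iteration, and it is also what makes the large-size behavior of the iterates exactly trackable via state evolution.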

 

2:45 pm – 3:30 pm

I will discuss some recent breakthroughs in the probability literature on the problem of counting sparse graphs with prescribed properties, such as containing more than a given number of triangles. For dense graphs, such counting is possible using Szemerédi's regularity lemma, but this tool is not available for sparse graphs. A new technique, called nonlinear large deviation theory, was introduced two years ago and has led to the resolution of several open questions of this type. I will talk about the central aspects of this theory, some applications, the inadequacies of the theory in its current form, and possibilities for further development.

4:00 pm – 4:45 pm
The Max-Cut problem seeks to determine the maximal cut size in a given graph. With no polynomial-time approximation scheme for Max-Cut (unless P=NP), its asymptotics for a typical large sparse graph are of considerable interest. We prove that for the uniformly random $d$-regular graph on $N$ vertices, and for the uniformly chosen Erdős–Rényi graph of $M = N d/2$ edges, the leading correction to $M/2$ (the typical cut size) is $P_* \sqrt{N M/2}$. Here $P_*$ is the ground state energy of the Sherrington–Kirkpatrick model, expressed analytically via Parisi's formula.
 
This talk is based on joint work with Subhabrata Sen and Andrea Montanari.

Tuesday, May 3rd, 2016

9:30 am – 10:15 am
The problem of finding low-rank decompositions of tensors has a wide range of applications, e.g., for learning various mixture models. In the overcomplete regime---when the rank is superlinear in the dimension---most known decomposition methods for third-order tensors break down.
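
Concretely, the task can be phrased as follows (a standard formulation, with a schematic error term): given
$$T = \sum_{i=1}^{m} a_i \otimes a_i \otimes a_i + E, \qquad a_i \in \mathbb{R}^n,$$
approximately recover the components $a_1, \dots, a_m$; the overcomplete regime is $m \gg n$.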
 
In this talk, we describe a polynomial-time algorithm based on the sum-of-squares method that is able to approximately decompose random n-dimensional third-order tensors of rank up to n^{3/2}/polylog n. The previous best algorithm by Ge and Ma also used sum-of-squares but required quasi-polynomial time to achieve such decompositions.
 
Finally, we demonstrate how to reap the benefits of the sum-of-squares method without solving large semidefinite programs: Using the sum-of-squares toolkit, we devise new kinds of spectral algorithms that achieve similar decomposition guarantees as sum-of-squares but with running times that are close to linear in the size of the input (exponent 1.08 using fast matrix-multiplication).
 
Based on joint works with Sam Hopkins, Tengyu Ma, Tselil Schramm, and Jonathan Shi.
10:30 am – 11:15 am

Given a noisy measurement of a product of two matrices, the matrix factorization problem is to estimate the original matrices. It arises in many applications such as dictionary learning, blind matrix calibration, sparse principal component analysis, blind source separation, low-rank matrix completion, robust principal component analysis, and factor analysis. We use the tools of statistical mechanics - the cavity and replica methods - to analyze the achievability and tractability of these inference problems in the setting of Bayes-optimal inference. Joint work with Lenka Zdeborova and Thibault Lesieur.

11:45 am – 12:30 pm
We will discuss the role that subdominant states play in the design of algorithms for large-scale learning problems. We shall take as a representative case the problem of learning random patterns with discrete synapses in feedforward networks. Standard statistical physics results show that this problem is exponentially dominated by metastable states and by isolated solutions that are extremely hard to find algorithmically.
 
A large deviation analysis based on a local entropy measure allows us to find analytical evidence for the existence of subdominant and extremely dense regions of solutions. We show numerically that these dense regions are surprisingly accessible by simple learning protocols. The large deviation measure introduced for the analytic study can be used as an objective function for a simple Markov chain Monte Carlo scheme, which finds solutions efficiently.
2:45 pm – 3:30 pm

Over the past decade, a rich interaction has grown up between statistical physics and statistical inference.  In particular, physics often predicts phase transitions where, if a data set becomes too sparse or too noisy, it suddenly becomes impossible to find its underlying structure, or even to distinguish it from a “null model” with no structure at all.  For instance, in the community detection problem in networks, there is a transition below which the vertices cannot be labeled better than chance, and the network cannot be distinguished from an Erdős–Rényi random graph.  Similarly, in the clustering problem in Gaussian mixture models, it becomes impossible to find the clusters, or even to tell whether clusters exist, i.e., to distinguish the data from a single Gaussian.

Many of the physics predictions have been made rigorous, but there is still a very lively interchange between the fields, both about these transitions and about asymptotically optimal algorithms.  In particular, while efficient message-passing and spectral algorithms are known that succeed down to the so-called Kesten-Stigum bound, in a number of problems we believe that it is information-theoretically possible (but perhaps not computationally feasible) to go further.  I will give a friendly introduction to this field, and present some new results proving this conjecture.
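
For concreteness, in the two-community stochastic block model with average intra- and inter-community degrees $a$ and $b$, the Kesten-Stigum bound takes the well-known form
$$(a - b)^2 > 2(a + b),$$
above which efficient algorithms can label vertices better than chance.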

This is joint work with Jess Banks, Joe Neeman, Praneeth Netrapalli, Roman Vershynin, and Jiaming Xu.

4:00 pm – 4:45 pm

This talk will outline a recent set of ideas on using spatially coupled ensembles to deduce properties of the underlying non-coupled ensemble. An application is a complete proof of the replica symmetric formula for the conditional entropy of regular low-density parity-check (LDPC) code ensembles on arbitrary binary-input memoryless channels, as well as a proof of the Maxwell area construction for such systems. Applications to other problems and phase transitions will be discussed, time permitting.

4:45 pm – 5:30 pm
We study the hard-core (gas) model defined on independent sets of an input graph, where the independent sets are weighted by a parameter \lambda>0 (the fugacity). For constant \Delta, previous work of Weitz (2006) established an FPTAS for the partition function for graphs of maximum degree \Delta when \lambda<\lambda_c(\Delta). The threshold \lambda_c(\Delta) is the critical point of the statistical physics phase transition for uniqueness/non-uniqueness on the infinite \Delta-regular tree.
 
Sly (2010) showed that there is no FPRAS, unless NP=RP, when \lambda>\lambda_c(\Delta). The running time of Weitz's algorithm is n^{O(\log\Delta)}, i.e., exponential in \log{\Delta}. Here we present an FPRAS for the partition function whose running time is O^*(n^2). We analyze the simple single-site Markov chain known as the Glauber dynamics for sampling from the associated Gibbs distribution. We prove there exists a constant \Delta_0 such that for all graphs with maximum degree \Delta\geq\Delta_0 and girth \geq 7 (i.e., no cycles of length \leq 6), the mixing time of the Glauber dynamics is O(n\log{n}) when \lambda<\lambda_c(\Delta). Our work complements that of Weitz, which applies for small constant \Delta, whereas our work applies for all \Delta at least a sufficiently large constant \Delta_0 (this includes \Delta depending on n=|V|).
 
Our proof utilizes loopy BP (belief propagation), a widely-used algorithm for inference in graphical models. A novel aspect of our work is using the principal eigenvector of the BP operator to design a distance function which contracts in expectation for pairs of states that behave like the BP fixed point. We also prove that the Glauber dynamics behaves locally like loopy BP. As a byproduct we obtain that the Glauber dynamics converges, after a short burn-in period, close to the BP fixed point, and this implies that the fixed point of loopy BP is a close approximation to the Gibbs distribution. Using these connections we establish that loopy BP quickly converges to the Gibbs distribution when the girth is \geq 6 and \lambda<\lambda_c(\Delta).
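
As a reference point, here is a minimal sketch of the single-site Glauber dynamics for the hard-core model; the cycle graph, the fugacity, and the number of steps are placeholder choices:

```python
import random

random.seed(0)

# Placeholder graph: a cycle on n vertices, as adjacency lists.
n = 20
adj = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}

lam = 1.0          # fugacity lambda (placeholder value)
occupied = set()   # current independent set

# One Glauber step: pick a uniform vertex v and resample its state given
# its neighbors: if some neighbor is occupied, v must be unoccupied;
# otherwise v is occupied with probability lam / (1 + lam).
for _ in range(100_000):
    v = random.randrange(n)
    occupied.discard(v)
    if not any(u in occupied for u in adj[v]):
        if random.random() < lam / (1 + lam):
            occupied.add(v)

print(f"sampled independent set of size {len(occupied)}")
```

The results above concern how many such steps are needed (the mixing time) before the chain's state is close in distribution to the Gibbs measure.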

Wednesday, May 4th, 2016

9:30 am – 10:15 am
Planted clique is a basic problem in average-case analysis that has found numerous applications, including discovering motifs in biological networks, computing the best Nash equilibrium, property testing, sparse PCA, compressed sensing, cryptography and mathematical finance. In this work, we prove a nearly optimal lower bound against the Sum-of-Squares hierarchy for it. Previously, it was possible that the Sum-of-Squares hierarchy might be able to find planted cliques of size $n^{\epsilon}$ for any $\epsilon > 0$ in polynomial time. We prove that the degree-$d$ program has an integrality gap of at least $n^{1/2 - c\sqrt{d/\log n}}$ for some constant $c > 0$.
 
Moreover, our approach is based on a new framework that we call pseudo-calibration, which yields a general recipe for constructing good pseudo-distributions (i.e., dual certificates for the Sum-of-Squares semidefinite program). This framework also sheds further light on the ways in which the Sum-of-Squares hierarchy is qualitatively stronger than others, and what is needed to fool it.
 
Joint work with Boaz Barak, Sam Hopkins, Jon Kelner, Pravesh Kothari, and Aaron Potechin.
10:30 am – 11:15 am

Community detection consists in identifying groups of similar items within a population. In the context of online social networks, it is a useful primitive for recommending either contacts or news items to users. We will consider a particular generative probabilistic model for the observations, namely the so-called stochastic block model, and prove that the non-backtracking operator provides a significant improvement when used for spectral clustering.
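
As a sketch, the non-backtracking operator $B$ is indexed by directed edges of the graph, with $B_{(u \to v), (w \to x)} = 1$ exactly when $v = w$ and $x \neq u$; a minimal construction (the toy graph is a placeholder):

```python
import numpy as np

# Placeholder undirected graph, as an edge list.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]
darts = edges + [(v, u) for u, v in edges]    # both orientations
index = {d: i for i, d in enumerate(darts)}

# Non-backtracking matrix: B[(u->v), (v->x)] = 1 whenever x != u.
m = len(darts)
B = np.zeros((m, m))
for u, v in darts:
    for w, x in darts:
        if w == v and x != u:
            B[index[(u, v)], index[(w, x)]] = 1.0

# Spectral clustering then uses leading eigenvectors of B (summed back onto
# vertices) in place of eigenvectors of the adjacency matrix.
eigvals = np.linalg.eigvals(B)
print("leading eigenvalues:", sorted(eigvals, key=abs, reverse=True)[:3])
```

Unlike adjacency or Laplacian spectra, the spectrum of $B$ is not dominated by high-degree vertices in sparse graphs, which is the source of the improvement.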

11:45 am – 12:30 pm
Statistical inference problems arising within signal processing, data mining, and machine learning naturally give rise to hard combinatorial optimization problems. These problems become intractable when the dimensionality of the data is large, as is often the case for modern datasets. A popular idea is to construct convex relaxations of these combinatorial problems, which can be solved efficiently for large scale datasets. 
Semidefinite programming (SDP) relaxations are among the most powerful methods in this family, and are surprisingly well-suited for a broad range of problems where data take the form of matrices or graphs.
 
We have recently shown [Javanmard, Montanari, Ricci-Tersenghi, arXiv:1511.08769] how to compute detection thresholds for SDP relaxations and the corresponding estimation error by mapping the problem to statistical mechanics models with vector spins, in the limit of a large number of spin components.
We have studied some classical SDP relaxations for statistical problems motivated by graph synchronization and community detection in networks.
Our results clarify the effectiveness of SDP relaxations in solving high-dimensional statistical problems.
 
I will dedicate particular attention to the problem of community detection in sparse graphs, for which we have shown the SDP relaxation to be quasi-optimal. A robust and fast algorithm inspired by the SDP relaxation is presented, which is able to detect communities in very large graphs. The code is publicly available at http://web.stanford.edu/~montanar/SDPgraph/
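
One standard SDP relaxation in this setting, given here for orientation (the exact normalization used in the paper may differ), is
$$\max \ \langle A - \tfrac{d}{n} \mathbf{1}\mathbf{1}^{\mathsf{T}}, X \rangle \quad \text{s.t.} \quad X \succeq 0, \ X_{ii} = 1 \text{ for all } i,$$
where $A$ is the adjacency matrix and $d$ the average degree; community labels are then read off from the sign structure of an optimizer.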

Thursday, May 5th, 2016

9:30 am – 10:15 am
I will present the problem of shotgun assembly of random graphs and report some recent progress. Based on joint works with Ross, Sun and Feige.
10:30 am – 11:15 am

Bootstrap percolation on a graph $G$ with infection threshold $r$ is a deterministic infection process which evolves in rounds. In each round, every vertex has exactly one of two possible states, either infected or uninfected, and every uninfected vertex $v$ becomes infected if it has at least $r$ infected neighbours, otherwise it remains uninfected. Once a vertex has become infected it remains infected forever. In this talk we shall consider the case when the graph $G$ is a random graph (e.g., the binomial random graph $G(n,p)$, random hypergraphs, or inhomogeneous random graphs) and show that there exists a critical density of initially infected vertices, below which only a few additional vertices become infected, while above which almost every vertex becomes infected.
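
A minimal sketch of a round-by-round simulation of the process; the $G(n,p)$ instance and all parameters below are placeholders:

```python
import random

random.seed(0)

# Placeholder instance: binomial random graph G(n, p), infection threshold r.
n, p, r = 200, 0.05, 2
adj = {v: set() for v in range(n)}
for u in range(n):
    for v in range(u + 1, n):
        if random.random() < p:
            adj[u].add(v)
            adj[v].add(u)

# Infect each vertex independently with probability q at time zero.
q = 0.05
infected = {v for v in range(n) if random.random() < q}

# Rounds: every uninfected vertex with at least r infected neighbours
# becomes infected; infected vertices stay infected forever.
while True:
    newly = {v for v in range(n)
             if v not in infected and len(adj[v] & infected) >= r}
    if not newly:
        break
    infected |= newly

print(f"initially infected ~{q * n:.0f}, finally infected {len(infected)} of {n}")
```

Varying q in such simulations illustrates the critical-density phenomenon described above, even at moderate n.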

11:45 am – 12:30 pm

We consider the following activation process in undirected graphs: a vertex is active either if it belongs to a set of initially activated vertices or if at some point it has at least r active neighbors. A contagious set is a set whose activation results in the entire graph being active. Given a graph G, let m(G,r) be the minimal size of a contagious set. This process (commonly referred to as bootstrap percolation) has been studied in several fields including statistical physics, computer science, and combinatorics.

We study this process in the binomial random graph G:=G(n,p). Assuming r>1 is a fixed constant, we determine the asymptotic value of m(G,r) (holding with high probability). Our proof is constructive, in the sense that it entails a polynomial-time algorithm for finding nearly optimal contagious sets.

We also determine the asymptotic threshold probability for G(n,p) to satisfy m(G,r)=r.

Based on joint work with Uriel Feige (Weizmann) and Michael Krivelevich (Tel Aviv).

2:00 pm – 2:45 pm
Random constraint satisfaction problems undergo several phase transitions as the ratio between the number of constraints and the number of variables is varied. When this ratio exceeds the satisfiability threshold, no more solutions exist; the satisfiable phase, for less constrained problems, is itself divided into an unclustered regime and a clustered one. In the latter, solutions are grouped in clusters of nearby solutions, separated in configuration space from the solutions of other clusters. In addition, the rigidity transition signals the appearance of so-called frozen variables in typical solutions: beyond this threshold, most solutions belong to clusters with an extensive number of variables taking the same values in all solutions of the cluster. In this talk I will present recent results that refine the description of this phenomenon by estimating the location of the freezing transition, corresponding to the disappearance of all unfrozen solutions (not only typical ones). From a technical point of view, we have characterized atypical solutions with a number of frozen variables different from the typical value via a large deviation study of the dynamics of a stripping process (whitening) that unveils the frozen variables of a solution, building upon recent works on atypical trajectories of bootstrap percolation dynamics.
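
For k-SAT the whitening procedure has a simple description: starting from a solution, repeatedly assign the joker value * to any variable such that every clause containing it is satisfied by another non-* variable or already contains a * variable; variables that survive are frozen. A minimal sketch (the toy formula and solution are placeholders):

```python
# Clauses are lists of signed literals (var, sign), satisfied by var == sign.
clauses = [[(0, True), (1, False), (2, True)],
           [(1, True), (2, False), (3, True)],
           [(0, False), (3, False), (2, True)]]
sigma = {0: True, 1: True, 2: True, 3: False}   # a satisfying assignment
STAR = None                                      # the joker value *

def whiten(clauses, sigma):
    sigma = dict(sigma)
    changed = True
    while changed:
        changed = False
        for v in sigma:
            if sigma[v] is STAR:
                continue
            # v may be whitened if every clause containing it is satisfied
            # by another non-* variable or already contains a * variable.
            def released(cl):
                return any(u != v and (sigma[u] is STAR or sigma[u] == s)
                           for u, s in cl)
            if all(released(cl) for cl in clauses
                   if any(u == v for u, _ in cl)):
                sigma[v] = STAR
                changed = True
    return sigma

frozen = [v for v, val in whiten(clauses, sigma).items() if val is not STAR]
print("frozen variables:", frozen)   # tiny instances typically whiten fully
```

The large deviation analysis in the talk tracks the trajectory of this stripping process on atypical solutions rather than on typical ones.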
 
Joint work with Alfredo Braunstein, Luca Dall'Asta, and Lenka Zdeborova.
2:45 pm – 3:30 pm

Determining the chromatic number of random graphs is one of the longest-standing challenges in probabilistic combinatorics. For the Erdős–Rényi model, the single most intensely studied model in the random graphs literature, the question dates back to the seminal 1960 paper that started the theory of random graphs. Apart from that, the model that has received the most attention is certainly the random regular graph. We provide an almost complete solution to the chromatic number problem on the random d-regular graph on n vertices, where d remains fixed as n tends to infinity. For every integer k exceeding a certain constant k_0, we identify a number d(k) such that the random d-regular graph is k-colorable with probability tending to 1 as n tends to infinity if d<d(k), and non-k-colorable with probability tending to 1 as n tends to infinity if d>d(k).
This is joint work with Amin Coja-Oghlan and Charilaos Efthymiou.

4:00 pm – 4:45 pm

The well-known inequality of Janson gives Poisson-like upper bounds for the lower tail probability $\Pr(X \le (1-\varepsilon)\mathbb{E}X)$ when $X$ is the sum of dependent indicator random variables of a special form. In joint work with Svante Janson we show that, for large deviations, this inequality is optimal whenever $X$ is approximately Poisson, i.e., when the dependencies are weak. For subgraph counts in random graphs, this yields, e.g., new lower tail estimates, extending earlier work (for the special case $\varepsilon=1$) of Janson, Łuczak and Ruciński. If time permits, we shall discuss a correlation-inequality based ingredient of the proof, which is closely related to the question of calculating conditional expectations and variances.
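
In one standard form (following Janson, Łuczak and Ruciński; stated here for orientation), with $\mu = \mathbb{E}X$ and $\bar\Delta = \sum_{i \sim j} \mathbb{E}[I_i I_j]$, the sum running over ordered pairs of dependent indicators including $i = j$, the inequality reads
$$\Pr\big(X \le (1-\varepsilon)\mu\big) \le \exp\!\left(-\frac{\varepsilon^2 \mu^2}{2\bar\Delta}\right).$$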

 

Friday, May 6th, 2016

9:30 am – 10:15 am
Numerous problems in applied mathematics and statistics require integrating a function f over a high-dimensional domain. For example, estimating the partition function of a graphical model for a fixed set of parameters requires integrating (summing) its unnormalized probability function f over all possible configurations (value assignments). The most common methods for performing such integration stochastically involve Markov chain Monte Carlo (MCMC).
 
We present an alternative to MCMC based on ideas from the theory of modern error-correcting codes. Specifically, we will see that stochastically integrating a non-negative function f over its entire domain can often be achieved at the cost of merely maximizing f over a few random subsets of its domain. The key lies in specifying subsets with appropriate statistical properties in a manner that does not dramatically affect the capacity for maximization. Using real-life CNF formulas as benchmarks, our approach of encoding the subsets as cosets of LDPC error-correcting codes allows us to achieve dramatic speedup and levels of accuracy that were previously completely unattainable.
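
A minimal sketch of the underlying hashing idea, using plain random XOR (parity) constraints rather than LDPC cosets, and brute-force feasibility checks, purely for illustration:

```python
import random

random.seed(0)
n = 12   # number of boolean variables (toy size)

# Toy 0/1 function whose support we want to count: "at most 4 ones".
def f(x):
    return sum(x) <= 4

def feasible(parities):
    # Brute force: is there an x with f(x) = 1 satisfying all parities?
    for v in range(2 ** n):
        x = [(v >> i) & 1 for i in range(n)]
        if f(x) and all(sum(x[i] for i in S) % 2 == b for S, b in parities):
            return True
    return False

# Each random parity constraint halves the number of solutions in
# expectation, so the largest feasible number of constraints estimates
# log2 of the count (a single run has high variance; repeat and median).
parities, m = [], 0
while feasible(parities):
    S = [i for i in range(n) if random.random() < 0.5]
    b = random.randrange(2)
    parities.append((S, b))
    m += 1

true_count = sum(1 for v in range(2 ** n)
                 if f([(v >> i) & 1 for i in range(n)]))
print(f"true count {true_count}, estimate ~2^{m - 1}")
```

The role of the LDPC cosets in the talk is to make such constraints sparse, so that the maximization step (here a brute-force check) remains tractable while the statistical halving property is retained.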
 
Based on joint work with Pei Jiang.
10:30 am – 11:15 am

We introduce factorization of certain randomly generated matrices as a probabilistic model for unsupervised feature learning. Using the replica method we obtain an exact closed formula for the minimum mean-squared error and sample complexity for reconstruction of the features from data. An alternative way to derive this formula is via state evolution of the associated approximate message passing. Analyzing the fixed points of this formula, we identify a gap between computationally tractable and information-theoretically optimal inference, associated with a first-order phase transition. The model can serve for benchmarking general-purpose feature learning algorithms, and its analysis provides insight into the theoretical understanding of unsupervised learning from data.

11:45 am – 12:30 pm

The injective norm of a k-tensor is a notoriously difficult quantity to approximate. For example, given a random 4-tensor, spectral techniques yield the best known $\sqrt{n}$-approximation to the injective norm. In this talk, I will describe how the degree-$n^{\epsilon}$ SoS SDP hierarchy can be used to improve this approximation to $n^{1/2 - \delta}$ for some $\delta > 0$.
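
For reference, the injective norm of a $k$-tensor $T$ is
$$\|T\|_{\mathrm{inj}} = \max_{\|x_1\| = \dots = \|x_k\| = 1} \langle T, x_1 \otimes \dots \otimes x_k \rangle,$$
the natural generalization of the matrix spectral norm ($k = 2$).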