# Abstracts

### Monday, June 6th, 2016

9:30 am10:00 am

No abstract available.

10:00 am10:30 am
In this work we formalize and address the general problem of data reuse in adaptive data analysis. We show how the differential-privacy based approach given in (Dwork et al., 2014) is applicable much more broadly to adaptive data analysis. We then show that a simple approach based on description length can also be used to give guarantees of statistical validity in adaptive settings. Finally, we demonstrate that these incomparable approaches can be unified via the notion of approximate max-information that we introduce.

Joint work with C. Dwork, M. Hardt, T. Pitassi, O. Reingold and A. Roth
11:00 am11:30 am
Speaker:
We develop an information-theoretic view of the stochastic block model, a popular statistical model for the large-scale structure of complex networks.  A graph $G$ from such a model is generated by first assigning vertex labels at random from a finite alphabet, and then connecting vertices with edge probabilities depending on the labels of the endpoints. In the case of the symmetric two-group model, when the vertex degree is large, we establish an explicit ‘single-letter’ characterization of the asymptotic (per vertex) mutual information between the vertex labels and the graph.

In this talk, I will discuss some of the estimation implications of this characterization. I will also discuss some ingredients of the proof technique: (i) the universality principle from random matrix theory and (ii) a rigorous verification of the replica-symmetric heuristic from spin glass theory.

This is joint work with Emmanuel Abbe and Andrea Montanari.
11:30 am12:00 pm
Emerging long-read sequencing technologies promise to enable near-perfect reconstruction of whole genomes. Assembly of long reads is usually accomplished using a read-overlap graph, in which the true sequence corresponds to a Hamiltonian path. As such, the assembly problem becomes NP-hard under most formulations, and most of the known algorithmic approaches are heuristic in nature. In this talk, however, we will see that the NP-hardness of the computational problem can be overcome if we focus on instances of the assembly problem that are feasible from an information-theoretic standpoint.

Somewhat surprisingly, we arrive at this conclusion by first setting the computational complexity issue aside, and instead seeking algorithms that target information limits. In particular, we begin with a basic feasibility question: when does the set of reads contain enough information to allow unambiguous reconstruction of the true sequence? We show that in most instances of the problem where the reads do contain enough information for assembly, the read-overlap graph can be sparsified, giving rise to an Eulerian graph where the problem can be solved in linear time. We also study the cases where the reads do not contain enough information for assembly, but from a rate-distortion perspective. We propose a new notion of distortion for assembly graphs and describe a practical assembly algorithm whose achieved distortion can be bounded in terms of the repeat complexity of the target genome.

12:00 pm12:15 pm
We consider a linear regression game: at each round, an adversary reveals a covariate vector, the learner predicts a real value, the adversary reveals a label, and the learner incurs the squared prediction error. The aim is to minimize the difference between the cumulative loss and that of the linear predictor that is best in hindsight. During the information theory program, we presented work showing that the minimax optimal strategy is easy to compute under various constraints on the adversary's labels, provided that the sequence of covariate vectors is known in advance. These optimal predictions depend on the full covariate sequence. Here, we present a strategy that can be efficiently computed and show that it is minimax optimal for an adversarially chosen covariate sequence, provided that the sequence satisfies a certain total covariance budget constraint that is known in advance.  This strategy is horizon-independent, that is, no matter how long the game runs, this strategy incurs no more regret than the optimal strategy that knows in advance the number of rounds of the game.

2:00 pm2:30 pm
Bogdanov and Mossel (2011) consider the following problem.

"Suppose Alice receives a string of unbiased independent random bits and Bob receives a noisy copy of the same bits, where each bit is flipped with probability eps < 1/2. Alice and Bob wish to extract a common sequence of k random bits."

We study the relationship between communication and the probability of agreement. Suppose Alice wishes to send Bob a message of delta k bits in order to ensure that their k-bit outputs agree with probability 2^{-gamma k}. How big must delta be as a function of gamma? We show the following:

delta(gamma) >= C (1-gamma) - 2 sqrt{C(1-C) gamma}.

where C = 4 eps (1-eps).

This implies that that for delta(gamma) = 0, we have gamma >= eps/(1-eps), recovering the original result of Bogdanov and Mossel.

In this talk, we will describe  the above trade-off, which is based on the standard hypercontractivity inequality.  We also obtain strategies that show that this trade-off between communication and the probability of error is asymptotically tight.

(This is part of joint work with Venkat Guruswami.)

2:30 pm3:00 pm
In this talk, we will consider  coding schemes for multi-party interactive communication over synchronous networks that suffer from stochastic noise, where each bit is independently flipped with probability ε. We analyze the minimal overhead that must be added by the coding scheme in order to succeed in performing the computation despite the noise. Our main result is a lower bound on the communication of any noise-resilient protocol over a star network with n-parties. Specifically, we show a task that can be solved in T rounds over the noise-free network, but for which any protocol with a success probability of 1 − o(1) requires, at least, Ω(T log n/loglog n) rounds when the channels are noisy. By a 1994 result of Rajagopalan and Schulman, the slowdown we prove is the highest one can obtain on any topology, up to a log log n factor. We complete our lower bound with a matching coding scheme that achieves the same overhead; thus, the capacity of star networks is Θ(log log n/ log n). Our bounds prove that, despite several previous coding schemes with rate Ω(1) for certain topologies, no coding scheme with constant rate Ω(1) exists for arbitrary n-party noisy networks

3:30 pm4:00 pm
We give new lower bounds for multi-party communication problems based on the basic Equality predicate, but incorporating strong promises. These promises make our bounds stronger than what was known before. We also give some applications of these bounds to data stream problems. In particular, we obtain strong space/approximation tradeoffs showing that for deterministic algorithms, even loose estimations of some basic statistics of an input stream require large, even linear, amounts of space.

This work is joint with Sagar Kale (Dartmouth).

4:00 pm4:30 pm
We describe a general method of proving degree lower bounds for conical juntas (nonnegative combinations of conjunctions) that compute recursively defined boolean functions. Such lower bounds are known to carry over to communication complexity. We give two applications:

AND-OR trees: We show a near-optimal Omega~(n^{0.753...}) randomized communication lower bound for the recursive NAND function (a.k.a. AND-OR tree). This answers an open question posed by Beame and Lawry 1992,1993.

Recursive Majority: We show an Omega(2.59^k) randomized communication lower bound for the 3-majority tree of height k. This improves over the state-of-the-art already in the context of randomised decision tree complexity.

Joint with Mika Goos (U. Toronto)

### Tuesday, June 7th, 2016

9:30 am10:30 am
In this talk, we show that a sequence of Reed-Muller codes with increasing length achieves capacity on the binary erasure channel if the sequence of rates converges.  This possibility was first discussed by Dumer and Farrell in 1994 but has been settled so far only for some cases where the limiting code rate is 0 or 1.  In fact, our primary technical result is quite general and also includes extended primitive narrow-sense BCH codes and all other affine-invariant codes.  The method can be seen as a new approach to proving that a sequence of deterministic linear codes achieves capacity on an erasure channel under maximum a posteriori decoding.  Rather than relying on the precise structure of the codes, this method requires only that the codes are highly symmetric. In particular, the technique applies to any sequence of linear codes where the block lengths are strictly increasing, the code rates converge, and the permutation group of each code is doubly transitive. This also provides a rare example in information theory where symmetry alone implies near-optimal performance.  The primary tools used in the proof are the sharp threshold property for symmetric monotone Boolean functions and the area theorem for extrinsic information transfer functions.

This work is joint with Shrinivas Kudekar, Santhosh Kumar, Marco Mondelli, Eren Sasoglu, Ruediger Urbanke.

11:00 am11:30 am
We introduce a mathematical theory of distributed storage.

11:30 am12:00 pm
A fundamental fact about polynomial interpolation is that k evaluations of a degree-(k-1) polynomial f are sufficient to determine f.  This is also necessary in a strong sense: given k-1 evaluations, we learn nothing
about the value of f on any k'th point.  Motivated by the *exact repair problem* for Reed-Solomon codes in distributed storage systems, we study a variant of the polynomial interpolation problem: instead of querying entire evaluations of f (which are elements of a large field F) to recover an unknown evaluation, we are allowed to query only a few bits of evaluations.

We show that in this model, one can do significantly better than in the traditional setting, in terms of the amount of information required to determine the missing evaluation.  More precisely, only O(k) bits are necessary to recover the missing evaluation, and this result is optimal for linear methods.

Our result answers a question about the use of RS codes in distributed storage which was asked by Alex Dimakis during the Simons Institute's IT Program last year.

(Joint work with Venkat Guruswami)

2:00 pm2:30 pm
We will discuss recent progress on code constructions for recovery from worst-case bit deletions. Specifically, we will construct a family of binary codes of positive rate allowing efficient recovery from a fraction of deletions approaching sqrt{2}-1 > 0.414 (though we might focus on a simpler construction for deletion fraction 1/3 for the talk). Previously, even non-constructively the largest deletion fraction known to be correctable with positive rate was around 0.17. For alphabet size k, we construct codes to correct a deletion fraction exceeding (k-1)/(k+1), with (k-1)/k being a trivial upper limit. Whether a deletion fraction approaching 1/2 is correctable by binary codes remains a tantalizing open question. (Joint work with Boris Bukh and Johan Hastad)

2:30 pm3:00 pm
When adjacency of graph vertices represents confusability of code symbols in a communication link, the graph capacity represents the highest rate achievable by zero-error codes. We consider a variation of the problem where the noise can only affect a fraction delta of the symbols and ask for the highest achievable rate of zero-error codes subject to this additional assumption. We present some Gilbert-Varshamov-like achievability results and a converse based on Delsarte linear programming bound. Improved bounds on the reliability function of some  discrete memoryless channels are derived as an application of these results.

3:00 pm3:15 pm
We show that the recently proposed Fast Fourier Aliasing-based Sparse Transform (FFAST) algorithm for computing the Discrete Fourier Transform (DFT) of signals with a sparse DFT is equivalent to iterative hard decision decoding of product codes. This connection is used to derive the thresholds for sparse recovery based on a recent analysis by Justensen  for computing thresholds for product codes. We first extend Justesen's analysis to d-dimensional product codes and compute thresholds for the FFAST algorithm based on this. Additionally, this connection also allows us to analyze the performance of the FFAST algorithm under a burst sparsity model in addition to  the uniformly random sparsity model which was assumed in prior work.

### Wednesday, June 8th, 2016

9:30 am10:00 am
We present a strengthened entropy power inequality (EPI) when one of the random summands is Gaussian. The sharpening is closely related to strong data processing for Gaussian channels and generalizes Costa’s EPI. This leads to new reverse convolution inequalities and, as a corollary, establishes stability of the Gaussian log Sobolev inequality in terms of entropy and/or Fisher information jumps under rescaled convolution. Applications to network information theory will be discussed, including a short proof of the converse for the two-encoder quadratic Gaussian source coding problem.

10:00 am10:30 am
It is shown that under suitable regularity conditions, differential entropy is $O(sqrt{n})$-Lipschitz on the space of distributions on $n$-dimensional Euclidean space with respect to the quadratic Wasserstein distance. This result allows one to couple the multi-user interference with its i.i.d. approximations where existence of good couplings follows from Talagrand's transportation-information inequality. As an application, a new outer bound for the two-user Gaussian interference channel is proved that improves the state of the art, which, in particular, settles the corner point conjecture of Costa (1985). Extensions of this program to discrete settings will be discussed where Wasserstein distance is replaced by Ornstein's $bar{d}$-distance and Marton's inequality is invoked in place of Talagrand's.

This is joint work with Yury Polyanskiy (MIT).
11:00 am11:30 am
Speaker:

The cut-set bound developed by Cover and El Gamal in 1979 has since remained the best known upper bound on the capacity of the Gaussian relay channel. We develop a new explicit upper bound on the capacity of the Gaussian primitive relay channel which is tighter than the cut-set bound. Combined with a simple tensorization argument, this result also implies that the current capacity approximations for Gaussian relay networks, which have linear gap to the cut-set bound in the number of nodes, are order-optimal and leads to a lower bound on the pre-constant.  Our approach significantly differs from the standard information-theoretic approach for proving upper bounds on the capacity of multi-user channels. We use measure concentration to analyze  the probabilistic geometric relations between the typical sets of the n-letter random variables associated with a reliable code, which translate to new entropy inequalities between the random variables involved in the problem.

11:30 am12:00 pm

This talk will present a characterization of the hypercontractivity region for the binary erasure channel. It builds on the information theoretic equivalent characterizations of hypercontractivity.

12:00 pm12:15 pm
We derive an expression for marginals of arbitrary order of a tree-structured distribution in terms pairwise correlations. The expression involves an expansion into terms of varying order, each one being described by a certain combinatorial optimization. The key property of this expression is that it does not depend explicitly on the structure of the tree. Superficially similar representations make use of tree-recursions to express probabilities in a way that depends on the tree structure. An implication of our result is that two tree-distributions with similar correlations are close in all low- order marginals, a statement of significance to learning tree models in order to subsequently make predictions.

2:00 pm2:30 pm
We explicitly construct an extractor for two independent sources on n bits, each with min-entropy at least log^C n for a large enough constant C. Our extractor outputs one bit and has error n^{-\Omega(1)}. The best previous extractor, by Bourgain, required each source to have min-entropy .499n.

A key ingredient in our construction is an explicit construction of a monotone, almost-balanced boolean function on n bits that is resilient to coalitions of size n^{1-delta}, for any delta>0. In fact, our construction is stronger in that it gives an explicit extractor for a generalization of non-oblivious bit-fixing sources on n bits, where some unknown n-q bits are chosen almost polylog(n)-wise independently, and the remaining q=n^{1-\delta} bits are chosen by an adversary as an arbitrary function of the n-q bits. The best previous construction, by Viola, achieved q=n^{1/2 - \delta}.

Our explicit two-source extractor directly implies an explicit construction of a 2^{(log log N)^{O(1)}}-Ramsey graph over N vertices, improving bounds obtained by Barak et al. and matching independent work by Cohen.

2:30 pm3:00 pm
The rate-distortion function describes the minimum information rate required to sustain a given distortion in the limit of infinite blocklength. The rate-distortion function is given by the solution to a mutual information minimization problem that in general does not have an explicit solution. Fortunately, for a class of distortion measures, the rate-distortion function is bounded from below by a simple, explicit formula known as Shannon’s lower bound.

In this work, we present a nonasymptotic version of Shannon’s lower bound, and we show that lattice quantizers approach that lower bound, provided that the source density is smooth enough and the distortion is low, which implies that fine multidimensional lattice coverings are nearly optimal in rate-distortion sense even at finite n.
3:05 pm3:20 pm
In a sequence of remarkable results, Ganor, Kol and Raz, over the last couple of years, show that there exist non-product distributions which exhibit exponential separation between internal information complexity and distributional communication complexity. However, it is still open if internal information complexity (or a polynomial of it) upper bounds the public-coin randomized communication complexity (up to logarithmic multiplicative factors in the input size). In this result, we show that when the inputs to the two parties are drawn from a product distribution, this is indeed the case. Informally, we show that for product distributions, the two-party public-coin randomized communication complexity CC(f) is at most quadratically larger than either the partition bound prt(f) or the information complexity IC(f) (i.e., CC(f) = O( (prt(f) \log prt(f))^2 ) and CC(f) = O( (IC(f) \log IC(f))^2 )). Similar results for CC(f) vs IC(f) were independently obtained by Kol recently.

3:20 pm3:35 pm
We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of the m machines receives n data points from a d-dimensional Gaussian distribution with unknown mean θ which is promised to be k-sparse. The machines communicate by broadcast and aim to estimate the mean θ. We provide a tight (up to logarithmic factors) tradeoff between the estimation error and the number of bits communicated between the machines. This directly leads to a lower bound for the distributed sparse linear regression problem: to achieve the statistical minimax error, the total communication is at least Ω(min{n,d}m), where n is the number of observations that each machine receives and d is the ambient dimension. We also give the first optimal simultaneous protocol in the dense case for mean estimation.
As our main technique, we prove a distributed data processing inequality, as a generalization of usual data processing inequalities, which might be of independent interest and useful for other problems.

Based on joint work with Mark Braverman, Ankit Garg, Tengyu Ma, and David P. Woodruff.

### Thursday, June 9th, 2016

9:30 am10:00 am
Ranking is one of the fundamental problems that have proved crucial in a wide spectrum of contexts: social choice, sports competitions, web search, recommendation systems, to name a few. PageRank, the backbone of Google’s search engine, is perhaps the most popular ranking paradigm, but is facing a challenge in the big data era: the vast number of pages on the web makes the ranking problem computationally challenging; hence, PageRank often serves only as an offline algorithm.

To address the challenge, we take a disruptive approach that focuses on identifying only a few significant items, say top-K ranked items. Under a prominent measurement model in which measurements are of the pairwise information type, we characterize the fundamental limits (up to some constant factor) on the sample size required for reliably identifying the top-K ranked items. As a consequence, we demonstrate that our top-K ranking approach leads to signification reduction in sample complexity. In this talk, I will present an algorithm that runs nearly linearly in the sample size and which achieves the information theoretic limit.

10:00 am10:30 am
Speaker:
No abstract available.

11:00 am11:30 am

No abstract available.

11:30 am12:00 pm
Speaker:
This paper considers the stabilization of an unstable discrete-time scalar linear system that is observed over a channel corrupted by continuous multiplicative noise. The main result is a converse bound that shows that if the system gain is large enough the system cannot be stabilized in a mean-squared sense.

It was known that a system with multiplicative observation noise can be stabilized using a simple linear strategy if the system growth is bounded (this bound depends on the noise parameters). However, it was not clear whether non-linear controllers could overcome arbitrarily large growth factors. We use a non- standard approach to provide a proof-of-concept converse. One difficulty with multiplicative noise is that the mutual information per round between the system state and the observation is potentially unbounded. We handle this by providing the controller with side information about the magnitude of the state.

This is joint work with Jian Ding and Yuval Peres. The authors met through the Simons Institute.

12:00 pm12:15 pm
If the modern perspective on the engineered world is split between computation and information, where does control fit in? Control straddles the boundary because it shares many characteristics with distributed computation (witness the study of tree codes for interactive protocols) but it also has an informational dimension to it. In this way, control is about information flows except that the information flows are embodied in ways that we are constrained to effect. The additional structure of linear control systems allows us to say more about them from an informational point of view. This talk introduces a few ideas in this direction.

2:00 pm2:15 pm
Rich model classes are desirable to handle complex problems, but they may exclude familiar uniform consistency guarantees. To enable handling rich probabilistic model classes in line with our expectations of what data must achieve, we outlined the \emph{data-driven} consistency framework during the Information Theory and Big Data workshop. In this talk, we summarize further developments on this topic from the viewpoint of the ubiquitous machine learning tools of regularization and cross-validation.

2:15 pm2:45 pm
We prove that a known approach to improve Shamir's celebrated secret sharing scheme; i.e., adding an information-theoretic authentication tag to the secret, can make it robust for $n$ parties against any collusion of size $\delta n$, for any constant $\delta \in (0, 1/2)$. This result holds in the so-called "non-rushing" model in which the $n$ shares are submitted simultaneously for reconstruction. We thus obtain an efficient and robust secret sharing scheme in this model that is essentially optimal in all parameters including the share size which is $k(1+o(1)) + O(\kappa)$, where $k$ is the secret length and $\kappa$ is the security parameter. Like Shamir's scheme, in this modified scheme any set of more than $\delta n$ honest parties can efficiently recover the secret. Using algebraic geometry codes instead of Reed-Solomon codes, we decrease the share length to a constant (only depending on $\delta$) while the number of shares $n$ can grow independently. In this case, when $n$ is large enough, the scheme satisfies the "threshold" requirement in an approximate sense; i.e., any set of $\delta n(1+\rho)$ honest parties, for arbitrarily small $\rho > 0$, can efficiently reconstruct the secret.

2:45 pm3:15 pm
We consider an unsupervised clustering problem motivated by the processing of the de novo sequencing reads. Despite combinatorial nature of the problem, we propose an iterative convex reformulation of the problem using a greedy procedure with order optimal sample complexity. Based on the proposed greedy procedure, we develop an algorithm, dubbed CONVEX, which has linear computational complexity in the number of reads and can be implemented on parallel multi-core machines. We compare the performance of the algorithm with the state-of-the-art software, ICE, on various datasets such as ERCC2.0 and heart/liver tissue PacBio datasets. Our numerical experiments show that CONVEX results in up to 20% improvement in the number of denoised reads in addition to being 10 times faster than ICE on moderate and large size datasets