# Abstracts

### Monday, February 9th, 2015

9:30 am10:30 am

Coding Theory for Scalable Media Delivery
Michael Luby, Qualcomm Inc.

Coding For Distributed Storage
Alex Dimakis, University of Texas, Austin

Finite Length — The Final Frontier
Rüdiger Urbanke, Ecole Polytechnique Fédérale de Lausanne

Polar Codes over q-ary Alphabets and Polynomially Fast Convergence to Shannon Capacity
Ameya Velingker, Carnegie Mellon University

Pliable Index Coding
Christina Fragouli, UCLA and École Polytechnique Fédérale de Lausanne

A Reductionist View of Network Information Theory
Mike Langberg, University at Buffalo, SUNY and Michelle Effros, California Institute of Technology

Gap to Capacity of Finite Length Polar Codes
David Burshtein, Tel Aviv University

Coding for NVMs: Advances In Theory Meet New Applications
Lara Dolecek, UCLA

11:00 am12:00 pm

Elastic Source-channel Erasure Codes for Multicast
Emina Soljanin, Bell Labs

Recovery Threshold in the Stochastic Block Model
Emmanuel Abbe, Princeton University

STAIR Codes: A General Family of Erasure Codes for Tolerating Device and Sector Failures in Practical Storage Systems
Patrick Lee, Chinese University of Hong Kong

Codes for Distributed Storage: Recent Results
P. Vijay Kumar, Indian Institute of Science and University of Southern California

Coding for Streaming Communication
Ashish Khisti, University of Toronto

Multi-version Coding and Shared Memory Emulation

Deterministic Rateless Codes for BSC
Guy Even, Tel Aviv University

Coding for Distributed Storage
Itzhak Tamo, Tel Aviv University

1:30 pm2:30 pm

Secret-sharing over a Network
Arya Mazumdar, University of Minnesota

Sparse Superposition Codes
Florent Krzakala, École Normale Supérieure Paris

Deletion Codes in the High-noise and High-rate Regimes
Carol Wang, Carnegie Mellon University

Non-malleable Codes Against Constant Split-state Tampering
David Zuckerman, University of Texas, Austin

Codes to Mitigate Intercell Interference in Flash Memories
Paul Siegel, UC San Diego

Symmetric Product Codes
Henry Pfister, Duke University

Permutation Codes in the Ulam Metric
Olgica Milenkovic, University of Illinois, Urbana-Champaign

3:00 pm4:00 pm

Spatial Coupling vs. Block Coding: A Comparison
Daniel Costello, University of Notre Dame

Polarizing Transforms
Emre Telatar, École Polytechnique Fédérale de Lausanne

List Recovering Expander Codes
Mary Wootters, Carnegie Mellon University

LRC for Hot Data
Cheng Huang, Microsoft Research Redmond

Why 'Data Scientists' Should Learn About Coding Theory
Andrea Montanari, Stanford University

Are Polar Codes Practical?
Warren Gross, McGill University

Maximally Recoverable Codes for Data Storage
Sergey Yekhanin, Microsoft Research

### Tuesday, February 10th, 2015

9:30 am10:00 am

I will provide a general introduction to some problem of statistical estimation with an underlying random graph structure. Examples range from coding theory to subgraph detection, to the hidden clique problem. I will emphasize unifying concepts and open challenges.

10:00 am10:30 am

Through the tool of reduction, we build theory from codes, demonstrating new relationships between seemingly disparate problems and highlighting the potential for a new kind of unifying theory for the field.

11:00 am11:30 am

Polar codes provably achieve the symmetric capacity of a memoryless channel while having an explicit construction. The adoption of polar codes however, has been hampered by their mediocre error-correcting performance at short block lengths and the low throughput of their successive-cancellation decoding algorithm. In this talk, we will describe the recent progress along both of these fronts, towards practical polar decoders.

11:30 am12:00 pm

The edge removal problem addresses the loss in capacity when a single edge is removed from a given network. In this talk I will address intriguing connections between the edge removal problem and seemingly unrelated problems in network information theory.

1:30 pm2:00 pm

The shared memory emulation problem is a problem studied in distributed computing with applications to distributed storage systems. The multi-version coding problem is a coding theoretic problem that is recently formulated with motivations to shared memory emulation. We will introduce these problems and discuss possible connections in this talk.

2:00 pm2:30 pm

An increasing number of applications are streaming in nature. Information packets must be encoded and transmitted sequentially in real-time, and the receiver should reproduce the source stream under strict delay constraints. The study of fundamental limits and coding schemes in such communication systems is a fertile area of research. In this talk we will show how certain judiciously constructed structured codes can yield substantial gains over baseline schemes in streaming applications.

As our main result, we will present a new family of streaming-codes that achieve significant performance gains over the practically relevant Gilbert-Elliott (GE) channel model. We will discuss how a certain deterministic approximation to the GE channel provides insights into the optimality of these codes. We will also discuss the operational significance of column-distance and column-span metrics in this setup, and show that our proposed codes achieve a near optimal tradeoff between these.

Joint work with Ahmed Badr (University of Toronto), Wai-Tian Tan (Cisco Systems) and John Apostolopoulos (Cisco Systems).

3:00 pm3:30 pm

Non-volatile memories (NMVs) are becoming increasingly more pervasive, as they are bound to replace hard disk drives in many applications, ranging from consumer electronics to cloud storage. Operational idiosyncrasies of NVMs prevent us from naively using conventional coding schemes. In this talk, we will overview some of recent results on coding-inspired methods for NVMs, including graded bit error correcting codes, WOM codes, constrained rank modulation, dynamic threshold schemes, and optimized graph-based codes. We will also discuss other promising future directions for development of coding (and non-coding)-based methods with applications to NVMs.

3:30 pm4:00 pm

No abstract avaliable.

### Wednesday, February 11th, 2015

9:30 am10:00 am

In this two-part talk, we will show first that the notion of locality in codes has a very natural extension to codes with hierarchical locality. We provide a bound on the minimum distance of such codes as well as two general constructions that achieve this bound. The second part of the talk will describe a recent construction of an MSR regenerating code with polynomial sub-packetization level. This is joint work with Birenjith Sasidharan and Gaurav Agarwal.

10:00 am10:30 am

Codes for distributed storage are designed to both reliably maintain the stored data and efficiently maintain the guaranteed reliability of the storage network under node failures. These properties have already made such codes important for data storage practice. But code design also determines the ways in which data can be retrieved from coded storage, and thus how available it is under given data request patterns and the request scheduling policies. This talk will look into data retrieval quality of service in coded distributed storage and its implications on code design.

11:00 am11:30 am

Practical storage systems often adopt erasure codes to tolerate device failures and sector failures, both of which are prevalent in the field. However, traditional erasure codes employ device-level redundancy to protect against sector failures, and hence incur significant space overhead. In this talk, I will present a general family of erasure codes called STAIR codes, which efficiently and provably tolerate both device and sector failures without any restriction on the size of a storage array and the numbers of tolerable device failures and sector failures. We propose the upstairs encoding and downstairs encoding methods, which provide complementary performance advantages for different configurations. We conduct extensive experiments to justify the practicality of STAIR codes in terms of space saving, encoding/decoding speed, and update cost.

11:30 am12:00 pm

In this talk, we compare spatially coupled LDPC codes to LDPC block codes according to several criteria. For convenience, we focus on photograph-based code ensembles, both binary and non-binary. Comparisons of ensemble average asymptotic properties and the finite length performance of specific codes are considered. Specific criteria for comparison include iterative decoding thresholds, minimum distance, trapping set, and absorbing set properties, analytical error-rate bounds, simulated error-rate performance, decoder latency and memory requirements, computational complexity, and decoder design complexity. For spatially coupled codes, both flooding schedule and sliding window decoding methods are considered.

12:00 pm12:30 pm

We consider the problem of storing and retrieving information from synthetic DNA media. The mathematical basis of the problem is the construction and design of sequences that may be discriminated based on their collection of substrings observed through a noisy channel. This problem of reconstructing sequences from traces was first investigated in the noiseless setting under the name of “Markov type” analysis. Here, we explain the connection between the reconstruction problem and the problem of DNA synthesis and sequencing, and introduce the notion of a DNA storage channel. We analyze the number of sequence equivalence classes under the channel mapping and propose new asymmetric coding techniques to combat the effects of synthesis and sequencing noise. In our analysis, we make use of restricted de Bruijn graphs and Ehrhart theory for rational polytopes.

### Thursday, February 12th, 2015

9:30 am10:00 am

No abstract available.

10:00 am10:30 am

The noise model of deletions poses significant challenges in coding theory, with basic questions like the capacity of the binary deletion channel still being open. In this talk, we address the harder model of worst-case deletions, with a focus on constructing efficiently decodable codes for the two extreme regimes of high-noise and high-rate.

Our work is the first to achieve the qualitative goals of correcting a deletion fraction approaching 1 over bounded alphabets, and correcting a constant fraction of bit deletions with rate approaching 1. These results bring our understanding of deletion code constructions in these regimes to a similar level as worst-case errors.

11:00 am11:30 am

Recently the notion of locally recoverable code (LRC) was introduced in the field of distributed storage with the goal of combating erasures. Such a code enables a recovery of any symbol of the encoding by accessing a relatively small number of other encoded symbols. In the talk, LRC codes that attain the maximum possible value of the distance for a given locality parameter and code's cardinality will be presented. These codes are based on polynomial evaluations and can be viewed as a natural generalization of Reed-Solomon codes. We will conclude with some extensions of the code construction.

11:30 am12:00 pm

Expander codes—codes based on expander graphs, introduced by Sipser and Spielman in the 1990's—are notable for their extremely fast (linear) decoding time. In this talk, I'll mention two recent results which show that these old(ish) codes can learn new tricks. First, when appropriately instantiated, they admit sublinear time local decoding algorithms with rate approaching 1. Second, with a different instantiation, they can be list-recovered (a variant on list-decoding) in linear time, with rate again approaching 1.

These are joint works with Rafail Ostrovsky and Brett Hemenway.

1:30 pm2:00 pm

Since the landmark work of Shannon in 1948, error-correcting codes have gained widespread use in communication systems. However, the question of obtaining efficient codes that get arbitrarily close to Shannon capacity with polynomial complexity had remained open for decades. In 2008, Arıkan introduced the technique of channel polarization as a means of constructing explicit capacity-achieving error-correcting codes known as polar codes. A subsequent work of Guruswami and Xia established that for binary-input symmetric memoryless channels, polar codes allow communication rates within epsilon of the Shannon capacity with block length and encoding and decoding complexities all bounded polynomially in 1/epsilon. Polar codes are, therefore, the first explicit construction with such provable guarantees.

However, the question of obtaining similar provable guarantees for q-ary polar codes has remained open. In this work, we resolve this question by extending the result of Guruswami and Xia to q-ary codes. Our high level approach is to do a direct analysis of the convergence rate to capacity by proving an entropic inequality over prime alphabets. Such an inequality was implicit and straightfoward in the q=2 case, as the extremal case of the inequality is known, but for other prime q, the inequality is more subtle due to the lack of a simple characterization of the extremal cases. The polarization result for prime q is then used to obtain the general result for all q by splitting a communication channel into multiple channels over prime alphabets and polarizing each channel separately.

This is joint work with Venkatesan Guruswami.

2:00 pm2:30 pm

We introduce a model of a distributed storage system that is locally recoverable from any single server failure. Unlike the usual local recovery model of codes for distributed storage, this model accounts for the fact that each server or storage node in a network is connectible to only some, and not all other, nodes. This may happen for reasons such as physical separation, inhomogeneity in storage platforms etc. We estimate the storage capacity of a network under this model and propose some constructive schemes. From a coding theory point of view, this model is approximately dual of the well-studied index coding problem.

3:00 pm3:30 pm

Reed-Muller codes have recently regained significant attention, from both the computer science and electrical engineering coding communities. As for polar codes, Reed-Muller codes are generated by selecting rows in the tensor product matrix [1,1;0,1]^{\otimes k}, the matrix with Sierpinski triangles on the upper-diagonal. While polar codes are proved to be capacity-achieving, RM codes have long been conjectured to be capacity achieving on symmetric channels, without theoretical validations. We will overview recent results on this conjecture (joint work with A. Shpilka and A. Wigderson), and discuss other code variants (joint work with Y. Wigderson).

3:30 pm4:00 pm

Non-malleable codes were introduced by Dziembowski, Pietrzak and Wichs as an elegant generalization of the classical notion of error detection, where the corruption of a codeword is viewed as a tampering function acting on it. Informally, a non-malleable code with respect to a family of tampering functions F consists of a randomized encoding function Enc and a deterministic decoding function Dec with Dec(Enc(m))=m, such that for any tampering function f in F and any message m, either Dec(f(Enc(m))) equals m or is almost uncorrelated with m.

Of particular importance are non-malleable codes in the C-split-state model. In this model, the codeword is partitioned into C equal sized blocks and the tampering function family consists of functions (f_1,...,f_C) such that f_i acts on the ith block. We construct efficient non-malleable codes in the C-split-state model for C=10 that achieve constant rate and error 2^{-Omega(n)}. These are the first explicit codes of constant rate in the C-split-state model for any C=o(n) that do not rely on any unproven assumptions.

### Friday, February 13th, 2015

9:30 am10:00 am

A rateless code encodes a finite length information word into an infinitely long codeword such that longer prefixes of the codeword can tolerate a larger fraction of errors. A rateless code achieves capacity for a family of channels if, for every channel in the family, reliable communication is obtained by a prefix of the code whose rate is arbitrarily close to the channel's capacity. As a result, a universal encoder can communicate over all channels in the family while simultaneously achieving optimal communication overhead.

In this paper, we construct the first \emph{deterministic} rateless code for the binary symmetric channel. Our code can be encoded and decoded in $O(\beta)$ time per bit and in almost logarithmic parallel time of $O(\beta \log n)$, where $\beta$ is any (arbitrarily slow) super-constant function. Furthermore, the error probability of our code is almost exponentially small $\exp(-\Omega(n/\beta))$. Previous rateless codes are probabilistic (i.e., based on code ensembles), require polynomial time per bit for decoding, and have inferior asymptotic error probabilities.

Our main technical contribution is a constructive proof for the existence of an infinite generating matrix that each of its prefixes induce a weight distribution that approximates the expected weight distribution of a random linear code.

10:00 am10:30 am

We revisit a capacity achieving scheme for error correctiion over the Gaussian channel, previously introduced by Joseph and Barron. We show that these codes can be iteratively decoded in an efficient way thanks to the approximate message passing algorithm. As in LDPC codes, the decoding by message passing is however not performing until the capacity. We use the spatial coupling technique to saturate the capacity, in a very similar way as in compressed sensing. Finally, we show empirically with simulations that using structured coding matrices leads to good performances even in relatively small sizes.

11:00 am11:30 am

Product codes were introduced by Elias in 1954 and generalized by Tanner in 1981. Recently, a number of generalized product codes have been proposed for forward error-correction in high-speed optical communication. In practice, these codes are decoded by iteratively decoding each of the component codes. Symmetric product codes are a subclass of generalized product codes that use symmetry to reduce the block length to roughly half that of a product code with the same component code. When compared to a product code of similar length and rate, this modification can result in improved performance in both the waterfall region and the error floor. A special case of these codes was introduced by Tanner in 1981 and recently called a half-product code by Justesen in 2011. In this talk, we discuss some initial results on symmetric product codes. Our results show that (i) these codes have a larger normalized minimum distance than the product code from which they are derived, (ii) some small constructions achieve the best parameters possible with a linear code, and (iii) extending the construction to tightly-braided block codes achieves similar gains.

11:30 am12:00 pm

It was recently shown that the block length required to communicate reliably using polar codes scales as a low degree polynomial with respect to the inverse gap between channel capacity and code rate (which can be arbitrarily close to the capacity). This result extends to lossy source coding and to problems in multiuser information theory. We review recent results and also discuss the case of non-binary polar codes.