The theory of learning under the uniform distribution is rich and deep. It is connected to cryptography, computational complexity, and the analysis of Boolean functions, to name a few areas. This theory, however, has limited applications, because the uniform distribution and the corresponding Fourier basis are rarely encountered as a statistical model.
A family of distributions that vastly generalizes the uniform distribution on the Boolean cube is that of distributions represented by Markov Random Fields (MRF). Markov Random Fields are one of the main tools for modeling high dimensional data in all areas of statistics and machine learning. In this paper we initiate the investigation of extending central ideas, methods and algorithms from the theory of learning under the uniform distribution to the setup of learning concepts given examples from MRF distributions. In particular, our results establish a novel connection between properties of MCMC sampling of MRFs and learning under the MRF distribution.
Based on joint work with Elchanan Mossel.
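As a toy illustration of the MCMC sampling of MRFs mentioned above, here is a minimal Glauber-dynamics (Gibbs) sampler for a one-dimensional Ising chain, the simplest non-trivial MRF. The model and all parameter names are illustrative choices for exposition, not taken from the paper.

```python
import math
import random

def gibbs_ising_chain(n, beta, steps, seed=0):
    """Glauber/Gibbs sampling for a 1-D Ising chain (a toy MRF).

    Spins s_i in {-1, +1} with nearest-neighbour coupling strength beta;
    each update resamples one spin from its conditional distribution
    given its neighbours.
    """
    rng = random.Random(seed)
    s = [rng.choice([-1, 1]) for _ in range(n)]
    for _ in range(steps):
        i = rng.randrange(n)
        # local field: sum of the (at most two) neighbouring spins
        h = (s[i - 1] if i > 0 else 0) + (s[i + 1] if i < n - 1 else 0)
        p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * h))
        s[i] = 1 if rng.random() < p_plus else -1
    return s
```

Running the chain long enough yields approximate samples from the Ising distribution; the abstract's point is that properties of exactly this kind of sampler connect to learnability under the MRF distribution.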
We present a new data structure for the c-approximate near neighbor problem (ANN) in the Euclidean space. For n points in R^d, our algorithm achieves O(dn^{\rho}) query time and O(n^{1 + \rho} + nd) space, where \rho <= 7/(8c^2) + O(1/c^3) + o(1). This is the first improvement over the result by Andoni and Indyk (FOCS 2006) and the first data structure that bypasses a locality-sensitive hashing lower bound proved by O'Donnell, Wu and Zhou (ITCS 2011). By a standard reduction we obtain a data structure for the Hamming space and the \ell_1 norm with \rho <= 7/(8c) + O(1/c^{3/2}) + o(1), which is the first improvement over the result of Indyk and Motwani (STOC 1998).
Joint work with Piotr Indyk, Huy L. Nguyen, Ilya Razenshteyn. arxiv.org/abs/1306.1547
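For contrast with the improved scheme above, the classical locality-sensitive hash for Euclidean distance (the p-stable scheme of Datar et al., which the new data structure improves upon) can be sketched as follows; the function names and the bucket width `w` are illustrative choices, not from the paper.

```python
import math
import random

def e2lsh_hash(v, a, b, w):
    """Classic p-stable (Gaussian) LSH bucket for Euclidean distance:
    h(v) = floor((<a, v> + b) / w), with a a random Gaussian vector
    and b uniform in [0, w). Nearby points collide with higher
    probability than far-apart points.
    """
    return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

def make_hash(d, w, seed=0):
    """Draw one random hash function for dimension d and bucket width w."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    return lambda v: e2lsh_hash(v, a, b, w)
```

An ANN data structure concatenates several such hashes into a bucket key and repeats over many independent tables; the exponent \rho governs how many tables are needed.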
Finding cliques in random graphs and the closely related "planted" clique variant, where a clique of size k is planted in a random G(n,1/2) graph, have been the focus of substantial study in algorithm design. Despite much effort, the best known polynomial-time algorithms only solve the problem for k ~ sqrt(n). Here we show that beating sqrt(n) would require substantially new algorithmic ideas, by proving an integrality gap for the problem in the Lasserre/Sum of Squares hierarchy, the most powerful class of semi-definite programming algorithms we know of. Our (average case) lower bound uses tools from the classical theory of association schemes and some new large deviation bounds for matrix-valued polynomials which could be of independent interest.
This is joint work with Avi Wigderson.
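A planted clique instance as described above is easy to generate, which makes the conjectured hardness of recovering the clique for k well below sqrt(n) all the more striking. A minimal sketch (all names are illustrative):

```python
import random
from itertools import combinations

def planted_clique_instance(n, k, seed=0):
    """G(n, 1/2) with a clique planted on k uniformly chosen vertices.

    Returns (edge set of frozensets, planted vertex set). Illustrative
    instance generator only; not code from the paper.
    """
    rng = random.Random(seed)
    edges = {frozenset(e) for e in combinations(range(n), 2)
             if rng.random() < 0.5}
    clique = set(rng.sample(range(n), k))
    # force every pair inside the planted set to be an edge
    edges |= {frozenset(e) for e in combinations(sorted(clique), 2)}
    return edges, clique

def is_clique(edges, vertices):
    """Check that every pair of the given vertices is joined by an edge."""
    return all(frozenset(e) in edges
               for e in combinations(sorted(vertices), 2))
```

Generating the instance takes O(n^2) time; the lower bound discussed in the abstract concerns the (apparently much harder) inverse task of finding the planted set.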
We generalize algorithms from computational learning theory that are successful under the uniform distribution on the Boolean hypercube $\{0,1\}^n$ to algorithms successful on permutation invariant distributions, i.e., distributions whose probability mass is invariant under permutations of the coordinates of the instances. While the tools in our generalization mimic those used for the Boolean hypercube, the fact that permutation invariant distributions are not product distributions presents a significant obstacle.
Under the uniform distribution, halfspaces can be agnostically learned in polynomial time for constant $\epsilon$. The main tools used are a theorem of Peres~\cite{Peres:04} bounding the \emph{noise sensitivity} of a halfspace, a result of~\cite{KOS:04} showing that this theorem implies Fourier concentration, and a modification by Kalai et al.~\cite{Kalai2008a} of the Low-Degree algorithm of Linial, Mansour, and Nisan~\cite{LMN:93}. These results are extended to arbitrary product distributions in~\cite{BOW2010}.
We prove analogous results for permutation invariant distributions; more generally, we work in the domain of the symmetric group. We define noise sensitivity in this setting, and show that noise sensitivity has a nice combinatorial interpretation in terms of Young tableaux. The main technical innovations involve techniques from the representation theory of the symmetric group, especially the combinatorics of Young tableaux. We show that low noise sensitivity implies concentration on "simple" components of the Fourier spectrum, and that this fact will allow us to agnostically learn halfspaces under permutation invariant distributions to constant accuracy in roughly the same time as in the uniform distribution over the Boolean hypercube case.
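For reference, noise sensitivity in the Boolean-hypercube setting that this work generalizes can be computed exactly by enumeration for small n; the sketch below uses the standard definition NS_delta(f) = Pr[f(x) != f(y)], where x is uniform and y flips each bit of x independently with probability delta. (The symmetric-group analogue defined in the talk is not reproduced here.)

```python
from itertools import product

def noise_sensitivity(f, n, delta):
    """Exact NS_delta(f) for f on {0,1}^n by summing over all (x, y) pairs.

    Exponential in n; intended only to illustrate the definition.
    """
    total = 0.0
    for x in product([0, 1], repeat=n):
        for y in product([0, 1], repeat=n):
            d = sum(xi != yi for xi, yi in zip(x, y))
            # probability that the noise takes x to y
            w = (delta ** d) * ((1 - delta) ** (n - d))
            if f(x) != f(y):
                total += w
    return total / 2 ** n
```

As a sanity check, a dictator function f(x) = x_1 changes value exactly when its one relevant bit flips, so its noise sensitivity is delta.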
Let H, G be graphs on k (small) and n (large) vertices respectively. We denote by p(H,G) the probability that a randomly chosen set of k vertices in G spans a subgraph isomorphic to H. For fixed k, the vector (p(H,G) | H is a k-vertex graph) is called the k-profile of G.
Q1: What do the k-profiles of large graphs look like? ("Local graph theory")
Q2: Given information about G's k-profile, what conclusions can we draw? (Local-global graph theory).
Similar questions can be asked concerning numerous other combinatorial objects such as tournaments, permutations etc. In this talk I will present some of what is already known in this area as well as the many intriguing questions that are still open.
The talk covers joint work with: Chaim Even Zohar, Hao Huang, Avraham Morgenstern, Humberto Naves, Yuval Peled, Benny Sudakov.
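The quantity p(H, G) defined above can be computed directly for small k by enumerating all k-subsets of G's vertices and testing isomorphism of the induced subgraph by brute force over all k! relabellings; a sketch, assuming H's vertices are labeled 0..k-1:

```python
from itertools import combinations, permutations

def p_subgraph(H_edges, k, G_edges, n):
    """p(H, G): fraction of k-vertex subsets of G whose induced subgraph
    is isomorphic to H. Brute force; only viable for small k and n.
    """
    H = {frozenset(e) for e in H_edges}
    G = {frozenset(e) for e in G_edges}
    hits = total = 0
    for S in combinations(range(n), k):
        total += 1
        Sset = set(S)
        idx = {v: i for i, v in enumerate(S)}  # relabel S to 0..k-1
        induced = {frozenset((idx[u], idx[v]))
                   for u, v in map(tuple, G) if {u, v} <= Sset}
        # isomorphic iff some relabelling of 0..k-1 maps induced onto H
        if any({frozenset((pi[u], pi[v]))
                for u, v in map(tuple, induced)} == H
               for pi in permutations(range(k))):
            hits += 1
    return hits / total
```

The k-profile of G is then the vector of these values over all isomorphism types of k-vertex graphs H.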
A locally testable code (LTC) is an error correcting code which admits a very efficient procedure for testing membership in the code: a local tester queries few locations in the received word but still distinguishes codewords from words that are far from the code. The main open questions about LTCs are about the tradeoffs possible between rate, distance and query complexity.
We show that the local testability of a binary linear code is related (and in fact equivalent) to the L_1 embeddability of a related Cayley graph. Thus one can reformulate existential questions about LTCs as questions about the existence of certain Cayley graphs that admit constant-distortion embeddings into L_1. We discuss what this tells us about existing results on LTCs and what it might tell us about new results.
Joint work with Salil Vadhan (Harvard) and Yuan Zhou (CMU).
In this talk we show several results regarding Boolean functions with small spectral norm (the spectral norm of f is $\|\hat{f}\|_1=\sum_{\alpha}|\hat{f}(\alpha)|$). Specifically, we prove the following results for functions $f:\{0,1\}^n \to \{0,1\}$ with $\|\hat{f}\|_1=A$.
1. There is a subspace $V$ of co-dimension at most $A^2$ such that $f|_V$ is constant.
2. $f$ can be computed by a parity decision tree of size $2^{A^2}n^{2A}$. (A parity decision tree is a decision tree whose nodes are labeled with arbitrary linear functions.)
3. If in addition f has at most s nonzero Fourier coefficients, then f can be computed by a parity decision tree of depth $A^2 \log s$.
4. For every $\epsilon>0$ there is a parity decision tree of depth $O(A^2 + \log(1/\epsilon))$ and size $2^{O(A^2)} \cdot \min\{1/\epsilon^2, O(\log(1/\epsilon))^{2A}\}$ that $\epsilon$-approximates $f$. Furthermore, this tree can be learned, with probability $1-\delta$, using $\mathrm{poly}(n, \exp(A^2), 1/\epsilon, \log(1/\delta))$ membership queries.
All the results above also hold (with a slight change in parameters) for functions $f:\mathbb{Z}_p^n \to \{0,1\}$.
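The spectral norm $\|\hat{f}\|_1$ from the abstract can be computed exactly for small $n$ by evaluating every Fourier coefficient of $f:\{0,1\}^n \to \{0,1\}$ with respect to the characters $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$; a brute-force sketch:

```python
from itertools import product

def fourier_coeffs(f, n):
    """All Fourier coefficients of f: {0,1}^n -> R, indexed by the
    subset S (encoded as a 0/1 vector): f_hat(S) = E_x[f(x) chi_S(x)].
    """
    coeffs = {}
    for S in product([0, 1], repeat=n):
        total = 0.0
        for x in product([0, 1], repeat=n):
            chi = (-1) ** sum(si * xi for si, xi in zip(S, x))
            total += f(x) * chi
        coeffs[S] = total / 2 ** n
    return coeffs

def spectral_norm(f, n):
    """A = sum_S |f_hat(S)|, the quantity the abstract's bounds depend on."""
    return sum(abs(c) for c in fourier_coeffs(f, n).values())
```

For instance, the 0/1-valued parity of two bits has expansion $\frac{1}{2} - \frac{1}{2}\chi_{\{1,2\}}$, so its spectral norm is 1, matching the intuition that parities are the "simplest" functions in this measure.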
Let F be a prime field. An affine-invariant property is a property of functions on F^n that is closed under taking affine transformations of the domain. We prove that every affine-invariant property with a local characterization is testable. In fact, we show that for any such property, there is a test that, given an input function, makes a constant number of queries, always accepts if the function satisfies the property, and otherwise rejects with positive probability depending only on the distance of the function from the property.
A common theme in many areas of theoretical computer science is that it is sometimes much easier to compute approximations to a function than to compute the function exactly, while sometimes approximation does not make things any easier. In this talk, we explore the hardness of approximating functions in the circuit complexity model. Specifically, we restrict our attention to depth-2 circuits (i.e., DNFs and CNFs) and we ask how large such a circuit must be to agree with a given target function on 99% of all possible inputs. As we will see during the talk, the study of this question leads to surprising results and to interesting connections between complexity theory and information theory, the analysis of Boolean functions, and extremal combinatorics.
This talk is based on joint works with Eric Blais.
We present a deterministic algorithm to epsilon-approximately count the number of satisfying assignments of a k-junta of degree-2 PTFs with a running time of poly(n) * exp(k, epsilon). In particular, the algorithm has a poly(n) running time for slightly superconstant k and subconstant epsilon. It is a significant extension of a recent result on deterministic counting for degree-2 PTFs, and is based on multidimensional CLTs for Gaussian polynomials derived from a combination of Stein's method and Malliavin calculus. It also provides the first instance of a deterministic polynomial-time counting algorithm for a non-trivial class of depth-3 circuits.
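For very small n, the count that the algorithm approximates can be obtained by brute force, which is useful as a correctness baseline; a sketch for a single degree-2 PTF sign(p(x)) over {-1,1}^n (the junta composition is omitted, and all names are illustrative, not the paper's algorithm):

```python
from itertools import product

def count_ptf_sat(quad, lin, const, n):
    """Number of x in {-1,1}^n with p(x) >= 0, where
    p(x) = sum_{i<j} quad[i,j]*x_i*x_j + sum_i lin[i]*x_i + const.

    Exponential-time enumeration; the abstract's algorithm achieves a
    deterministic approximate count in poly(n) time instead.
    """
    count = 0
    for x in product([-1, 1], repeat=n):
        p = const
        p += sum(c * x[i] for i, c in lin.items())
        p += sum(c * x[i] * x[j] for (i, j), c in quad.items())
        if p >= 0:
            count += 1
    return count
```

For example, the PTF sign(x_1 x_2) accepts exactly the assignments where the two bits agree.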