Big Data All Hands Meeting

Parent Program

Theoretical Foundations of Big Data Analysis

Location

Calvin Lab 2nd Floor Interaction Area

Speaker(s)

Suresh Venkatasubramanian (University of Utah)

Moritz Hardt (UC Berkeley)

Date

Tuesday, Nov. 26, 2013

Time

12:30 – 1:30 p.m. PT

Back to calendar

Description

Suresh Venkatasubramanian (University of Utah)
A Geometric Algorithm for Scalable Multiple Kernel Learning

We present a geometric formulation of the Multiple Kernel Learning (MKL) problem. To do so, we reinterpret the problem of learning kernel weights as searching for a kernel that maximizes the minimum (kernel) distance between two convex polytopes. This interpretation combined with novel structural insights from our geometric formulation allows us to reduce the MKL problem to a simple optimization routine that yields provable convergence as well as quality guarantees. As a result our method scales efficiently to much larger data sets than most prior methods can handle.

Joint work with John Moeller, Parasaran Raman and Avishek Saha.

Moritz Hardt (IBM Almaden)
On the Computational Threshold of Robust Subspace Recovery

We consider a fundamental problem in unsupervised learning called robust subspace recovery: Given a collection of m points in n dimensions such that some but not all of these points are contained in a d-dimensional subspace T, can we recover the subspace? The points contained in T are called inliers and the remaining points are outliers. This problem has received considerable attention in computer science and in statistics. Yet efficient algorithms from computer science are not robust to adversarial outliers, and the estimators from robust statistics are hard to compute in high dimensions. Are there algorithms for subspace recovery that are both robust to outliers and computationally efficient?

We give a simple and efficient algorithm that finds T when it contains more than a d/n fraction of the points. We prove that no computationally efficient estimator can recover T from any smaller fraction of inliers assuming the Small Set Expansion Hypothesis. This gives evidence that our estimator is an optimal compromise between computational efficiency and robustness.

Big Data All Hands Meeting

All scheduled dates:

Upcoming

Past