Big Data All Hands Meeting

Parent Program

Theoretical Foundations of Big Data Analysis

Location

Calvin Lab 116

Speaker(s)

Dorit Hochbaum (UC Berkeley)

Date

Tuesday, Oct. 15, 2013

Time

12:30 – 1:30 p.m. PT

Back to calendar

Description

An effective combinatorial algorithm for image segmentation and data mining

We present a model for clustering which combines two criteria: Given a collection of objects with pairwise similarity measure, the problem is to find a cluster that is as dissimilar as possible from the complement, while having as much similarity as possible within the cluster. The two objectives are combined either as a ratio or with linear weights. The ratio problem, and its linear weighted version, are solved by a combinatorial algorithm within the complexity of a single minimum s,t-cut algorithm. We call this problem "the normalized cut prime" (NC') as it is closed related to the NP-hard problem of normalized cut.

The relationship of NC' to normalized cut is generalized to a problem we call "q-normalized cut". It is shown that the spectral method that solves for the Fielder eigenvector of a related matrix is a continuous relaxation of the problem. In contrast, the generalization of the combinatorial algorithm solves a discrete problem resulting from a relaxation of a single sum constraint. We study the relationship between these two relaxations and demonstrate a number of advantages for the combinatorial algorithm. These advantages include a better approximation, in practice, of the normalized cut objective for image segmentation benchmark problems.

Time permitting, we will discuss the application of NC', as a supervised machine learning technique, to data mining, and its comparison to leading machine learning techniques on datasets selected from the UCI data mining benchmark.

All scheduled dates:

Upcoming

No Upcoming activities yet

Big Data All Hands Meeting

All scheduled dates:

Upcoming

Past