Fall 2013

Big Data All Hands Meeting

Oct. 15, 2013 12:30 pm1:30 pm

Add to Calendar


Dorit Hochbaum (UC Berkeley)


Calvin Lab 116

An effective combinatorial algorithm for image segmentation and data mining
We present a model for clustering which combines two criteria: Given a collection of objects with pairwise similarity measure, the problem is to find a cluster that is as dissimilar as possible from the complement, while having as much similarity as possible within the cluster.  The two objectives are combined either as a ratio or with linear weights.  The ratio problem, and its linear weighted version, are solved by a combinatorial algorithm within the complexity of a single minimum s,t-cut algorithm.  We call this problem "the normalized cut prime" (NC') as it is closed related to the NP-hard problem of normalized cut. 
The relationship of NC' to normalized cut is generalized to a problem we call "q-normalized cut".   It is shown that the spectral method that solves for the Fielder eigenvector of a related matrix is a continuous relaxation of the problem.  In contrast, the generalization of the combinatorial algorithm solves a discrete problem resulting from a relaxation of a single sum constraint.   We study the relationship between these two relaxations and demonstrate a number of advantages for the combinatorial algorithm.  These advantages include a better approximation, in practice, of the normalized cut objective for image segmentation benchmark problems.
Time permitting, we will discuss the application of NC', as a supervised machine learning technique, to data mining, and its comparison to leading machine learning techniques on datasets selected from the UCI data mining benchmark.