Dorit Hochbaum (UC Berkeley)
Calvin Lab 116
An effective combinatorial algorithm for image segmentation and data mining
We present a model for clustering which combines two criteria: Given a collection of objects with pairwise similarity measure, the problem is to find a cluster that is as dissimilar as possible from the complement, while having as much similarity as possible within the cluster. The two objectives are combined either as a ratio or with linear weights. The ratio problem, and its linear weighted version, are solved by a combinatorial algorithm within the complexity of a single minimum s,t-cut algorithm. We call this problem "the normalized cut prime" (NC') as it is closed related to the NP-hard problem of normalized cut.
The relationship of NC' to normalized cut is generalized to a problem we call "q-normalized cut". It is shown that the spectral method that solves for the Fielder eigenvector of a related matrix is a continuous relaxation of the problem. In contrast, the generalization of the combinatorial algorithm solves a discrete problem resulting from a relaxation of a single sum constraint. We study the relationship between these two relaxations and demonstrate a number of advantages for the combinatorial algorithm. These advantages include a better approximation, in practice, of the normalized cut objective for image segmentation benchmark problems.
Time permitting, we will discuss the application of NC', as a supervised machine learning technique, to data mining, and its comparison to leading machine learning techniques on datasets selected from the UCI data mining benchmark.