Ravi Kannan (Microsoft Research India)
Clustering is the problem of dividing a set of data points in high dimensional space into groups of similar points. It is an important ingredient of algorithms in many areas. Theoretical Computer Science has brought to bear powerful ideas to find nearly optimal clusterings, while in Statistics mixture models of data have been useful in understanding the structure of data and in developing clustering algorithms. However, in practice many heuristics (e.g., dimension reduction and the k-means algorithm) are widely used. The talk will describe some aspects of the Theoretical CS and Statistics approaches, and attempt to answer the question: Is there a happy marriage of these approaches with practice?
Light refreshments will be served before the lecture at 3:30 p.m.