Description

Two Paradigms of Semi-Supervised Active Clustering with Sample and Computational Complexity Bounds

I will describe two paradigms for clustering that incorporates domain expertise through an interaction between the clustering algorithms and the user of the clustering.

The first paradigm asks the user to cluster a small random sample of the data and uses that clustering to construct a metric that reflects the user's preferences (and is subsequently used to cluster the data).  For this paradigm we show sample complexity bounds that are independent of the input data size. 

The second paradigm uses interactive must-link/can't-link queries to speed up centre-based clustering tasks.  We show that for data sets satisfying some clusterabiity condition, a logarithmic number of such queries can turn an otherwise NP-hard clustering task into one solvable in pseudo-linear time. 

I would also like to devote some time to an open discussion of the role of domain knowledge and interaction in clustering.

The first work is with Hassan Zokaei Ashtiani and was presented in UAI 2015.  The second result is with Hassan and Shrinu Kushagra and was presented in NIPS 2016.
 

All scheduled dates:

Upcoming

No Upcoming activities yet

Past