Abstract

Standard multiclass classification techniques require O(k) computation during train and test where k is the number of classes, yet the information theoretic lower bound is O(log k). This gap matters little when k is on the order of 10 or 100, but at 10^4 or 10^6 it matters a great deal. I will discuss the theory of extreme multiclass classification including consistency, robustness, efficiency, and structure learning.

Video Recording