We will introduce a formal framework for thinking about representation learning, endeavoring to capture its power in settings like semi-supervised
and transfer learning. The framework involves modeling how the data was generated, and thus is related to previous Bayesian notions with some new twists.
In simple settings where it is possible to learn representations from unlabeled data, we show that representation learning:
(i) can greatly reduce the need for labeled data (semi-supervised learning);
(ii) allows solving classification tasks where previous approaches such as nearest neighbors or manifold learning either do not work or require too much data.
We also clarify two important settings, linear mixture models and log-linear models, where representation learning can be done under plausible assumptions, even though the problem is NP-hard in the worst case.
Joint work with Sanjeev Arora.