Abstract
Human experts are crucial to data analysis. Their roles include sifting through large datasets for entries that are relevant to a particular task, identifying anomalous data, and annotating raw data to facilitate subsequent search, retrieval, and learning. Humans often perform much better than machines at such tasks. However, while the capacity to collect, transmit, and store big datasets has been increasing rapidly, the capacity of human experts to provide feedback and to leverage all available information has hardly changed. Humans are the information bottleneck in data analysis and will remain so in the future.
I will discuss new theory and algorithms that enable machines to learn efficiently from human experts, using a minimal amount of human interaction. The models so learned then inform the understanding of human cognition and the design of better algorithms for data processing. I will focus on active crowdsourcing, a form of active learning in which the labeling of training data is distributed adaptively to a pool of people. Rather than randomly selecting training examples for labeling, active crowdsourcing sequentially and adaptively selects the most informative examples for human evaluation, based on information gleaned from the analysis to date. By making the best possible use of human expert judgment, these active learning methods can speed up the training of a variety of machine learning algorithms.
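To make the contrast between random and adaptive selection concrete, here is a minimal sketch on a toy problem. It is not the method presented in the talk. It assumes a one-dimensional pool of points, a single unknown decision threshold, and a simulated, noiseless expert; the names `make_oracle`, `ambiguous_after_active`, and `ambiguous_after_random` are hypothetical and introduced only for illustration.

```python
import random


def make_oracle(threshold):
    """Simulated expert (an assumption): labels x as 1 if x >= threshold, else 0."""
    def oracle(x):
        return 1 if x >= threshold else 0
    return oracle


def ambiguous_after_active(pool, oracle, budget):
    """Adaptive selection: each query is the pool point that splits the
    still-ambiguous region in half, given all labels collected so far.
    Returns how many pool points still have an undetermined label."""
    points = sorted(pool)
    # Invariant: points[:lo] are known 0s, points[hi:] are known 1s.
    lo, hi = 0, len(points)
    queries = 0
    while lo < hi and queries < budget:
        mid = (lo + hi) // 2
        if oracle(points[mid]) == 1:
            hi = mid          # points[mid:] must all be 1
        else:
            lo = mid + 1      # points[:mid + 1] must all be 0
        queries += 1
    return hi - lo


def ambiguous_after_random(pool, oracle, budget, rng):
    """Non-adaptive baseline: query `budget` pool points chosen uniformly
    at random, then count the points whose label is still undetermined."""
    points = sorted(pool)
    lo, hi = 0, len(points)
    for i in rng.sample(range(len(points)), budget):
        if oracle(points[i]) == 1:
            hi = min(hi, i)
        else:
            lo = max(lo, i + 1)
    return hi - lo


if __name__ == "__main__":
    pool = [i / 1000 for i in range(1000)]
    oracle = make_oracle(0.3715)  # threshold unknown to the learner
    rng = random.Random(0)
    print("active:", ambiguous_after_active(pool, oracle, budget=10))
    print("random:", ambiguous_after_random(pool, oracle, budget=10, rng=rng))
```

In this toy setting each adaptive query at least halves the ambiguous region, so ten queries are enough to resolve a pool of 1000 points, whereas ten random queries typically leave a sizable interval unresolved. Real human annotators are noisy and real models are higher-dimensional, so this only illustrates the selection principle.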