# Abstracts

### Monday, February 13th, 2017

9:30 am–10:15 am

Unlike traditional machine learning methods, humans often learn from natural language instruction. As users become increasingly accustomed to interacting with computer devices using speech, their interest in instructing these devices in natural language is likely to grow.

We present our Learning by Instruction Agent (LIA), an intelligent personal agent that users can teach, solely through natural language interaction, to perform new action sequences in response to new commands. LIA uses a CCG semantic parser to ground the semantics of each command in primitive executable procedures defined over the sensors and effectors of the agent. When LIA is given a natural language command that it does not understand, it prompts the user to explain how to achieve the command through a sequence of steps, also specified in natural language. From the instruction episode, LIA learns two types of knowledge: (1) a new procedure defined in terms of previously understood commands, and (2) an extension to its natural language lexicon that enables it to understand and execute the new command across multiple contexts (e.g., having been taught how to "forward an email to Alice," LIA can correctly interpret the command "forward this email to Bob"). This talk will present LIA, ongoing research to extend it with demonstration-based learning, and theoretical questions raised by such learning from instruction.

10:45 am–11:30 am

If machine learning is to discover knowledge from data, then machine teaching is the inverse problem of passing that knowledge on. More precisely, given a learning algorithm and a target model, the goal of machine teaching is to construct an optimal (e.g., the smallest) training set from which the algorithm will learn the target model. I will discuss several aspects of machine teaching that connect to interactive machine learning.
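
A minimal sketch of this idea for a one-dimensional threshold learner (the classic textbook example; the learner and numbers below are illustrative, not from the talk). A teacher who knows the target threshold can teach it with just two examples bracketing the boundary, a training-set size independent of how much data exists:

```python
def learn_threshold(examples):
    """A simple learner: fit a threshold classifier (x >= t -> 1) by
    placing t halfway between the largest negative and smallest positive."""
    neg = [x for x, y in examples if y == 0]
    pos = [x for x, y in examples if y == 1]
    return (max(neg) + min(pos)) / 2.0

def teaching_set(target_t, eps=0.01):
    """The teacher, who knows the target threshold, constructs only two
    examples bracketing it -- the optimal (smallest) training set here."""
    return [(target_t - eps, 0), (target_t + eps, 1)]

target = 0.37
t_hat = learn_threshold(teaching_set(target))
# the learner recovers the target threshold from just two examples
```

By contrast, a learner sampling data passively would need on the order of 1/eps random examples to pin the boundary down this precisely.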

1:30 pm–2:15 pm

Inverse Reinforcement Learning enables robots to learn objective functions by observing expert behavior, but implicitly assumes that the robot is a passive observer and that the person is an uninterested expert. In this talk, I will share our recent work on making this learning process interactive, where humans teach rather than provide expert demonstrations, and robots act to actively gather information and to showcase what they've learned.

2:15 pm–3:00 pm

Wouldn't it be nice if machines could understand content in images and communicate this understanding as effectively as humans? Such technology would be immensely powerful, be it for helping a visually impaired user navigate a world built by the sighted, assisting an analyst in extracting relevant information from a surveillance feed, educating a child playing a game on a touch screen, providing information to a spectator at an art gallery, or interacting with a robot. As computer vision and natural language processing techniques mature, we are closer to achieving this dream than we have ever been.

In this talk, I will present two ongoing thrusts in my lab that push the boundaries of AI capabilities at the intersection of vision, language, and commonsense reasoning.

Visual Question Answering (VQA): I will describe the task, our dataset (the largest and most complex of its kind), our model, and ongoing work on free-form, open-ended VQA. Given an image and a natural language question about the image (e.g., "What kind of store is this?", "How many people are waiting in the queue?", "Is it safe to cross the street?"), the machine's task is to automatically produce an accurate natural language answer ("bakery", "5", "Yes"). We have collected and recently released a dataset containing more than 250,000 images, more than 760,000 questions, and roughly 10 million answers. Our dataset is enabling the next generation of AI systems, often based on deep learning techniques, to understand images and language and to perform complex reasoning, both in our lab and in the community at large.

Learning Common Sense Through Visual Abstraction: Common sense is a key ingredient in building intelligent machines that make "human-like" decisions when performing tasks -- be it automatically answering natural language questions, or understanding images and videos. How can machines learn this common sense? While some of this knowledge is explicitly stated in human-generated text (books, articles, blogs, etc.), much of this knowledge is unwritten. While unwritten, it is not unseen! The visual world around us is full of structure bound by commonsense laws. But machines today cannot learn common sense directly by observing our visual world, because they cannot accurately perform detailed visual recognition in images and videos. This leads to a chicken-and-egg problem: we would like to learn common sense to allow machines to understand images accurately, but in order to learn common sense, we need accurate image parsing. We argue that the solution is to give up on photorealism. We propose to leverage abstract scenes -- cartoon scenes made from clip art by crowdsourced humans -- to teach our machines common sense.

3:30 pm–4:15 pm

Policy evaluation is a crucial step in many reinforcement-learning problems: it estimates a value function that predicts the long-term value of states under a given policy. In this talk, we present stochastic variance reduction algorithms that learn value functions from a fixed dataset and are shown to have (i) a guaranteed linear convergence rate, and (ii) linear complexity (in both sample size and feature dimension), under linear function approximation, possibly with off-policy learning and eligibility traces. In particular, we transform the policy evaluation problem into an empirical (quadratic) saddle-point problem and apply stochastic variance reduction methods in the primal-dual space. Interestingly, the algorithms converge linearly even though the quadratic saddle-point problem has only strong concavity and no strong convexity. Numerical experiments on random MDPs and on Mountain Car demonstrate the improved performance of our algorithms.
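
To make the saddle-point reformulation concrete, here is a toy sketch (my own illustration, with made-up numbers and plain deterministic primal-dual gradient steps rather than the variance-reduced stochastic method from the talk). Linear-TD policy evaluation solves A&theta; = b; equivalently, it is the saddle point of min over &theta;, max over w of w&#7488;(b - A&theta;) - &frac12;w&#7488;Cw, which is strongly concave in w but not strongly convex in &theta;:

```python
import numpy as np

# A small 3-state Markov reward process (hypothetical numbers, for illustration).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.5
d = np.full(3, 1 / 3)          # state distribution (uniform here)
Phi = np.eye(3)                # tabular features, so theta is the value estimate

D = np.diag(d)
A = Phi.T @ D @ (Phi - gamma * P @ Phi)   # the linear-TD "A" matrix
b = Phi.T @ D @ r
C = Phi.T @ D @ Phi

# Primal-dual gradient iterations on the saddle-point objective.
theta = np.zeros(3)
w = np.zeros(3)
alpha, beta = 0.1, 0.5
for _ in range(100_000):
    theta = theta + alpha * (A.T @ w)          # primal descent direction
    w = w + beta * (b - A @ theta - C @ w)     # dual ascent (strongly concave)

v_true = np.linalg.solve(np.eye(3) - gamma * P, r)   # exact values for comparison
```

The variance-reduced algorithms in the talk replace these full-gradient updates with cheap stochastic updates over the fixed dataset while keeping a linear convergence rate.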

### Tuesday, February 14th, 2017

9:30 am–10:15 am

Clustering is typically studied in the unsupervised learning setting. But in many applications, such as personalized recommendations, one cannot reach the optimal clustering without interacting with the end user. In this talk, I will describe a recent framework for interactive clustering with a human in the loop. The algorithm can interact with the human in stages and receive limited, potentially noisy feedback to improve the clustering. I will present our preliminary results in this model and mention open questions.

10:45 am–11:30 am

People understand many domains more deeply than today's machine learning systems. Having a good representation for a problem is crucial to the success of intelligent systems. In this talk, we discuss recent work and future opportunities for how humans can aid machine learning algorithms. Beyond simply labeling data, the crowd can help uncover the latent representation behind a problem. We discuss recent work on eliciting features using active learning as well as other aspects of crowdsourcing and machine learning, such as how crowdsourcing can help generate data, raise questions, and assist in more complex AI tasks.

1:30 pm–2:15 pm

An active learner is given a hypothesis class, a large set of unlabeled examples and the ability to interactively query labels of a subset of them; the learner's goal is to learn a hypothesis in the class that fits the data well by making as few label queries as possible. While active learning can yield considerable label savings in the realizable case -- when there is a perfect hypothesis in the class that fits the data -- the savings are not always as substantial when labels provided by the annotator may be noisy or biased. Thus an open question is whether more complex feedback can help active learning in the presence of noise. In this talk, I will present two separate feedback mechanisms, and talk about when they can help reduce the label complexity of active learning. The first is when labels are obtained from multiple annotators with slightly different labeling patterns; the second is when the annotator can say "I don't know" instead of providing an incorrect label.

2:15 pm–3:00 pm

The scale and complexity of biological systems make biological research a fertile domain for active learning, because for complex biological systems, time and cost constraints make it infeasible to do all possible experiments. Previous applications of active learning in biology have been limited, and can be divided into retrospective studies, whose goal is simply to demonstrate the usefulness of active learning algorithms, and prospective studies, in which active learning actually drives experimentation; the latter are rare. Furthermore, past studies mostly considered unidimensional active learning, where only a single variable is explored (e.g., which member of a set of drugs is most active against a single target), and the goal was to reduce the cost of a study that would have been feasible but expensive if carried out exhaustively. However, most biological systems have multiple, interacting components, and thus require multidimensional active learning (e.g., choosing which pairs of drugs and targets to test in order to model the possible effects of multiple drugs on multiple targets). This is far more challenging, but the goal is not just to reduce cost but to tackle problems not otherwise addressable. In this talk, I will describe both retrospective and prospective applications of multidimensional active learning to biological systems. Considerations discussed will include the choice of modeling method, the incorporation of prior information using similarity matrices, and how to know when a model is good enough to stop doing experiments.

3:30 pm–4:15 pm

Towards the goal of creating more usable and adaptive language interfaces, we consider two extremes of the solution space -- starting from scratch, and "naturalizing" a programming language.

The former is inspired by Wittgenstein's language games: a human wishes to accomplish some task (e.g., achieving a certain configuration of blocks), but can only communicate with a computer, which performs the actual actions (e.g., removing all red blocks). The computer initially knows nothing about language and must therefore learn it from scratch through interaction, while the human adapts to the computer's capabilities. We created a game called SHRDLURN in a blocks world and analyzed how 100 people interacted with it.

On the other extreme, we seed the system with a core programming language and allow users to "naturalize" the core language incrementally by defining alternative syntax and increasingly complex concepts in terms of compositions of simpler ones. In a voxel world, we show that a community of users can simultaneously teach one system a diverse language and use it to build 240 complex voxel structures. Over the course of three days, these builders went from using only the core language to using the full naturalized language in 74.7% of the last 10K utterances.

### Wednesday, February 15th, 2017

9:30 am–10:15 am

It is an irony that often the more severe a person's motor impairment, the more assistance they require, and yet the less able they are to operate the very machines created to provide this assistance. A primary aim of my lab is to address this confound by incorporating robotics autonomy and intelligence into assistive machines, offloading some of the control burden from the user. However, robots that do not adapt to the variable needs of their users when providing physical assistance will struggle to achieve widespread adoption and acceptance. Not only are the physical abilities of the user non-static, and with them the desired or needed amount of assistance, but how the user operates the robot will also change over time. The fact that there is always a human in the loop offers an opportunity: to learn from the human, transforming the problem into one of robot learning from human teachers. This raises a significant question: how will a machine learning algorithm behave when instructed by teachers who not only are not machine learning or robotics experts, but moreover have motor impairments that influence the learning signals they provide? This talk will overview a new area my research group is beginning to explore: robot learning from motor-impaired teachers and task partners. Our goal is to transform robotics autonomy in rehabilitation by designing algorithms that treat the constraints imposed by motor impairments as advantages, rather than limitations.

10:45 am–11:30 am

This talk considers a core question in reinforcement learning (RL): How can we tractably solve sequential decision making problems where the learning agent receives rich observations? We begin with a new model called Contextual Decision Processes (CDPs) for studying such problems, and show that it encompasses several prior setups to study RL such as MDPs and POMDPs. Several special cases of CDPs are, however, known to be provably intractable in their sample complexities. To overcome this challenge, we further propose a structural property of such processes, called the Bellman Rank. We find that the Bellman Rank of a CDP (and an associated class of functions) provides an intuitive measure of the hardness of a problem in terms of sample complexity and is small in several practical settings. In particular, we propose an algorithm, whose sample complexity scales with the Bellman Rank of the process, and is completely independent of the size of the observation space of the agent. We also show that our techniques are robust to our modeling assumptions, and make connections to several known results as well as highlight novel consequences of our results. This talk is based on joint work with Nan Jiang, Akshay Krishnamurthy, John Langford and Rob Schapire.

1:30 pm–2:15 pm

Our research aims to computationally model mechanisms of human social learning in order to build robots and other machines that are intuitive for people to teach. We take Machine Learning interactions and redesign interfaces and algorithms to support the collection of learning input from end users instead of ML experts. This talk covers results on building models of reciprocal interactions, high-level task goal learning, low-level skill learning, and active learning interactions using anthropomorphic robot platforms.

2:15 pm–3:00 pm

No abstract available.

3:30 pm–4:15 pm

Natural language parsers are widely used in applications such as question answering, information extraction and machine translation. In this talk, I will describe recent and ongoing work in the UW NLP group for learning CCG parsers that build relatively rich representations of the meaning of input texts. I will cover recent work on interactive approaches for improving both data collection and parsing algorithms. On the data side, we have introduced methods for gathering semantic supervision from non-expert annotators at a very large scale and using such supervision interactively, to label as little data as possible while still learning high quality models. For inference, we introduce new neural A* parsing algorithms that achieve state-of-the-art runtimes and accuracies, while also providing formal guarantees of optimality in inference, by learning to interactively focus on promising parts of the parsing search space. Finally, I will sketch some of our future directions where we aim to extend these ideas to build parsers that work well for any domain and any language, with as little engineering effort as possible.

### Thursday, February 16th, 2017

9:30 am–10:15 am

Most modern datasets are plagued with missing data or limited sample sizes. However, in many applications, we have control over the data sampling process, such as which drug-gene interactions to record, which network routes to probe, which movies to rate, etc. Thus, we can ask: what does the freedom to actively sample data in a feedback-driven manner buy us? Active learning tries to answer this question for supervised learning. In this talk, I will present work by my group on active sampling methods for several unsupervised learning problems, such as matrix and tensor completion/approximation, column subset selection, learning the structure of graphical models, reconstructing graph-structured signals, and clustering, as time permits. I will quantify the precise reduction in the amount of data needed to achieve a desired statistical error, and demonstrate that active sampling often also enables us to handle a larger class of models, such as matrices with coherent row or column space, graphs with heterogeneous degree distributions, and clusters at finer resolutions, when compared to passive (non-feedback-driven) sampling.
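
As one concrete flavor of feedback-driven sampling, here is a toy sketch of adaptive column subset selection (my own illustration, not the talk's algorithm; data and parameters are made up): each new column is chosen based on feedback from the columns already observed, by greedily taking the column with the largest residual outside their span.

```python
import numpy as np

rng = np.random.default_rng(0)
# A rank-3 matrix: a few well-chosen columns suffice to span it.
M = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 20))

def adaptive_css(M, k):
    """Feedback-driven column subset selection: at each step, pick the
    column with the largest residual norm after projecting out the
    directions of the columns selected so far."""
    selected = []
    R = M.copy()
    for _ in range(k):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        selected.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(q, q @ R)     # deflate: remove the new direction
    return selected

cols = adaptive_css(M, 3)
basis, _ = np.linalg.qr(M[:, cols])
residual = np.linalg.norm(M - basis @ (basis.T @ M))
# the 3 adaptively chosen columns reconstruct the rank-3 matrix
```

A passive scheme would sample columns uniformly at random and can need many more observations, especially when column norms or leverage scores are highly non-uniform.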

10:45 am–11:30 am

Many clustering problems in computer vision and other contexts are also classification problems, where each cluster shares a meaningful label. Subspace clustering algorithms in particular are often applied to problems that fit this description, for example with face images or handwritten digits. While it is straightforward to request human input on these datasets, our goal is to reduce this input as much as possible. We present an algorithm for active query selection that allows us to leverage the union of subspace structure assumed in subspace clustering. The central step of the algorithm is in querying points of minimum margin between estimated subspaces; analogous to classifier margin, these lie near the decision boundary. This procedure can be used after any subspace clustering algorithm that outputs an affinity matrix and is capable of driving the clustering error down more quickly than other state-of-the-art active query algorithms on datasets with subspace structure. We demonstrate the effectiveness of our algorithm on several benchmark datasets, and with a modest number of queries we see significant gains in clustering performance.
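A simplified sketch of the min-margin query idea (my own toy construction with hypothetical data, using subspace residuals directly rather than an affinity matrix): each point's margin is the gap between its two smallest distances to the estimated subspaces, and the point with the smallest margin is the one to ask the human about.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_subspace(X, dim):
    """PCA basis (through the origin) for a cluster's estimated subspace."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:dim].T              # ambient_dim x dim orthonormal basis

# Two 1-D subspaces (lines) in R^3, with a little noise (toy data).
u1 = np.array([1.0, 0.0, 0.0]); u2 = np.array([0.0, 1.0, 0.0])
X1 = np.outer(rng.standard_normal(30), u1) + 0.01 * rng.standard_normal((30, 3))
X2 = np.outer(rng.standard_normal(30), u2) + 0.01 * rng.standard_normal((30, 3))
X = np.vstack([X1, X2])

bases = [fit_subspace(X1, 1), fit_subspace(X2, 1)]

def residuals(x, bases):
    """Distance from x to each estimated subspace."""
    return np.array([np.linalg.norm(x - B @ (B.T @ x)) for B in bases])

# Margin of each point: gap between its two smallest subspace residuals.
R = np.array([np.sort(residuals(x, bases)) for x in X])
margins = R[:, 1] - R[:, 0]
query_idx = int(np.argmin(margins))    # the most ambiguous point: query the human
```

Points near the intersection of the two lines get small margins, matching the intuition that minimum-margin points lie near the decision boundary between subspaces.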

1:30 pm–1:45 pm

No abstract available.

1:45 pm–2:00 pm

No abstract available.

2:05 pm–2:20 pm

Spectral methods provide remarkable theoretical and practical guarantees for learning latent variable models (LVMs). In contrast to traditional likelihood-based methods, spectral methods provide computationally tractable and space-efficient algorithms accompanied by strong convergence guarantees. In this talk, we show how to deploy the power of spectral methods in Reinforcement Learning (RL) and derive efficient learning and planning algorithms for RL settings. In general RL applications, the assumption that all statistics are fully observable fails; we show how partially observable models (e.g., POMDPs, CMDPs) provide suitable frameworks for modeling the environment by adding extra latent structure to generative models. Many challenges arise in designing RL algorithms for partially observable models; we address these challenges and show how to utilize spectral methods to obtain efficient RL algorithms.

2:20 pm–2:35 pm

No abstract available.

2:40 pm–3:00 pm

No abstract available.

3:30 pm–3:45 pm

No abstract available.

3:45 pm–4:00 pm

No abstract available.

4:05 pm–4:20 pm

No abstract available.

4:20 pm–4:35 pm

No abstract available.

4:40 pm–5:00 pm

No abstract available.

### Friday, February 17th, 2017

9:30 am–10:15 am

Successful collaboration on shared-workspace tasks requires that team members (both robot and human) have a similar model of the task.  For systems that learn tasks through common learning-from-demonstration methods, this requirement adds a substantial set of constraints on learning.  In this talk, I will present work on building a shared hierarchical model of the task that uses a simple graph transformation that allows an interactive robot to automatically construct a hierarchical task model from small numbers of linear demonstrations.  This talk will be based heavily on the thesis work of Brad Hayes in my lab.

10:15 am–11:00 am

We propose a pool-based non-parametric active learning algorithm for general metric spaces, which outputs a nearest-neighbor classifier. We give prediction error guarantees that depend on the noisy-margin properties of the input sample, and are competitive with those obtained by previously proposed passive learners. We prove that the label complexity of  the new algorithm is significantly lower than that of any passive learner with similar error guarantees. Our algorithm is based on a generalized sample compression scheme, and a new label-efficient active model-selection procedure.

Based on joint work with Aryeh Kontorovich and Ruth Urner.

11:30 am–12:30 pm

No abstract available.

2:00 pm–2:45 pm
Speaker: Steve Hanneke

In this talk, I will present a new general active learning strategy, yielding significantly better label complexity guarantees than previously known to be possible. In particular, this work reveals that under Tsybakov's noise condition, the minimax optimal label complexity of active learning is significantly smaller than that of traditional (passive) supervised learning, for any hypothesis class of finite VC dimension. This new active learning strategy essentially uses sequential hypothesis tests, modified with an early cut-off stopping criterion, to enhance the noise-robustness of a simpler active learning subroutine. The method is surprisingly simple, and quite general, and also yields near-optimal label complexities in several other settings, such as learning with a smooth regression function or a smooth decision boundary.
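To illustrate where the exponential label savings come from in the simplest possible case (this is the textbook realizable-case warm-up, not the noise-robust strategy from the talk; the pool and threshold below are made up): for threshold classifiers on a sorted pool, active binary search needs only O(log n) labels where passive learning would label the entire pool.

```python
def active_learn_threshold(xs, oracle):
    """Binary-search for the first positive point in a sorted unlabeled
    pool, querying the labeling oracle as few times as possible."""
    lo, hi, queries = 0, len(xs) - 1, 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(xs[mid]) == 1:   # positive: boundary is at or left of mid
            hi = mid
        else:                      # negative: boundary is right of mid
            lo = mid + 1
    return xs[lo], queries

xs = [i / 1000 for i in range(1000)]          # sorted pool of 1000 points
oracle = lambda x: 1 if x >= 0.731 else 0     # noiseless labeler (toy)
boundary, n_queries = active_learn_threshold(xs, oracle)
# finds the first positive point with ceil(log2(1000)) = 10 label queries
```

With noisy labels, naive binary search breaks, which is exactly where the sequential hypothesis tests with an early cut-off come in: each comparison is repeated just long enough to be trusted before the search interval is narrowed.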

2:45 pm–3:30 pm

We define an interactive setting for learning mixtures of submodular functions for the purpose of submodular summarization.  Many previous methods for learning such mixtures require training sets that have been acquired offline. In our approach, the learner chooses the summaries and only minimal feedback is collected from (often human) evaluators, thereby reducing the arduous task of collecting evaluator-produced summaries.  Our algorithmic framework involves solving, in each round, a difference-of-submodular (DS) optimization problem. We show that this framework can be slightly modified to handle variants more akin to regret minimization. We empirically demonstrate the efficacy of our approach on both synthetic data and real-world applications, including speech data subset selection and image collection summarization, where our method improves over a variety of baselines.

Joint work with Kai Wei, Wenruo Bai, and Rishabh Iyer.