Observational learning, or learning to solve tasks by watching others perform them, is a key component of human development. Notably, we can learn behaviors purely from observed state trajectories, without direct access to the underlying actions (e.g., the exact kinematic forces or joint torques) or the intentions that produced them. To be general, artificial agents should likewise be able to quickly solve a problem after observing its solution. In this talk, I will discuss in depth two techniques for enabling observational learning in agents. First, I will describe an approach that trains a latent policy directly from state observations, which can then be quickly mapped to real actions in the agent’s environment. Then I will describe how we can train a novel value function, Q(s,s’), to learn off-policy from observations. Unlike previous imitation-from-observation approaches, this formulation goes beyond simple imitation and enables learning from potentially suboptimal observations.
