Representation Learning of Grounded Language and Knowledge: with and without End-to-End Learning

Abstract

Representation learning through an end-to-end learning framework has shown increasingly strong results across many applications in NLP and computer vision, gradually removing the reliance on human engineered features. However, end-to-end learning is feasible only if high quality training data is available at scale, learning from scratch for each task is rather wasteful, and learned representation tends to focus on task-specific or dataset-specific patterns, rather than producing interpretable and generalizable knowledge about the world.

In contrast, humans learn a great deal about the world with and without end-to-end learning, and it is this rich background knowledge about the world that enables humans to navigate through complex unstructured environments and learn new tasks efficiently from only a handful of examples.

In this talk, I will present our recent efforts that investigate the feasibility of acquiring and representing trivial everyday knowledge about the world. In the first part, I will present our work that focuses on procedural language and knowledge in the cooking recipe domain, where procedural knowledge, e.g., ``how to bake blueberry muffins’’, is implicit in the learned neural representation. In the second part, I will present a complementary approach that attempts to reverse engineer even such knowledge not explicit in language due to reporting bias — people rarely state the obvious, e.g., ``my house is bigger than me’’ — by jointly reasoning about multiple related types of knowledge about actions and objects in the world.

Representation Learning of Grounded Language and Knowledge: with and without End-to-End Learning

Abstract

Video Recording