Interpreting Natural Language Processing Models

Abstract

The predominant approach to building natural language processing systems today is transfer learning. In the first stage, a large deep neural network is trained on a massive amount of raw text, for instance with the objective of predicting the next word given a prefix. In the second stage, the network is fine-tuned to perform a downstream task, such as question answering or sentiment analysis. While this approach yields good results in practice, the trained network is perceived as a black box: its internal structure is opaque and its decisions are difficult to explain. In this talk, I will describe some of the methods recently developed in the community for analyzing neural network models of human language. Time permitting, I will also provide brief pointers to a few areas of study of potential interest to the workshop participants: grammar induction, unsupervised machine translation, and emergent communication in AI agents.

Attachment

talk.pptx
