Julius Adebayo (MIT)
As deep learning models are being used in different tasks and settings, there has been increased interest in ways to 'interpret', 'debug', and 'understand' these models. Consequently, there has been a wave of post-hoc, sensitivity-based methods for interpreting deep neural networks (DNNs). These methods typically provide a local 'explanation' around a single input example. With so many proposed methods, it is currently difficult for a practitioner to select one to use.
In this talk, we will look at the potential benefits and limitations of the local explanations paradigm. First, we will consider ways to assess these interpretation methods. In particular, we will ask whether these methods can help debug models, i.e., help identify model mistakes prior to deployment. In addition, we will consider privacy trade-offs. Recent work has shown that it is easy to recover a model given modest access to local explanations for a few data points, which raises privacy concerns. We will look at recent results in this line of work and end with some interesting research directions.