Abstract
As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of whether and when the explanations output by these methods disagree with each other, why these disagreements occur, and how to address them in a rigorous fashion. However, there is little to no research that answers these critical questions. In this talk, I will present some of our recent research that addresses them. More specifically, I will discuss (i) a novel quantitative framework to formalize the disagreement between state-of-the-art feature attribution-based explanation methods (e.g., LIME, SHAP, gradient-based methods); I will also touch upon how this framework was constructed by leveraging inputs from interviews and user studies with data scientists who utilize explanation methods in their day-to-day work; (ii) an online user study to understand how data scientists resolve disagreements in the explanations output by the aforementioned methods; (iii) a novel function approximation framework to explain why explanation methods often disagree with each other; I will demonstrate that all the key feature attribution-based explanation methods are essentially performing local function approximations, albeit with different loss functions and notions of neighborhood; and (iv) a set of guiding principles on how to choose explanation methods and resulting explanations when they disagree in real-world settings. I will conclude this talk with a brief overview of OpenXAI, an open-source framework we recently developed that enables researchers and practitioners to seamlessly evaluate and benchmark both existing and new explanation methods based on various characteristics such as faithfulness, stability, and fairness.
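To give a concrete flavor of how disagreement between feature attribution-based explanations can be quantified, below is a minimal illustrative sketch (not the authors' implementation) of three metrics of the kind discussed in the talk: feature, rank, and sign agreement over the top-k most important features. The function names and the example attribution vectors are hypothetical and chosen only for illustration.

```python
import numpy as np

def top_k_features(attributions, k):
    """Indices of the k features with the largest absolute attribution."""
    return set(np.argsort(-np.abs(attributions))[:k])

def feature_agreement(attr_a, attr_b, k):
    """Fraction of top-k features shared by the two explanations."""
    return len(top_k_features(attr_a, k) & top_k_features(attr_b, k)) / k

def rank_agreement(attr_a, attr_b, k):
    """Fraction of top-k positions where both explanations place the same feature."""
    rank_a = np.argsort(-np.abs(attr_a))[:k]
    rank_b = np.argsort(-np.abs(attr_b))[:k]
    return float(np.mean(rank_a == rank_b))

def sign_agreement(attr_a, attr_b, k):
    """Fraction (out of k) of shared top-k features whose attributions also agree in sign."""
    shared = top_k_features(attr_a, k) & top_k_features(attr_b, k)
    same_sign = [i for i in shared if np.sign(attr_a[i]) == np.sign(attr_b[i])]
    return len(same_sign) / k

# Hypothetical attributions for one test instance from two explanation methods
attr_method_1 = np.array([0.40, -0.10, 0.25, 0.05, -0.30])
attr_method_2 = np.array([0.35, 0.20, -0.15, 0.02, -0.28])
k = 3
print(feature_agreement(attr_method_1, attr_method_2, k))  # 0.666... : two of the top-3 features overlap
print(rank_agreement(attr_method_1, attr_method_2, k))
print(sign_agreement(attr_method_1, attr_method_2, k))
```

Values near 1 indicate that the two methods largely agree on which features matter (and how), while values near 0 signal the kind of disagreement the talk examines.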