Abstract

Synthesizing "explainable" interpretations of black-box ML models is often crucial for reasoning about their behaviour. Decision trees and decision diagrams have typically been considered good vehicles for explanation, and parameters such as depth, number of decision nodes, and preference among decision predicates have been used to quantify their "explainability" in different contexts. Almost inevitably, there is a tension between such explainability metrics and the accuracy of the interpretation: simpler models score high on explainability but may not faithfully mimic the behaviour of the black-box model, while more complex (and less explainable) models typically do a better job of mimicking it. In this talk, we will discuss how Pareto-optimal interpretations of such black-box models can be systematically synthesized for a large class of explainability metrics using an optimized search based on MaxSAT solving. We provide PAC-style guarantees on the synthesized interpretations, and apply this technique to synthesize Pareto-optimal interpretations of some benchmarks. Our results show that there are often several Pareto-optimal interpretations that are easy to miss if one optimizes a single scalar objective function combining accuracy and explainability metrics.
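To make the notion of Pareto-optimality concrete, here is a minimal sketch in Python. It is not the MaxSAT-based synthesis procedure from the talk: it simply filters a given set of candidate interpretations, scored by accuracy (agreement with the black box, higher is better) and by an explainability cost, here taken to be tree depth (lower is better), keeping those that no other candidate weakly beats on both dimensions and strictly beats on one. The `pac_sample_size` helper is an illustrative Hoeffding-style bound, one standard route to a PAC-style guarantee on estimated accuracy; the candidate names and scores below are made up for illustration.

```python
import math

def pareto_front(candidates):
    """Return the candidates not dominated by any other candidate.

    Each candidate is a dict with an 'accuracy' score (higher is better)
    and a 'depth' explainability cost (lower is better).
    """
    front = []
    for c in candidates:
        dominated = any(
            d["accuracy"] >= c["accuracy"] and d["depth"] <= c["depth"]
            and (d["accuracy"] > c["accuracy"] or d["depth"] < c["depth"])
            for d in candidates
        )
        if not dominated:
            front.append(c)
    return front

def pac_sample_size(epsilon, delta):
    """Hoeffding-style bound: number of i.i.d. samples so that the empirical
    agreement of an interpretation with the black box is within epsilon of
    the true agreement, with probability at least 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

candidates = [
    {"name": "T1", "depth": 2, "accuracy": 0.81},
    {"name": "T2", "depth": 4, "accuracy": 0.90},
    {"name": "T3", "depth": 4, "accuracy": 0.85},  # dominated by T2
    {"name": "T4", "depth": 7, "accuracy": 0.96},
]
print([c["name"] for c in pareto_front(candidates)])  # ['T1', 'T2', 'T4']
print(pac_sample_size(epsilon=0.05, delta=0.01))      # 1060
```

In this toy example T3 is dominated by T2 (same depth, lower accuracy), while T1, T2, and T4 survive as genuinely different trade-offs; a single scalar objective such as accuracy minus a fixed multiple of depth would typically select just one of them, which is exactly the kind of loss the talk highlights.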

This is joint work with Hazem Torfah, Shetal Shah, Sanjit Seshia, and S. Akshay.