This cluster will convene an interdisciplinary group of scholars to develop firm theoretical and philosophical foundations for addressing major open issues concerning the interpretability of machine learning–based models. Program participants include experts in theoretical computer science, machine learning, statistics, causal inference, and fairness, as well as members of the existing interpretability research community. We aim to address the following fundamental questions about how best to use machine learning for real-life tasks:
- What notions of interpretability might yield models that are more amenable to monitoring by regulators?
- How can the quality and usefulness of an interpretation in a given context (such as for a particular human audience or a particular domain problem) be evaluated, both empirically and theoretically?
- For which of the desiderata that interpretability purports to address must we sacrifice predictive accuracy?
- Are there any feasibly measurable properties of neural networks that can yield significant insights into their input-output functionality? More generally, are there sound theoretical principles under which today’s deep learning tools can be leveraged to confer insights beyond their predictive accuracy?
- What role, if any, can various interpretation or explanation techniques play in the discourse on algorithmic fairness and discrimination? Are there inherent trade-offs between notions of interpretability and fairness?
The cluster will consider a variety of perspectives on defining these goals and on developing tools to achieve them in automated decision-making systems.