Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


A Unified Approach to Interpreting Model Predictions

Scott Lundberg, Su-In Lee

Published: 2017-05-22 · NeurIPS 2017
Tasks: Interpretability Techniques for Deep Learning · Interpretable Machine Learning · Image Attribution · Feature Importance
Links: Paper · PDF · Code

Abstract

Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
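The additive feature attribution described in the abstract assigns each feature a Shapley value: its average marginal contribution to the prediction over all orderings of the other features. A minimal sketch of the exact (brute-force) computation, assuming a toy model and a baseline-substitution value function (Kernel SHAP approximates this sum efficiently; the model `f`, input `x`, and `baseline` below are illustrative, not from the paper):

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values for model f at input x.

    The value function v(S) evaluates f with features in S kept at x
    and all other features replaced by the baseline -- one common
    (assumed) way to represent "missing" features.
    """
    n = len(x)

    def v(S):
        z = [x[i] if i in S else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (v(set(S) | {i}) - v(set(S)))
    return phi

# Toy model with an interaction term (hypothetical example).
f = lambda z: 2 * z[0] + 3 * z[1] + z[0] * z[1]
x, baseline = [1.0, 1.0], [0.0, 0.0]
phi = shapley_values(f, x, baseline)
```

The "local accuracy" property from the paper holds by construction: the attributions sum to `f(x) - f(baseline)` (here 6.0), with the interaction term split evenly between the two features.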

Results

| Task                                         | Dataset      | Metric                                   | Value  | Model       |
|----------------------------------------------|--------------|------------------------------------------|--------|-------------|
| Interpretability Techniques for Deep Learning | CelebA       | Insertion AUC score                      | 0.5246 | Kernel SHAP |
| Image Attribution                            | VGGFace2     | Deletion AUC score (ArcFace ResNet-101)  | 0.2034 | Kernel SHAP |
| Image Attribution                            | VGGFace2     | Insertion AUC score (ArcFace ResNet-101) | 0.6132 | Kernel SHAP |
| Image Attribution                            | CUB-200-2011 | Deletion AUC score (ResNet-101)          | 0.1016 | Kernel SHAP |
| Image Attribution                            | CUB-200-2011 | Insertion AUC score (ResNet-101)         | 0.6763 | Kernel SHAP |
| Image Attribution                            | CelebA       | Deletion AUC score (ArcFace ResNet-101)  | 0.1409 | Kernel SHAP |
| Image Attribution                            | CelebA       | Insertion AUC score (ArcFace ResNet-101) | 0.5246 | Kernel SHAP |
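The deletion/insertion AUC metrics in the table come from later attribution-evaluation work, not from this paper itself: features (e.g. pixels) are removed or revealed in decreasing order of attributed importance, the model's score is recorded at each step, and the area under that curve is measured (lower is better for deletion, higher for insertion). A minimal sketch of the deletion variant, assuming a list-valued input and a scalar baseline (all names below are illustrative):

```python
def deletion_auc(model, x, ranking, baseline=0.0):
    """Deletion metric: zero out features in the order given by
    `ranking` (most important first) and record the model score.

    Returns the trapezoidal area under the score curve, averaged over
    the removal steps. A good attribution ranking drives the score
    down quickly, giving a LOWER AUC.
    """
    x = list(x)                      # work on a copy
    scores = [model(x)]              # score with all features present
    for i in ranking:
        x[i] = baseline              # "delete" feature i
        scores.append(model(x))
    n = len(scores) - 1
    # trapezoidal rule over the normalized removal fraction [0, 1]
    return sum((scores[k] + scores[k + 1]) / 2 for k in range(n)) / n

# Toy check: for model = sum, removing the largest feature first
# (the correct importance ranking) yields a lower AUC than the
# reverse ranking.
good = deletion_auc(sum, [3.0, 2.0, 1.0], [0, 1, 2])
bad = deletion_auc(sum, [3.0, 2.0, 1.0], [2, 1, 0])
```

The insertion variant is symmetric: start from the baseline, reveal features most-important-first, and prefer a higher AUC.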

Related Papers

MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Neural Network-Guided Symbolic Regression for Interpretable Descriptor Discovery in Perovskite Catalysts (2025-07-16)
SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning (2025-07-14)
Feature-Guided Neighbor Selection for Non-Expert Evaluation of Model Predictions (2025-07-08)
Can "consciousness" be observed from large language model (LLM) internal states? Dissecting LLM representations obtained from Theory of Mind test with Integrated Information Theory and Span Representation analysis (2025-06-26)
Segment Anything in Pathology Images with Natural Language (2025-06-26)
The Most Important Features in Generalized Additive Models Might Be Groups of Features (2025-06-24)
Sampling Matters in Explanations: Towards Trustworthy Attribution Analysis Building Block in Visual Models through Maximizing Explanation Certainty (2025-06-24)