Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


A Unified Approach to Interpreting Model Predictions

Scott Lundberg, Su-In Lee

Published: 2017-05-22 · NeurIPS 2017
Tasks: Interpretability Techniques for Deep Learning · Interpretable Machine Learning · Image Attribution · Feature Importance
Links: Paper · PDF · Code

Abstract

Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
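The additive feature attribution described in the abstract assigns each feature a Shapley value: its average marginal contribution to the prediction over all orderings of the other features. A minimal sketch of the exact (brute-force) computation, assuming a toy model and a baseline-substitution value function (Kernel SHAP approximates this sum efficiently; the model `f`, input `x`, and `baseline` below are illustrative, not from the paper):

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values for model f at input x.

    The value function v(S) evaluates f with features in S kept at x
    and all other features replaced by the baseline -- one common
    (assumed) way to represent "missing" features.
    """
    n = len(x)

    def v(S):
        z = [x[i] if i in S else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (v(set(S) | {i}) - v(set(S)))
    return phi

# Toy model with an interaction term (hypothetical example).
f = lambda z: 2 * z[0] + 3 * z[1] + z[0] * z[1]
x, baseline = [1.0, 1.0], [0.0, 0.0]
phi = shapley_values(f, x, baseline)
```

The "local accuracy" property from the paper holds by construction: the attributions sum to `f(x) - f(baseline)` (here 6.0), with the interaction term split evenly between the two features.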

Results

| Task                                         | Dataset      | Metric                                   | Value  | Model       |
|----------------------------------------------|--------------|------------------------------------------|--------|-------------|
| Interpretability Techniques for Deep Learning | CelebA       | Insertion AUC score                      | 0.5246 | Kernel SHAP |
| Image Attribution                            | VGGFace2     | Deletion AUC score (ArcFace ResNet-101)  | 0.2034 | Kernel SHAP |
| Image Attribution                            | VGGFace2     | Insertion AUC score (ArcFace ResNet-101) | 0.6132 | Kernel SHAP |
| Image Attribution                            | CUB-200-2011 | Deletion AUC score (ResNet-101)          | 0.1016 | Kernel SHAP |
| Image Attribution                            | CUB-200-2011 | Insertion AUC score (ResNet-101)         | 0.6763 | Kernel SHAP |
| Image Attribution                            | CelebA       | Deletion AUC score (ArcFace ResNet-101)  | 0.1409 | Kernel SHAP |
| Image Attribution                            | CelebA       | Insertion AUC score (ArcFace ResNet-101) | 0.5246 | Kernel SHAP |
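The deletion/insertion AUC metrics in the table come from later attribution-evaluation work, not from this paper itself: features (e.g. pixels) are removed or revealed in decreasing order of attributed importance, the model's score is recorded at each step, and the area under that curve is measured (lower is better for deletion, higher for insertion). A minimal sketch of the deletion variant, assuming a list-valued input and a scalar baseline (all names below are illustrative):

```python
def deletion_auc(model, x, ranking, baseline=0.0):
    """Deletion metric: zero out features in the order given by
    `ranking` (most important first) and record the model score.

    Returns the trapezoidal area under the score curve, averaged over
    the removal steps. A good attribution ranking drives the score
    down quickly, giving a LOWER AUC.
    """
    x = list(x)                      # work on a copy
    scores = [model(x)]              # score with all features present
    for i in ranking:
        x[i] = baseline              # "delete" feature i
        scores.append(model(x))
    n = len(scores) - 1
    # trapezoidal rule over the normalized removal fraction [0, 1]
    return sum((scores[k] + scores[k + 1]) / 2 for k in range(n)) / n

# Toy check: for model = sum, removing the largest feature first
# (the correct importance ranking) yields a lower AUC than the
# reverse ranking.
good = deletion_auc(sum, [3.0, 2.0, 1.0], [0, 1, 2])
bad = deletion_auc(sum, [3.0, 2.0, 1.0], [2, 1, 0])
```

The insertion variant is symmetric: start from the baseline, reveal features most-important-first, and prefer a higher AUC.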

Related Papers

MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Neural Network-Guided Symbolic Regression for Interpretable Descriptor Discovery in Perovskite Catalysts (2025-07-16)
SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning (2025-07-14)
Feature-Guided Neighbor Selection for Non-Expert Evaluation of Model Predictions (2025-07-08)
Can "consciousness" be observed from large language model (LLM) internal states? Dissecting LLM representations obtained from Theory of Mind test with Integrated Information Theory and Span Representation analysis (2025-06-26)
Segment Anything in Pathology Images with Natural Language (2025-06-26)
The Most Important Features in Generalized Additive Models Might Be Groups of Features (2025-06-24)
Sampling Matters in Explanations: Towards Trustworthy Attribution Analysis Building Block in Visual Models through Maximizing Explanation Certainty (2025-06-24)