Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Axiomatic Attribution for Deep Networks

Mukund Sundararajan, Ankur Taly, Qiqi Yan

2017-03-04 | ICML 2017
Tasks: Interpretability Techniques for Deep Learning, Explainable artificial intelligence, Interpretable Machine Learning, Image Attribution

Abstract

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
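The abstract's remark that the method "just needs a few calls to the standard gradient operator" can be illustrated with a small sketch. Integrated Gradients is commonly described as accumulating gradients along the straight-line path from a baseline input to the actual input, then scaling by the difference between the two. The example below is a minimal illustration under stated assumptions, not the authors' implementation: the toy model, its weights, the all-zeros baseline, the step count, and the names `model`, `model_grad`, and `integrated_gradients` are all hypothetical, and the gradient is written by hand instead of coming from an autodiff framework.

```python
import numpy as np

# Hypothetical toy "network": sigmoid of a fixed linear score.
WEIGHTS = np.array([1.0, -2.0, 0.5])

def model(x):
    """Scalar output of the toy model for input vector x."""
    return 1.0 / (1.0 + np.exp(-(x @ WEIGHTS)))

def model_grad(x):
    """Analytic gradient of the toy model with respect to x.
    A real network would use its framework's gradient operator here."""
    p = model(x)
    return p * (1.0 - p) * WEIGHTS

def integrated_gradients(x, baseline, grad_fn, steps=200):
    """Midpoint Riemann-sum approximation of the gradient integral
    along the straight line from baseline to x."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for alpha in alphas:
        # One gradient call per interpolation point on the path.
        total += grad_fn(baseline + alpha * (x - baseline))
    avg_grad = total / steps
    return (x - baseline) * avg_grad

x = np.array([1.0, 0.5, -1.0])
baseline = np.zeros(3)  # assumed baseline; the right choice is task-dependent
attributions = integrated_gradients(x, baseline, model_grad)

# The attributions should sum (approximately) to the change in output
# between the baseline and the input.
output_change = model(x) - model(baseline)
print(attributions, attributions.sum(), output_change)
```

A useful sanity check in practice is the one printed at the end: if the attributions do not sum to roughly the difference between the model's output at the input and at the baseline, the number of interpolation steps is probably too small.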

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Interpretability Techniques for Deep Learning | CelebA | Insertion AUC score | 0.3578 | Integrated Gradients |
| Image Attribution | VGGFace2 | Deletion AUC score (ArcFace ResNet-101) | 0.0749 | Integrated Gradients |
| Image Attribution | VGGFace2 | Insertion AUC score (ArcFace ResNet-101) | 0.5399 | Integrated Gradients |
| Image Attribution | CUB-200-2011 | Deletion AUC score (ResNet-101) | 0.0728 | Integrated Gradients |
| Image Attribution | CUB-200-2011 | Insertion AUC score (ResNet-101) | 0.0422 | Integrated Gradients |
| Image Attribution | CelebA | Deletion AUC score (ArcFace ResNet-101) | 0.068 | Integrated Gradients |
| Image Attribution | CelebA | Insertion AUC score (ArcFace ResNet-101) | 0.3578 | Integrated Gradients |

Related Papers

Explainable Artificial Intelligence in Biomedical Image Analysis: A Comprehensive Survey (2025-07-09)
From Motion to Meaning: Biomechanics-Informed Neural Network for Explainable Cardiovascular Disease Identification (2025-07-08)
IXAII: An Interactive Explainable Artificial Intelligence Interface for Decision Support Systems (2025-06-26)
Towards Transparent AI: A Survey on Explainable Large Language Models (2025-06-26)
Can "consciousness" be observed from large language model (LLM) internal states? Dissecting LLM representations obtained from Theory of Mind test with Integrated Information Theory and Span Representation analysis (2025-06-26)
Towards Interpretable and Efficient Feature Selection in Trajectory Datasets: A Taxonomic Approach (2025-06-25)
Communicating Smartly in the Molecular Domain: Neural Networks in the Internet of Bio-Nano Things (2025-06-25)
Toward the Explainability of Protein Language Models for Sequence Design (2025-06-24)