Mukund Sundararajan, Ankur Taly, Qiqi Yan
We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that most known attribution methods violate at least one of them, which we consider a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement: it needs only a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models, and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
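The "few calls to the gradient operator" above can be sketched as a Riemann-sum approximation of Integrated Gradients along the straight-line path from a baseline to the input. The toy quadratic model and all function names below are illustrative assumptions, not the paper's code; in practice the analytic gradient would be replaced by a framework's gradient operator.

```python
import numpy as np

# Toy differentiable "network": f(x) = sum of squares.
# Its gradient is known analytically here, standing in for a deep
# learning framework's gradient operator (an illustrative assumption).
def f(x):
    return float(np.sum(x ** 2))

def grad_f(x):
    return 2.0 * x

def integrated_gradients(x, baseline, grad_fn, steps=50):
    """Approximate IG with a midpoint Riemann sum of the gradient
    along the straight line from `baseline` to the input `x`."""
    alphas = (np.arange(steps) + 0.5) / steps      # midpoints in (0, 1)
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    avg_grad = total / steps
    return (x - baseline) * avg_grad

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(x, baseline, grad_f)
# Completeness: attributions sum to f(x) - f(baseline).
print(attr, attr.sum(), f(x) - f(baseline))
```

For this quadratic, each attribution works out to the squared input coordinate, and the attributions sum exactly to the difference of scores between input and baseline, which is the Completeness property the method satisfies.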
| Task | Dataset | Network | Metric | Value | Attribution method |
|---|---|---|---|---|---|
| Image Attribution | VGGFace2 | ArcFace ResNet-101 | Deletion AUC | 0.0749 | Integrated Gradients |
| Image Attribution | VGGFace2 | ArcFace ResNet-101 | Insertion AUC | 0.5399 | Integrated Gradients |
| Image Attribution | CUB-200-2011 | ResNet-101 | Deletion AUC | 0.0728 | Integrated Gradients |
| Image Attribution | CUB-200-2011 | ResNet-101 | Insertion AUC | 0.0422 | Integrated Gradients |
| Image Attribution | CelebA | ArcFace ResNet-101 | Deletion AUC | 0.068 | Integrated Gradients |
| Image Attribution | CelebA | ArcFace ResNet-101 | Insertion AUC | 0.3578 | Integrated Gradients |
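The deletion-AUC metric in the table can be sketched under a common reading of these perturbation benchmarks: features are erased to a baseline in decreasing attribution order, the model score is re-measured after each erasure, and the area under the resulting score curve is reported (lower is better, since good attributions remove the score quickly). The linear "model" below is an illustrative stand-in, not the ResNet-101 networks from the table.

```python
import numpy as np

def deletion_auc(score_fn, x, attributions, baseline):
    """Erase features most-important-first and integrate the score curve.
    A sketch of the deletion metric; exact benchmark details may differ."""
    order = np.argsort(-attributions)            # most important first
    cur = x.astype(float).copy()
    scores = [score_fn(cur)]
    for i in order:
        cur[i] = baseline[i]                     # erase one feature
        scores.append(score_fn(cur))
    scores = np.asarray(scores)
    # Trapezoidal area over the fraction-removed axis in [0, 1].
    dx = 1.0 / (len(scores) - 1)
    return float(np.sum((scores[:-1] + scores[1:]) / 2.0) * dx)

w = np.array([3.0, 1.0, 2.0])
score = lambda v: float(w @ v)                   # toy linear model (assumption)
x = np.ones(3)
attr = w * x                                     # exact IG for a linear model
auc = deletion_auc(score, x, attr, np.zeros(3))
print(auc)
```

The insertion metric is the mirror image: features are added to the baseline most-important-first, and a higher area under the curve is better.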