
Self-Explaining Structures Improve NLP Models

Zijun Sun, Chun Fan, Qinghong Han, Xiaofei Sun, Yuxian Meng, Fei Wu, Jiwei Li

2020-12-03 · Paraphrase Identification · Sentiment Analysis · Natural Language Inference

Paper · PDF · Code (official)

Abstract

Existing approaches to explaining deep learning models in NLP usually suffer from two major drawbacks: (1) the main model and the explaining model are decoupled: an additional probing or surrogate model is used to interpret an existing model, so existing explaining tools are not self-explainable; (2) the probing model can only explain a model's predictions by operating on low-level features, computing saliency scores for individual words, and is clumsy with high-level text units such as phrases, sentences, or paragraphs. To address these two issues, we propose a simple yet general and effective self-explaining framework for deep learning models in NLP. The key idea is to put an additional layer, called the interpretation layer, on top of any existing NLP model. This layer aggregates the information for each text span, associates each span with a specific weight, and feeds their weighted combination to the softmax function for the final prediction. The proposed model comes with the following merits: (1) the span weights make the model self-explainable and do not require an additional probing model for interpretation; (2) the proposed model is general and can be adapted to any existing deep learning structure in NLP; (3) the weight associated with each text span provides direct importance scores for higher-level text units such as phrases and sentences. We show for the first time that interpretability does not come at the cost of performance: a neural model with self-explaining features obtains better performance than its counterpart without them, achieving a new SOTA accuracy of 59.1 on SST-5 and a new SOTA accuracy of 92.3 on SNLI.
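To make the mechanism concrete, below is a minimal PyTorch sketch of an interpretation layer in the spirit of the abstract: every contiguous span gets a representation and a scalar score, the scores are normalized with a softmax into span weights, and the weighted combination of span representations feeds the final classifier. The span encoder here is simple mean-pooling over token states, which is an assumption for illustration; the paper's actual span representation (its span information-collecting layer) may differ, and `InterpretationLayer` is a hypothetical name, not the authors' code.

```python
import torch
import torch.nn as nn

class InterpretationLayer(nn.Module):
    """Sketch of a self-explaining interpretation layer (not the official code).

    Scores every contiguous span, softmaxes the scores into weights alpha,
    and classifies from the alpha-weighted span representation. The alpha
    values double as direct importance scores for phrases/sentences.
    """

    def __init__(self, hidden_size: int, num_classes: int):
        super().__init__()
        self.span_scorer = nn.Linear(hidden_size, 1)      # one scalar score per span
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, token_states: torch.Tensor):
        # token_states: (batch, seq_len, hidden) from any encoder, e.g. RoBERTa
        batch, seq_len, hidden = token_states.shape
        spans, reps = [], []
        for i in range(seq_len):
            for j in range(i, seq_len):
                # Mean-pool tokens i..j as the span representation (assumption).
                reps.append(token_states[:, i : j + 1].mean(dim=1))
                spans.append((i, j))
        span_reps = torch.stack(reps, dim=1)               # (batch, n_spans, hidden)
        scores = self.span_scorer(span_reps).squeeze(-1)   # (batch, n_spans)
        alpha = torch.softmax(scores, dim=-1)              # span weights = explanations
        pooled = (alpha.unsqueeze(-1) * span_reps).sum(dim=1)  # weighted combination
        logits = self.classifier(pooled)                   # softmax applied at loss time
        return logits, alpha, spans
```

At inference, sorting `alpha` and reading off the corresponding `spans` yields the most influential text spans for a prediction, which is the self-explaining property the abstract describes: no separate probing model is needed. The O(seq_len²) span enumeration is kept naive for clarity; a practical implementation would cap span length or vectorize it.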

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Natural Language Inference | SNLI | % Test Accuracy | 92.3 | RoBERTa-large + Self-Explaining |
| Natural Language Inference | SNLI | Parameters (M) | 340 | RoBERTa-large + Self-Explaining |
| Natural Language Inference | SNLI | % Test Accuracy | 92.3 | RoBERTa-large + self-explaining layer |
| Semantic Textual Similarity | Quora Question Pairs | Accuracy | 80 | Random |
| Sentiment Analysis | SST-5 (fine-grained classification) | Accuracy | 59.1 | RoBERTa-large + Self-Explaining |
| Paraphrase Identification | Quora Question Pairs | Accuracy | 80 | Random |

Related Papers

AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis (2025-07-17)
AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles (2025-07-15)
DCR: Quantifying Data Contamination in LLMs Evaluation (2025-07-15)
LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification (2025-07-15)
SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning (2025-07-14)
GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation (2025-07-10)
DS@GT at CheckThat! 2025: Evaluating Context and Tokenization Strategies for Numerical Fact Verification (2025-07-08)
ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation (2025-06-27)