Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Cell-aware Stacked LSTMs for Modeling Sentences

Jihun Choi, Taeuk Kim, Sang-goo Lee

2018-09-07 · Machine Translation · Paraphrase Identification · Sentiment Analysis · Natural Language Inference · Translation · Sentiment Classification

Paper · PDF

Abstract

We propose a method of stacking multiple long short-term memory (LSTM) layers for modeling sentences. In contrast to the conventional stacked LSTMs where only hidden states are fed as input to the next layer, the suggested architecture accepts both hidden and memory cell states of the preceding layer and fuses information from the left and the lower context using the soft gating mechanism of LSTMs. Thus the architecture modulates the amount of information to be delivered not only in horizontal recurrence but also in vertical connections, from which useful features extracted from lower layers are effectively conveyed to upper layers. We dub this architecture Cell-aware Stacked LSTM (CAS-LSTM) and show from experiments that our models bring significant performance gain over the standard LSTMs on benchmark datasets for natural language inference, paraphrase detection, sentiment classification, and machine translation. We also conduct extensive qualitative analysis to understand the internal behavior of the suggested approach.
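The core idea above — feeding both the hidden state and the memory cell state of the lower layer into the upper layer, with an extra soft gate controlling the vertical flow — can be sketched in a few lines. The following NumPy sketch is an illustrative reading of that description, not the paper's exact parameterization: the weight layout, the name of the lower-cell gate, and the assumption that the lower cell state has the same width as the upper hidden state are all choices made here for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CASLSTMCell:
    """Sketch of a Cell-aware Stacked LSTM (CAS-LSTM) upper-layer cell.

    Besides the usual input/forget/output gates, an extra gate modulates
    the memory cell state passed up from the layer below, so information
    is gated along vertical connections as well as horizontal recurrence.
    Weight names and the exact fusion rule are assumptions for this sketch.
    """

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        z = input_size + hidden_size
        # One weight matrix per gate: input i, forget f, lower-cell gate l,
        # output o, candidate g.
        self.W = {k: rng.standard_normal((z, hidden_size)) * 0.1
                  for k in "iflog"}
        self.b = {k: np.zeros(hidden_size) for k in "iflog"}
        self.hidden_size = hidden_size

    def step(self, x, h_prev, c_prev, c_lower):
        """One time step. x is the lower layer's hidden state h^{l-1}_t;
        c_lower is its memory cell c^{l-1}_t (assumed hidden-size wide)."""
        zcat = np.concatenate([x, h_prev])
        gate = lambda k, act: act(zcat @ self.W[k] + self.b[k])
        i = gate("i", sigmoid)   # input gate
        f = gate("f", sigmoid)   # forget gate (horizontal recurrence)
        l = gate("l", sigmoid)   # soft gate on the lower layer's cell state
        o = gate("o", sigmoid)   # output gate
        g = gate("g", np.tanh)   # candidate cell content
        # Fuse the left context (c_prev) and the lower context (c_lower).
        c = f * c_prev + l * c_lower + i * g
        h = o * np.tanh(c)
        return h, c

# Usage: one step of an upper layer consuming the lower layer's states.
cell = CASLSTMCell(input_size=8, hidden_size=8)
h = c = np.zeros(8)
h_lower, c_lower = np.full(8, 0.1), np.full(8, 0.2)
h, c = cell.step(h_lower, h, c, c_lower)
```

A conventional stacked LSTM corresponds to dropping the `l * c_lower` term; the extra gate is what lets the upper layer decide, per dimension, how much of the lower layer's memory to absorb.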

Results

Task | Dataset | Metric | Value | Model
Natural Language Inference | SNLI | % Test Accuracy | 87 | 300D 2-layer Bi-CAS-LSTM
Semantic Textual Similarity | Quora Question Pairs | Accuracy | 88.6 | Bi-CAS-LSTM
Sentiment Analysis | SST-5 Fine-grained classification | Accuracy | 53.6 | Bi-CAS-LSTM
Sentiment Analysis | SST-2 Binary classification | Accuracy | 91.3 | Bi-CAS-LSTM
Paraphrase Identification | Quora Question Pairs | Accuracy | 88.6 | Bi-CAS-LSTM

Related Papers

AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis (2025-07-17)
A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles (2025-07-15)
DCR: Quantifying Data Contamination in LLMs Evaluation (2025-07-15)
LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification (2025-07-15)
Function-to-Style Guidance of LLMs for Code Translation (2025-07-15)
SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning (2025-07-14)
GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation (2025-07-10)