Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding

Zhiheng Huang, Peng Xu, Davis Liang, Ajay Mishra, Bing Xiang

2020-03-16
Tasks: Text Classification, Machine Translation, Question Answering, Paraphrase Identification, Natural Language Inference, Translation, Sentence Classification

Abstract

Bidirectional Encoder Representations from Transformers (BERT) has recently achieved state-of-the-art performance on a broad range of NLP tasks including sentence classification, machine translation, and question answering. The BERT model architecture is derived primarily from the transformer. Prior to the transformer era, the bidirectional Long Short-Term Memory (BLSTM) had been the dominant modeling architecture for neural machine translation and question answering. In this paper, we investigate how these two modeling techniques can be combined to create a more powerful model architecture. We propose a new architecture, denoted Transformer with BLSTM (TRANS-BLSTM), which integrates a BLSTM layer into each transformer block, leading to a joint modeling framework for the transformer and BLSTM. We show that TRANS-BLSTM models consistently improve accuracy over BERT baselines in GLUE and SQuAD 1.1 experiments. Our TRANS-BLSTM model obtains an F1 score of 94.01% on the SQuAD 1.1 development dataset, which is comparable to the state-of-the-art result.
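The abstract says a BLSTM layer is integrated into each transformer block, but not how the two branches are fused. Below is a minimal PyTorch sketch of one plausible variant, in which the BLSTM output is added alongside the feed-forward output inside the residual connection. The class name, the additive fusion, and the hidden-size split are assumptions for illustration, not the paper's exact specification.

```python
import torch
import torch.nn as nn

class TransBLSTMBlock(nn.Module):
    """One transformer encoder block with an integrated BLSTM layer.

    Assumption: the BLSTM output is added to the feed-forward output in the
    second residual connection. The paper may fuse the branches differently.
    """

    def __init__(self, d_model=768, n_heads=12, d_ff=3072, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        # Bidirectional LSTM with hidden size d_model // 2 per direction,
        # so the concatenated forward/backward output matches d_model.
        self.blstm = nn.LSTM(
            d_model, d_model // 2, bidirectional=True, batch_first=True
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Standard self-attention sublayer with residual + layer norm.
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)
        # Fuse the BLSTM branch with the feed-forward branch (assumed design).
        h, _ = self.blstm(x)
        return self.norm2(x + self.ff(x) + h)

# Shape check on a toy configuration.
block = TransBLSTMBlock(d_model=64, n_heads=4, d_ff=128)
out = block(torch.randn(2, 10, 64))
```

Stacking twelve or twenty-four such blocks would mirror the BERT-base and BERT-large configurations the paper compares against.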

Results

Task                         | Dataset               | Metric   | Value | Model
Semantic Textual Similarity  | Quora Question Pairs  | Accuracy | 88.28 | TRANS-BLSTM
Paraphrase Identification    | Quora Question Pairs  | Accuracy | 88.28 | TRANS-BLSTM
Text Classification          | GLUE SST2             | Accuracy | 94.38 | TRANS-BLSTM
Text Classification          | GLUE RTE              | Accuracy | 79.78 | TRANS-BLSTM
Text Classification          | GLUE MRPC             | Accuracy | 90.45 | TRANS-BLSTM

Related Papers

Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility (2025-07-16)