Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Published 2018-10-11 · NAACL 2019

Tasks: Text Classification, Emotion Recognition in Conversation, Question Answering, Paraphrase Identification, Stock Market Prediction, Sentiment Analysis, Coreference Resolution, Natural Language Inference, Common Sense Reasoning, Natural Language Understanding, Multimodal Intent Recognition, Linear-Probe Classification, Named Entity Recognition, Type prediction, Semantic Textual Similarity, Linguistic Acceptability, Conversational Response Selection, Named Entity Recognition (NER), Citation Intent Classification, Sentence Classification, Cross-Lingual Natural Language Inference
Paper · PDF · Code (official implementation, plus community implementations)

Abstract

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
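To make the fine-tuning recipe in the abstract concrete (a single task-specific output layer on top of the pre-trained bidirectional encoder), here is a minimal sketch using the Hugging Face transformers and PyTorch libraries; the library, checkpoint name, example sentences, and label set are illustrative assumptions, not part of the original paper.

```python
# Minimal sketch (assumes the Hugging Face `transformers` and `torch` packages):
# fine-tuning BERT for a sentence-pair task by adding one linear output layer
# on top of the pooled [CLS] representation, as the abstract describes.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

# The "one additional output layer": a linear classifier over the [CLS] vector.
num_labels = 3  # e.g. entailment / neutral / contradiction for NLI (assumed)
classifier = torch.nn.Linear(bert.config.hidden_size, num_labels)

# Encode a premise/hypothesis pair as a single "[CLS] A [SEP] B [SEP]" sequence.
inputs = tokenizer("A man inspects a uniform.",
                   "The man is sleeping.",
                   return_tensors="pt")
outputs = bert(**inputs)
logits = classifier(outputs.pooler_output)  # shape: (1, num_labels)

# During fine-tuning, the encoder and the classifier are updated end to end.
loss = torch.nn.functional.cross_entropy(logits, torch.tensor([2]))
loss.backward()
```

The same pattern covers the tasks listed above: only the input formatting and the small output layer change per task, while the pre-trained encoder weights are shared.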

Results

Task | Dataset | Metric | Value | Model
Stock Market Prediction | Astock | Accuracy | 59.11 | Bert Chinese
Stock Market Prediction | Astock | F1-score | 58.99 | Bert Chinese
Stock Market Prediction | Astock | Precision | 59.07 | Bert Chinese
Stock Market Prediction | Astock | Recall | 59.2 | Bert Chinese
Reading Comprehension | PhotoChat | F1 | 53.2 | BERT
Reading Comprehension | PhotoChat | Precision | 56.1 | BERT
Reading Comprehension | PhotoChat | Recall | 50.6 | BERT
Question Answering | SQuAD1.1 dev | EM | 86.2 | BERT-LARGE (Ensemble+TriviaQA)
Question Answering | SQuAD1.1 dev | F1 | 92.2 | BERT-LARGE (Ensemble+TriviaQA)
Question Answering | SQuAD1.1 dev | EM | 84.2 | BERT-LARGE (Single+TriviaQA)
Question Answering | SQuAD1.1 dev | F1 | 91.1 | BERT-LARGE (Single+TriviaQA)
Question Answering | MRQA | Average F1 | 78.5 | BERT (large)
Question Answering | MultiTQ | Hits@1 | 8.3 | BERT
Question Answering | MultiTQ | Hits@10 | 48.2 | BERT
Question Answering | CoQA | In-domain | 82.5 | BERT Large Augmented (single model)
Question Answering | CoQA | Out-of-domain | 77.6 | BERT Large Augmented (single model)
Question Answering | CoQA | Overall | 81.1 | BERT Large Augmented (single model)
Question Answering | CoQA | In-domain | 79.8 | BERT-base finetune (single model)
Question Answering | CoQA | Out-of-domain | 74.1 | BERT-base finetune (single model)
Question Answering | CoQA | Overall | 78.1 | BERT-base finetune (single model)
Question Answering | MultiRC | EM | 24.1 | BERT-large (single model)
Question Answering | MultiRC | F1 | 70 | BERT-large (single model)
Question Answering | PIQA | Accuracy | 66.7 | BERT-Large 340M
Question Answering | SQuAD1.1 | EM | 87.433 | BERT (ensemble)
Question Answering | SQuAD1.1 | F1 | 93.16 | BERT (ensemble)
Question Answering | SQuAD1.1 | EM | 87.4 | BERT-LARGE (Ensemble+TriviaQA)
Question Answering | SQuAD1.1 | F1 | 93.2 | BERT-LARGE (Ensemble+TriviaQA)
Question Answering | SQuAD1.1 | EM | 85.083 | BERT (single model)
Question Answering | SQuAD1.1 | F1 | 91.835 | BERT (single model)
Question Answering | SQuAD1.1 | F1 | 91.8 | BERT-LARGE (Single+TriviaQA)
Common Sense Reasoning | SWAG | Dev | 86.6 | BERT-LARGE
Common Sense Reasoning | SWAG | Test | 86.3 | BERT-LARGE
Common Sense Reasoning | ReCoRD | EM | 54.04 | BERT-Base (single model)
Common Sense Reasoning | ReCoRD | F1 | 56.065 | BERT-Base (single model)
Natural Language Inference | WNLI | Accuracy | 65.1 | BERT-large 340M
Natural Language Inference | MultiNLI | Matched | 86.7 | BERT-LARGE
Natural Language Inference | MultiNLI | Mismatched | 85.9 | BERT-LARGE
Emotion Recognition | CPED | Accuracy of Sentiment | 48.96 | BERT_{utt}
Emotion Recognition | CPED | Macro-F1 of Sentiment | 45.18 | BERT_{utt}
Semantic Textual Similarity | MRPC | F1 | 89.3 | BERT-LARGE
Semantic Textual Similarity | STS Benchmark | Spearman Correlation | 0.865 | BERT-LARGE
Semantic Textual Similarity | Quora Question Pairs | F1 | 72.1 | BERT-LARGE
Sentiment Analysis | SST-2 Binary classification | Accuracy | 94.9 | BERT-LARGE
Program Synthesis | ManyTypes4TypeScript | Average Accuracy | 57.52 | BERT
Program Synthesis | ManyTypes4TypeScript | Average F1 | 54.1 | BERT
Program Synthesis | ManyTypes4TypeScript | Average Precision | 54.18 | BERT
Program Synthesis | ManyTypes4TypeScript | Average Recall | 54.02 | BERT
Coreference Resolution | Winograd Schema Challenge | Accuracy | 62 | BERT-large 340M
Paraphrase Identification | Quora Question Pairs | F1 | 72.1 | BERT-LARGE
Text Classification | DBpedia | Error | 0.64 | Bidirectional Encoder Representations from Transformers
Type prediction | ManyTypes4TypeScript | Average Accuracy | 57.52 | BERT
Type prediction | ManyTypes4TypeScript | Average F1 | 54.1 | BERT
Type prediction | ManyTypes4TypeScript | Average Precision | 54.18 | BERT
Type prediction | ManyTypes4TypeScript | Average Recall | 54.02 | BERT
Natural Language Understanding | GLUE | Average | 82.1 | BERT-LARGE
Natural Language Understanding | PDP60 | Accuracy | 78.3 | BERT-large 340M
Stock Trend Prediction | Astock | Accuracy | 59.11 | Bert Chinese
Stock Trend Prediction | Astock | F1-score | 58.99 | Bert Chinese
Stock Trend Prediction | Astock | Precision | 59.07 | Bert Chinese
Stock Trend Prediction | Astock | Recall | 59.2 | Bert Chinese
Classification | DBpedia | Error | 0.64 | Bidirectional Encoder Representations from Transformers
Intent Recognition | PhotoChat | F1 | 53.2 | BERT
Intent Recognition | PhotoChat | Precision | 56.1 | BERT
Intent Recognition | PhotoChat | Recall | 50.6 | BERT
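Several rows above report SQuAD-style EM and F1. As an illustrative sketch (not taken from the paper or from the leaderboard code), these metrics are typically computed as exact string match after normalization and as token-level overlap between the predicted and gold answer spans:

```python
# Illustrative sketch of SQuAD-style EM and F1 scoring (assumed, simplified):
# EM is exact match after normalization; F1 is token-overlap between spans.
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(c for c in text if c not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, answer: str) -> float:
    return float(normalize(prediction) == normalize(answer))

def f1_score(prediction: str, answer: str) -> float:
    pred_tokens = normalize(prediction).split()
    ans_tokens = normalize(answer).split()
    common = Counter(pred_tokens) & Counter(ans_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ans_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the Eiffel Tower", "Eiffel Tower"))  # 1.0 after normalization
print(f1_score("in Paris, France", "Paris"))            # 0.5: partial token overlap
```

Dataset-level scores are the averages of these per-example values, taking the maximum over the available gold answers for each question.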

Related Papers

Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation (2025-07-21)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis (2025-07-17)
Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)