Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Not all layers are equally as important: Every Layer Counts BERT

Lucas Georges Gabriel Charpentier, David Samuel

Published: 2023-11-03
Tasks: Natural Language Inference, Linguistic Acceptability

Abstract

This paper introduces a novel modification of the transformer architecture, tailored for data-efficient pretraining of language models. Data efficiency is evaluated by participating in the BabyLM challenge, where our solution won both the strict and strict-small tracks. Our approach allows each transformer layer to select which outputs of previous layers to process. The empirical results verify the potential of this simple modification and show that not all layers are equally important.
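The core idea in the abstract — each layer selecting which outputs of previous layers to process — can be sketched as a learned weighted mixture over all earlier layer outputs. The sketch below is a hypothetical illustration, not the authors' implementation: the function names, the softmax normalisation, and the exact initialisation are assumptions. The `zero_init` flag mirrors the "(zero init)" model variants in the results table, under the assumption that it means weights for all but the immediately preceding layer start at zero, so training begins from a standard residual stack.

```python
import numpy as np

def init_layer_weights(num_prev_layers: int, zero_init: bool = False) -> np.ndarray:
    """Per-layer scalar mixing weights (hypothetical initialisation).

    With zero_init, only the immediately preceding layer contributes at
    the start of training; other layers must earn their weight.
    """
    if zero_init:
        w = np.zeros(num_prev_layers)
        w[-1] = 1.0  # assumption: start as a plain layer-to-layer stack
    else:
        w = np.ones(num_prev_layers)
    return w

def combine_layers(prev_outputs: list[np.ndarray], weights: np.ndarray) -> np.ndarray:
    """Softmax-normalised weighted sum over previous layer outputs.

    prev_outputs: list of (seq_len, hidden_dim) arrays, one per earlier layer.
    Returns the mixed (seq_len, hidden_dim) input for the next layer.
    """
    z = np.exp(weights - weights.max())   # numerically stable softmax
    z /= z.sum()
    stacked = np.stack(prev_outputs)      # (num_layers, seq_len, hidden_dim)
    return np.tensordot(z, stacked, axes=1)
```

In a full model, each transformer block would own its own weight vector over the embedding output and all preceding block outputs, letting the network learn per-layer that some earlier representations matter more than others.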

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Natural Language Inference | RTE | Accuracy | 63 | ELC-BERT-base 98M (zero init) |
| Natural Language Inference | RTE | Accuracy | 55.4 | ELC-BERT-small 24M |
| Natural Language Inference | RTE | Accuracy | 54.7 | LTG-BERT-base 98M |
| Natural Language Inference | RTE | Accuracy | 53.7 | LTG-BERT-small 24M |
| Natural Language Inference | MultiNLI | Matched | 84.4 | ELC-BERT-base 98M (zero init) |
| Natural Language Inference | MultiNLI | Mismatched | 84.5 | ELC-BERT-base 98M (zero init) |
| Natural Language Inference | MultiNLI | Matched | 83 | LTG-BERT-base 98M |
| Natural Language Inference | MultiNLI | Mismatched | 83.4 | LTG-BERT-base 98M |
| Natural Language Inference | MultiNLI | Matched | 79.2 | ELC-BERT-small 24M |
| Natural Language Inference | MultiNLI | Mismatched | 79.9 | ELC-BERT-small 24M |
| Natural Language Inference | MultiNLI | Matched | 78 | LTG-BERT-small 24M |
| Natural Language Inference | MultiNLI | Mismatched | 78.8 | LTG-BERT-small 24M |
| Linguistic Acceptability | CoLA | Accuracy | 82.7 | LTG-BERT-base 98M |
| Linguistic Acceptability | CoLA | Accuracy | 82.6 | ELC-BERT-base 98M |
| Linguistic Acceptability | CoLA | Accuracy | 77.6 | LTG-BERT-small 24M |
| Linguistic Acceptability | CoLA | Accuracy | 76.1 | ELC-BERT-small 24M |

Related Papers

LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification (2025-07-15)
Modeling Code: Is Text All You Need? (2025-07-15)
All Eyes, no IMU: Learning Flight Attitude from Vision Alone (2025-07-15)
DS@GT at CheckThat! 2025: Evaluating Context and Tokenization Strategies for Numerical Fact Verification (2025-07-08)
Is Diversity All You Need for Scalable Robotic Manipulation? (2025-07-08)
DESIGN AND IMPLEMENTATION OF ONLINE CLEARANCE REPORT (2025-07-07)
Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models (2025-07-03)
Prompt2SegCXR: Prompt to Segment All Organs and Diseases in Chest X-rays (2025-07-01)