
BigBird

Natural Language Processing · Introduced 2020 · 16 papers
Source Paper: Big Bird: Transformers for Longer Sequences (2020-07-28)

Description

BigBird is a Transformer with a sparse attention mechanism that reduces the quadratic dependency of self-attention to linear in the number of tokens. Despite this sparsity, BigBird remains a universal approximator of sequence functions and is Turing complete, preserving these properties of the quadratic, full-attention model. Concretely, BigBird's attention consists of three main parts:

  • A set of g global tokens attending to all parts of the sequence.
  • All tokens attending to a set of w local neighboring tokens.
  • All tokens attending to a set of r random tokens.

Together, these three patterns yield a high-performing attention mechanism that scales to much longer sequences: up to 8x the length that full quadratic attention can handle on similar hardware.
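To make the three patterns concrete, here is a minimal sketch that builds a toy boolean attention mask combining global, windowed, and random attention. It is illustrative only: the parameter names are assumptions, `w` is taken as the window width per side, and a dense n x n mask is built with NumPy for clarity, whereas the real BigBird implementation uses blocked sparse computation rather than a dense mask.

```python
import numpy as np

def bigbird_attention_mask(n, g=2, w=3, r=2, seed=0):
    """Toy n x n boolean mask in the spirit of BigBird's sparse attention.

    mask[i, j] == True means query token i may attend to key token j.
    g -- number of global tokens (attend to, and are attended by, everything)
    w -- number of local neighbors on each side of every token
    r -- number of random keys sampled per query
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)

    # Global attention: the first g tokens see everything and are seen by all.
    mask[:g, :] = True
    mask[:, :g] = True

    # Sliding-window attention: each token attends to w neighbors on each side.
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)
        mask[i, lo:hi] = True

    # Random attention: each query additionally attends to r random keys.
    for i in range(n):
        mask[i, rng.choice(n, size=r, replace=False)] = True

    return mask

if __name__ == "__main__":
    m = bigbird_attention_mask(16)
    print(m.sum(), "of", m.size, "entries attended")
```

Each query row activates only O(g + w + r) keys instead of all n, which is where the linear (rather than quadratic) scaling described above comes from.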

Papers Using This Method

  • Convolutional vs Large Language Models for Software Log Classification in Edge-Deployable Cellular Network Testing (2024-07-04)
  • Transfer Learning in Pre-Trained Large Language Models for Malware Detection Based on System Calls (2024-05-15)
  • Multi-level Contrastive Learning for Script-based Character Understanding (2023-10-20)
  • KoBigBird-large: Transformation of Transformer for Korean Language Understanding (2023-09-19)
  • BudgetLongformer: Can we Cheaply Pretrain a SotA Legal Language Model From Scratch? (2022-11-30)
  • Processing Long Legal Documents with Pre-trained Transformers: Modding LegalBERT and Longformer (2022-11-02)
  • LittleBird: Efficient Faster & Longer Transformer for Question Answering (2022-10-21)
  • Factorizing Content and Budget Decisions in Abstractive Summarization of Long Documents (2022-05-25)
  • ICDBigBird: A Contextual Embedding Model for ICD Code Classification (2022-04-21)
  • Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences (2022-01-27)
  • Hierarchical Neural Network Approaches for Long Document Classification (2022-01-18)
  • Dynamic Token Normalization Improves Vision Transformers (2021-12-05)
  • Hierarchical Transformer Networks for Long-sequence and Multiple Clinical Documents Classification (2021-10-16)
  • A Dataset for Answering Time-Sensitive Questions (2021-08-13)
  • Three-level Hierarchical Transformer Networks for Long-sequence and Multiple Clinical Documents Classification (2021-04-17)
  • Big Bird: Transformers for Longer Sequences (2020-07-28)