Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BigBird

Natural Language Processing · Introduced 2020 · 16 papers

Description

BigBird is a Transformer with a sparse attention mechanism that reduces the quadratic dependency of self-attention to linear in the number of tokens. BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model. In particular, BigBird consists of three main parts:

A set of $g$ global tokens that attend to all parts of the sequence.
All tokens attending to a set of $w$ local neighboring tokens.
All tokens attending to a set of $r$ random tokens.

This yields a high-performing attention mechanism that scales to much longer sequences: up to 8x the length that was previously feasible on similar hardware.
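The three components above can be illustrated with a small, token-level sketch of the attention pattern. This is only an illustration under stated assumptions, not the reference implementation: the function names `bigbird_pattern` and `attended_pairs` are hypothetical, the global tokens are assumed to be the first $g$ positions, every token is assumed to also attend to those global tokens, and $w$ is treated as the total window width including the token itself. Practical implementations work on blocks of tokens rather than on individual tokens.

```python
import random


def bigbird_pattern(n, g=2, w=3, r=2, seed=0):
    """Return, for each query position, the set of key positions it attends to.

    n: sequence length
    g: number of global tokens (assumed here to be positions 0..g-1)
    w: local window width, including the query position itself
    r: number of random key positions sampled per query
    """
    rng = random.Random(seed)
    half = w // 2
    pattern = []
    for i in range(n):
        if i < g:
            # Global tokens attend to the whole sequence.
            keys = set(range(n))
        else:
            # Assumption: every token also attends to the global tokens.
            keys = set(range(min(g, n)))
            # Local window of neighbouring tokens around position i.
            keys.update(range(max(0, i - half), min(n, i + half + 1)))
            # Random keys; they may overlap with the window or global set.
            keys.update(rng.sample(range(n), min(r, n)))
        pattern.append(keys)
    return pattern


def attended_pairs(pattern):
    """Count the (query, key) pairs that the pattern actually computes."""
    return sum(len(keys) for keys in pattern)


if __name__ == "__main__":
    for n in (512, 4096):
        sparse = attended_pairs(bigbird_pattern(n))
        print(f"n={n}: sparse pairs={sparse}, full pairs={n * n}")
```

In this sketch each non-global query attends to at most $g + w + r$ keys, so for fixed $g$, $w$ and $r$ the number of computed pairs grows linearly with the sequence length, whereas full attention computes $n^2$ pairs.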

Papers Using This Method

Convolutional vs Large Language Models for Software Log Classification in Edge-Deployable Cellular Network Testing (2024-07-04)
Transfer Learning in Pre-Trained Large Language Models for Malware Detection Based on System Calls (2024-05-15)
Multi-level Contrastive Learning for Script-based Character Understanding (2023-10-20)
KoBigBird-large: Transformation of Transformer for Korean Language Understanding (2023-09-19)
BudgetLongformer: Can we Cheaply Pretrain a SotA Legal Language Model From Scratch? (2022-11-30)
Processing Long Legal Documents with Pre-trained Transformers: Modding LegalBERT and Longformer (2022-11-02)
LittleBird: Efficient Faster & Longer Transformer for Question Answering (2022-10-21)
Factorizing Content and Budget Decisions in Abstractive Summarization of Long Documents (2022-05-25)
ICDBigBird: A Contextual Embedding Model for ICD Code Classification (2022-04-21)
Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences (2022-01-27)
Hierarchical Neural Network Approaches for Long Document Classification (2022-01-18)
Dynamic Token Normalization Improves Vision Transformers (2021-12-05)
Hierarchical Transformer Networks for Long-sequence and Multiple Clinical Documents Classification (2021-10-16)
A Dataset for Answering Time-Sensitive Questions (2021-08-13)
Three-level Hierarchical Transformer Networks for Long-sequence and Multiple Clinical Documents Classification (2021-04-17)
Big Bird: Transformers for Longer Sequences (2020-07-28)