Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


SwiGLU

General · Introduced 2020 · 13 papers
Source Paper: GLU Variants Improve Transformer (2020)

Description

SwiGLU is an activation function, a variant of the Gated Linear Unit (GLU) that replaces the sigmoid gate with the Swish function, Swish_β(x) = x · σ(βx). It is defined as:

SwiGLU(x, W, V, b, c, β) = Swish_β(xW + b) ⊗ (xV + c)

where ⊗ denotes elementwise multiplication, W and V are weight matrices, and b and c are bias vectors.
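The definition above can be sketched directly in NumPy. This is a minimal illustration of the formula, not the implementation from the source paper; the dimensions and initializations are arbitrary assumptions for the example.

```python
import numpy as np

def swish(x, beta=1.0):
    # Swish_beta(x) = x * sigmoid(beta * x)
    return x * (1.0 / (1.0 + np.exp(-beta * x)))

def swiglu(x, W, V, b, c, beta=1.0):
    # SwiGLU(x, W, V, b, c, beta) = Swish_beta(xW + b) ⊗ (xV + c),
    # where ⊗ is the elementwise (Hadamard) product.
    return swish(x @ W + b, beta) * (x @ V + c)

# Toy example (shapes are illustrative, not from the paper):
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))               # batch of 2, input dim 4
W = rng.standard_normal((4, 8)); b = np.zeros(8)
V = rng.standard_normal((4, 8)); c = np.zeros(8)
out = swiglu(x, W, V, b, c)
print(out.shape)  # (2, 8)
```

In Transformer feed-forward layers, the SwiGLU output is typically followed by a third projection back to the model dimension; with β = 1, Swish reduces to the SiLU activation.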

Papers Using This Method

TokenFLEX: Unified VLM Training for Flexible Visual Tokens Inference (2025-04-04)
DiffFormer: a Differential Spatial-Spectral Transformer for Hyperspectral Image Classification (2024-12-23)
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking (2024-12-02)
Deriving Activation Functions Using Integration (2024-11-20)
Scaling FP8 training to trillion-token LLMs (2024-09-19)
How Lightweight Can A Vision Transformer Be (2024-07-25)
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters (2024-06-10)
ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs (2024-02-06)
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model (2023-09-20)
SlimPajama-DC: Understanding Data Combinations for LLM Training (2023-09-19)
Llama 2: Open Foundation and Fine-Tuned Chat Models (2023-07-18)
PaLM: Scaling Language Modeling with Pathways (2022-04-05)
GLU Variants Improve Transformer (2020-02-12)