Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Gated Linear Unit

General · Introduced 2000 · 798 papers

Description

A Gated Linear Unit (GLU) computes:

$\mathrm{GLU}(a, b) = a \otimes \sigma(b)$

Here $\otimes$ denotes element-wise multiplication and $\sigma$ is the sigmoid function. The GLU is used in natural language processing architectures such as the Gated CNN, where $\sigma(b)$ acts as a gate that controls what information from $a$ is passed up to the following layer. Intuitively, for a language modeling task, the gating mechanism allows selection of the words or features that are important for predicting the next word. The GLU is non-linear, yet it retains a linear path for the gradient through $a$, which mitigates the vanishing gradient problem.
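The gating formula can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation from any particular paper or library: the function names `sigmoid`, `glu`, and `glu_split` are hypothetical, and the convention of splitting a single input in half to obtain the values and the gate inputs is an assumption (it is a common one, but the description above does not specify it).

```python
import numpy as np

def sigmoid(x):
    """Numerically stable logistic sigmoid, applied element-wise."""
    x = np.asarray(x, dtype=float)
    # exp is only ever evaluated on non-positive values, so it cannot overflow.
    e = np.exp(-np.abs(x))
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def glu(a, b):
    """GLU(a, b) = a * sigmoid(b), element-wise."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("a and b must have the same shape")
    return a * sigmoid(b)

def glu_split(x, axis=-1):
    """Assumed convention: the first half of x along `axis` is a, the second half is b."""
    a, b = np.split(np.asarray(x, dtype=float), 2, axis=axis)
    return glu(a, b)

# With b = 0 the gate is 0.5 everywhere, so each value of a is halved.
half_open = glu([1.0, 2.0], [0.0, 0.0])
# A strongly positive b opens the gate; a strongly negative b closes it.
open_and_closed = glu([3.0, 3.0], [1000.0, -1000.0])
```

When the gate saturates open, the output is approximately `a` itself, which is the linear path that the description credits with easing gradient flow.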

Papers Using This Method

LiLM-RDB-SFC: Lightweight Language Model with Relational Database-Guided DRL for Optimized SFC Provisioning (2025-07-15)
Chat-Ghosting: A Comparative Study of Methods for Auto-Completion in Dialog Systems (2025-07-08)
I Know Which LLM Wrote Your Code Last Summer: LLM generated Code Stylometry for Authorship Attribution (2025-06-18)
Fretting-Transformer: Encoder-Decoder Model for MIDI to Tablature Transcription (2025-06-17)
A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation (2025-06-09)
The Impact of Feature Scaling In Machine Learning: Effects on Regression and Classification Tasks (2025-06-09)
A Multi-Dataset Evaluation of Models for Automated Vulnerability Repair (2025-06-05)
DuAL-Net: A Hybrid Framework for Alzheimer's Disease Prediction from Whole-Genome Sequencing via Local SNP Windows and Global Annotations (2025-05-31)
Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking (2025-05-29)
ShIOEnv: A CLI Behavior-Capturing Environment Enabling Grammar-Guided Command Synthesis for Dataset Curation (2025-05-23)
Fusion of Foundation and Vision Transformer Model Features for Dermatoscopic Image Classification (2025-05-22)
LogiCase: Effective Test Case Generation from Logical Description in Competitive Programming (2025-05-21)
EEG-to-Text Translation: A Model for Deciphering Human Brain Activity (2025-05-20)
Masking in Multi-hop QA: An Analysis of How Language Models Perform with Context Permutation (2025-05-16)
Multilingual Machine Translation with Quantum Encoder Decoder Attention-based Convolutional Variational Circuits (2025-05-14)
Performance Evaluation of Large Language Models in Bangla Consumer Health Query Summarization (2025-05-08)
GASCADE: Grouped Summarization of Adverse Drug Event for Enhanced Cancer Pharmacovigilance (2025-05-07)
Benchmarking Traditional Machine Learning and Deep Learning Models for Fault Detection in Power Transformers (2025-05-07)
A review of DNA restriction-free overlapping sequence cloning techniques for synthetic biology (2025-05-06)
JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry (2025-04-29)