Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Squared ReLU

General · Introduced 2021 · 15 papers

Source Paper: Primer: Searching for Efficient Transformers for Language Modeling

Description

Squared ReLU is an activation function introduced in the Primer architecture, where it is used in the feedforward block of the Transformer layer. It is simply the ReLU activation squared: $y = \text{ReLU}(x)^2 = \max(0, x)^2$.
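A minimal element-wise sketch of the activation in NumPy (the function name `squared_relu` is ours, not from the paper):

```python
import numpy as np

def squared_relu(x):
    """Squared ReLU: max(0, x) ** 2, applied element-wise."""
    return np.maximum(0.0, x) ** 2

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(squared_relu(x))  # negatives map to 0; positives are squared
```

Negative inputs are zeroed exactly as in plain ReLU; positive inputs grow quadratically rather than linearly.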

The effectiveness of higher-order polynomials can also be observed in other effective Transformer nonlinearities, such as GLU variants like ReGLU and point-wise activations like approximate GELU. However, squared ReLU has drastically different asymptotics as $x \rightarrow \infty$ compared to the most commonly used activation functions: ReLU, GELU, and Swish. Squared ReLU does have significant overlap with ReGLU, and in fact is equivalent when ReGLU's $U$ and $V$ weight matrices are the same and squared ReLU is immediately preceded by a linear transformation with weight matrix $U$. This leads the authors to believe that squared ReLUs capture the benefits of these GLU variants while being simpler, without additional parameters, and delivering better quality.
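The ReGLU equivalence can be checked numerically: with tied weight matrices ($U = V$), ReGLU's gate $\text{ReLU}(xU) \odot (xU)$ coincides with squaring the ReLU of the same linear projection. A small sketch (random shapes and names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))    # a batch of 4 inputs of width 8
U = rng.normal(size=(8, 16))   # shared projection matrix (plays both U and V)

def relu(z):
    return np.maximum(0.0, z)

# ReGLU with V tied to U: ReLU(xU) * (xV) with V = U
reglu_tied = relu(x @ U) * (x @ U)

# Squared ReLU immediately after the linear map U
sq_relu = relu(x @ U) ** 2

print(np.allclose(reglu_tied, sq_relu))  # True
```

The two agree element-wise because wherever $xU < 0$ both expressions are zero, and wherever $xU \geq 0$ both reduce to $(xU)^2$.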

Papers Using This Method

- SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment (2025-05-20)
- A review of DNA restriction-free overlapping sequence cloning techniques for synthetic biology (2025-05-06)
- Primer C-VAE: An interpretable deep learning primer design method to detect emerging virus variants (2025-03-03)
- Deriving Activation Functions Using Integration (2024-11-20)
- Characteristic Performance Study on Solving Oscillator ODEs via Soft-constrained Physics-informed Neural Network with Small Data (2024-08-19)
- The curious case of A31P, a topology-switching mutant of the Repressor of Primer protein: A molecular dynamics study of its folding and misfolding (2024-04-01)
- The Effects of Political Martyrdom on Election Results: The Assassination of Abe (2023-05-29)
- Brainformers: Trading Simplicity for Efficiency (2023-05-29)
- Towards NeuroAI: Introducing Neuronal Diversity into Artificial Neural Networks (2023-01-23)
- N-Grammer: Augmenting Transformers with latent n-grams (2022-07-13)
- Piecewise Linear Neural Networks and Deep Learning (2022-06-18)
- Enriching and Characterizing T-Cell Repertoires from 3' Barcoded Single-Cell Whole Transcriptome Amplification Products (2022-03-21)
- Searching for Efficient Transformers for Language Modeling (2021-12-01)
- N-grammer: Augmenting Transformers with latent n-grams (2021-11-16)
- Primer: Searching for Efficient Transformers for Language Modeling (2021-09-17)