
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Polynomial, trigonometric, and tropical activations

Ismail Khalfaoui-Hassani, Stefan Kesselheim

2025-02-03 | Text Generation, Image Classification, Language Modelling

Abstract

Which functions can be used as activations in deep neural networks? This article explores families of functions based on orthonormal bases, including the Hermite polynomial basis and the Fourier trigonometric basis, as well as a basis resulting from the tropicalization of a polynomial basis. Our study shows that, through simple variance-preserving initialization and without additional clamping mechanisms, these activations can successfully be used to train deep models, such as GPT-2 for next-token prediction on OpenWebText and ConvNeXt for image classification on ImageNet. Our work addresses the issue of exploding and vanishing activations and gradients, particularly prevalent with polynomial activations, and opens the door for improving the efficiency of large-scale learning tasks. Furthermore, our approach provides insight into the structure of neural networks, revealing that networks with polynomial activations can be interpreted as multivariate polynomial mappings. Finally, using Hermite interpolation, we show that our activations can closely approximate classical ones in pre-trained models by matching both the function and its derivative, making them especially useful for fine-tuning tasks. These activations are available in the torchortho library, which can be accessed via: https://github.com/K-H-Ismail/torchortho.
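The variance-preserving idea behind the Hermite basis can be illustrated with a minimal sketch. This is not the torchortho API and not necessarily the authors' initialization scheme; the function names `orthonormal_hermite`, `hermite_activation`, `unit_variance_coeffs`, and `empirical_moments` are hypothetical, and the equal-weight coefficient choice is an assumption made purely for illustration. It relies only on the standard fact that the probabilists' Hermite polynomials He_k, scaled by 1/sqrt(k!), are orthonormal under a standard Gaussian, so an activation with a zero constant coefficient and unit-norm coefficients maps N(0, 1) inputs to outputs with zero mean and unit variance.

```python
import math
import random


def orthonormal_hermite(x, degree):
    """Return [h_0(x), ..., h_degree(x)] with h_k = He_k / sqrt(k!).

    Uses the recurrence He_{n+1}(x) = x * He_n(x) - n * He_{n-1}(x).
    """
    he = [1.0, x]  # He_0 and He_1
    for n in range(1, degree):
        he.append(x * he[n] - n * he[n - 1])
    return [he[k] / math.sqrt(math.factorial(k)) for k in range(degree + 1)]


def hermite_activation(x, coeffs):
    """Evaluate f(x) = sum_k coeffs[k] * h_k(x) for a scalar x."""
    basis = orthonormal_hermite(x, len(coeffs) - 1)
    return sum(c * h for c, h in zip(coeffs, basis))


def unit_variance_coeffs(degree):
    """Hypothetical initialization (an assumption, not the paper's scheme):
    zero constant term and equal weights rescaled to unit squared norm."""
    raw = [0.0] + [1.0] * degree
    norm = math.sqrt(sum(c * c for c in raw))
    return [c / norm for c in raw]


def empirical_moments(coeffs, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the mean and variance of f(x) for x ~ N(0, 1)."""
    rng = random.Random(seed)
    ys = [hermite_activation(rng.gauss(0.0, 1.0), coeffs) for _ in range(n_samples)]
    mean = sum(ys) / n_samples
    var = sum((y - mean) ** 2 for y in ys) / n_samples
    return mean, var


if __name__ == "__main__":
    coeffs = unit_variance_coeffs(3)
    mean, var = empirical_moments(coeffs)
    print(f"mean ~ {mean:.3f}, variance ~ {var:.3f}")
```

Under these assumptions the estimated mean should be close to 0 and the variance close to 1, up to sampling error. With unnormalized monomials x^k, by contrast, the output variance grows quickly with the degree, which is one way to see why polynomial activations are prone to exploding activations.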

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Text Generation | OpenWebText | eval_loss | 2.91 | GPT2-Hermite |
| Language Modelling | OpenWebText | eval_loss | 2.91 | GPT2-Hermite |
| Language Modelling | OpenWebText | eval_perplexity | 18.39 | GPT2-Hermite |
| Language Modelling | OpenWebText | eval_loss | 2.92 | GPT2-Tropical |
| Language Modelling | OpenWebText | eval_perplexity | 18.64 | GPT2-Tropical |
| Language Modelling | OpenWebText | eval_loss | 2.93 | GPT2-Fourier |
| Language Modelling | OpenWebText | eval_perplexity | 18.72 | GPT2-Fourier |
| Language Modelling | OpenWebText | eval_loss | 2.95 | GPT2-GELU |
| Language Modelling | OpenWebText | eval_perplexity | 19.24 | GPT2-GELU |
| Image Classification | ImageNet | Top 1 Accuracy | 82.34 | ConvNeXt-T-Hermite |
| Image Classification | ImageNet | Top 5 Accuracy | 96.03 | ConvNeXt-T-Hermite |
