Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Transformer

Natural Language Processing · Introduced 2017 · 14004 papers
Source Paper: Attention Is All You Need (Vaswani et al., 2017)

Description

A Transformer is a model architecture that eschews recurrence and instead relies entirely on an attention mechanism to draw global dependencies between input and output. Before Transformers, the dominant sequence transduction models were based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The Transformer also employs an encoder and decoder, but removing recurrence in favor of attention mechanisms allows for significantly more parallelization than methods like RNNs and CNNs.
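The attention mechanism at the heart of the architecture is scaled dot-product attention: every position attends to every other position in a single matrix multiply, which is what makes the model parallelizable where an RNN is sequential. Below is a minimal NumPy sketch of that computation (an illustration, not code from this page); the function and variable names are ours.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # pairwise query-key similarities
    weights = softmax(scores, axis=-1)  # each row: a distribution over positions
    return weights @ V                  # weighted sum of value vectors

# Toy self-attention: 3 positions, model dimension 4, Q = K = V = x.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4): one output vector per input position
```

Note that every output row depends on all input rows at once, with no sequential loop over positions; stacking several such heads and mixing them with feed-forward layers gives the encoder and decoder blocks described above.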

Papers Using This Method

DASViT: Differentiable Architecture Search for Vision Transformer (2025-07-17)
Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows (2025-07-16)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Langevin Flows for Modeling Neural Latent Dynamics (2025-07-15)
Biological Processing Units: Leveraging an Insect Connectome to Pioneer Biofidelic Neural Architectures (2025-07-15)
KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding (2025-07-15)
Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking (2025-07-15)
Token Compression Meets Compact Vision Transformers: A Survey and Comparative Evaluation for Edge AI (2025-07-13)
Learning from Synthetic Labs: Language Models as Auction Participants (2025-07-12)
Comparative Analysis of Vision Transformers and Traditional Deep Learning Approaches for Automated Pneumonia Detection in Chest X-Rays (2025-07-11)
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving (2025-07-08)
Geo-Registration of Terrestrial LiDAR Point Clouds with Satellite Images without GNSS (2025-07-08)
Tile-Based ViT Inference with Visual-Cluster Priors for Zero-Shot Multi-Species Plant Identification (2025-07-08)
A Wireless Foundation Model for Multi-Task Prediction (2025-07-08)
Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate (2025-07-08)
SV-DRR: High-Fidelity Novel View X-Ray Synthesis Using Diffusion Model (2025-07-07)
Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations (2025-07-07)
Estimating Interventional Distributions with Uncertain Causal Graphs through Meta-Learning (2025-07-07)
AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models (2025-07-07)
Fast and Simplex: 2-Simplicial Attention in Triton (2025-07-03)