Description
Routed Attention is an attention pattern proposed as part of the Routing Transformer architecture. Each attention module considers a clustering of the space: the current timestep attends only to context belonging to the same cluster. In other words, the query at the current timestep is routed to a limited number of context elements through its cluster assignment. This can be contrasted with strided attention patterns and those proposed with the Sparse Transformer.
In the image to the right, the rows represent the outputs while the columns represent the inputs. The different colors represent the cluster memberships of the output tokens.
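The clustering-based routing described above can be illustrated with a minimal NumPy sketch. This is a simplified, hypothetical rendition, not the Routing Transformer implementation: it uses fixed random projections as stand-ins for the learned online k-means centroids, omits causal masking, and masks the full attention matrix instead of gathering per-cluster blocks for efficiency.

```python
import numpy as np

def routed_attention(X, n_clusters=2, seed=0):
    """Sketch of content-based routing attention: each timestep attends
    only to timesteps assigned to the same cluster.

    X: (n, d) array of token representations (queries == keys == values
    here for simplicity). Returns the attended outputs and the cluster
    assignment of each timestep.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Random centroids stand in for the learned/online k-means centroids
    # used by the actual Routing Transformer (assumption for this sketch).
    centroids = rng.standard_normal((n_clusters, d))
    # Route each timestep to its nearest centroid (by dot product).
    assign = np.argmax(X @ centroids.T, axis=-1)
    # Dense attention scores, then mask out cross-cluster pairs so a
    # query only sees context in its own cluster.
    scores = (X @ X.T) / np.sqrt(d)
    same_cluster = assign[:, None] == assign[None, :]
    scores = np.where(same_cluster, scores, -np.inf)
    # Softmax over the surviving (same-cluster) positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X, assign
```

Note that every query always shares a cluster with itself, so each softmax row has at least one finite score; a real implementation would gather tokens per cluster rather than masking a dense matrix, which is what yields the sub-quadratic cost.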
Papers Using This Method
- Lightweight Relational Embedding in Task-Interpolated Few-Shot Networks for Enhanced Gastrointestinal Disease Classification (2025-05-30)
- CL-MFAP: A Contrastive Learning-Based Multimodal Foundation Model for Molecular Property Prediction and Antibiotic Screening (2025-02-16)
- HYATT-Net is Grand: A Hybrid Attention Network for Performant Anatomical Landmark Detection (2024-12-09)
- Pubic Symphysis-Fetal Head Segmentation Network Using BiFormer Attention Mechanism and Multipath Dilated Convolution (2024-10-14)
- DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention (2024-10-11)
- Vision Transformer with Key-select Routing Attention for Single Image Dehazing (2024-06-28)
- BRAU-Net++: U-Shaped Hybrid CNN-Transformer Network for Medical Image Segmentation (2024-01-01)
- Pubic Symphysis-Fetal Head Segmentation Using Pure Transformer with Bi-level Routing Attention (2023-09-30)
- BGF-YOLO: Enhanced YOLOv8 with Multiscale Attentional Feature Fusion for Brain Tumor Detection (2023-09-22)
- Student Classroom Behavior Detection based on YOLOv7-BRA and Multi-Model Fusion (2023-05-13)
- Hybrid Routing Transformer for Zero-Shot Learning (2022-03-29)
- Hurdles to Progress in Long-form Question Answering (2021-03-10)
- Efficient Content-Based Sparse Attention with Routing Transformers (2020-03-12)