Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Spatial Transformer

Computer Vision | Introduced 2000 | 169 papers

Description

A Spatial Transformer is an image model block that explicitly allows the spatial manipulation of data within a convolutional neural network. It lets a CNN spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process. Unlike pooling layers, whose receptive fields are fixed and local, the spatial transformer module is a dynamic mechanism that produces an appropriate transformation for each input sample. The transformation is then applied to the entire feature map (non-locally) and can include scaling, cropping, and rotation, as well as non-rigid deformations.
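
As an illustration of the transformation types listed above, the affine case can be written as a 2×3 matrix acting on output coordinates. The notation here ($A_\theta$, $s$, $t_x$, $t_y$, $\alpha$, and the superscripts $s$ for source and $t$ for target) is chosen for this sketch, and coordinates are assumed to be normalised to $[-1, 1]$ in both the input and the output:

$$
\begin{pmatrix} x^{s} \\ y^{s} \end{pmatrix}
= A_\theta \begin{pmatrix} x^{t} \\ y^{t} \\ 1 \end{pmatrix},
\qquad
A_{\text{crop/scale}} = \begin{pmatrix} s & 0 & t_x \\ 0 & s & t_y \end{pmatrix},
\qquad
A_{\text{rot}} = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \end{pmatrix}
$$

Under these assumptions, $s < 1$ reads from a smaller window of the input (a zoomed-in crop centred at $(t_x, t_y)$), while $A_{\text{rot}}$ rotates the sampling pattern by the angle $\alpha$. Non-rigid deformations need a richer parameterisation than this single matrix.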

The architecture is shown in the figure to the right. The input feature map $U$ is passed to a localisation network, which regresses the transformation parameters $\theta$. The regular spatial grid $G$ over $V$ is transformed to the sampling grid $T_{\theta}(G)$, which is applied to $U$, producing the warped output feature map $V$. The combination of the localisation network and sampling mechanism defines a spatial transformer.
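
The grid-and-sample half of this pipeline can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the transformation is restricted to a 2×3 affine matrix, the feature map has a single channel, coordinates are normalised to [-1, 1], the sampler is bilinear with zero padding, and the localisation network is left out, so `theta` is supplied by hand. The function names `affine_grid`, `bilinear_sample`, and `spatial_transform` are hypothetical names chosen for this sketch.

```python
import math


def affine_grid(theta, height, width):
    """Build the sampling grid: one source (x, y) per output pixel.

    theta is a 2x3 affine matrix as nested lists (assumed form; in a full
    model it would be regressed by the localisation network).
    """
    grid = []
    for i in range(height):
        # Normalised target coordinate in [-1, 1].
        y_t = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            x_t = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


def bilinear_sample(u, grid):
    """Sample the input feature map u at every grid location."""
    h_in, w_in = len(u), len(u[0])

    def pixel(r, c):
        # Zero padding: locations outside the input contribute nothing.
        return u[r][c] if 0 <= r < h_in and 0 <= c < w_in else 0.0

    out = []
    for row in grid:
        out_row = []
        for x_s, y_s in row:
            # Convert normalised coordinates to input pixel coordinates.
            px = (x_s + 1.0) * (w_in - 1) / 2.0
            py = (y_s + 1.0) * (h_in - 1) / 2.0
            x0, y0 = math.floor(px), math.floor(py)
            wx, wy = px - x0, py - y0
            # Weighted average of the four neighbouring input pixels.
            value = ((1 - wx) * (1 - wy) * pixel(y0, x0)
                     + wx * (1 - wy) * pixel(y0, x0 + 1)
                     + (1 - wx) * wy * pixel(y0 + 1, x0)
                     + wx * wy * pixel(y0 + 1, x0 + 1))
            out_row.append(value)
        out.append(out_row)
    return out


def spatial_transform(u, theta, out_height, out_width):
    """Warp u into an output map of the requested size."""
    return bilinear_sample(u, affine_grid(theta, out_height, out_width))


# Example: a 5x5 input and a theta that zooms in by a factor of two.
u = [[5 * r + c for c in range(5)] for r in range(5)]
zoom = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
v = spatial_transform(u, zoom, 3, 3)
print(v)  # [[6.0, 7.0, 8.0], [11.0, 12.0, 13.0], [16.0, 17.0, 18.0]]
```

With this `zoom` matrix the output is the central 3×3 crop of the input, which matches the cropping behaviour mentioned in the description. In a trained network, the sampler's differentiability with respect to both `u` and `theta` is what lets gradients reach the localisation network; this sketch does not model that part.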

Papers Using This Method

FOAM: A General Frequency-Optimized Anti-Overlapping Framework for Overlapping Object Perception (2025-06-16)
GuidedMorph: Two-Stage Deformable Registration for Breast MRI (2025-05-19)
EmoNeXt: an Adapted ConvNeXt for Facial Emotion Recognition (2025-01-14)
Neural encoding with affine feature response transforms (2025-01-07)
A novel deep learning approach for facial emotion recognition: application to detecting emotional responses in elderly individuals with Alzheimer’s disease (2024-12-30)
Fixing the Perspective: A Critical Examination of Zero-1-to-3 (2024-11-24)
ESC-MISR: Enhancing Spatial Correlations for Multi-Image Super-Resolution in Remote Sensing (2024-11-07)
Spatial Transformers for Radio Map Estimation (2024-11-02)
Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction (2024-10-24)
Disambiguating Monocular Reconstruction of 3D Clothed Human with Spatial-Temporal Transformer (2024-10-21)
MSDNet: Multi-Scale Decoder for Few-Shot Semantic Segmentation via Transformer-Guided Prototyping (2024-09-17)
Automatic facial axes standardization of 3D fetal ultrasound images (2024-09-04)
Improved 3D Whole Heart Geometry from Sparse CMR Slices (2024-08-14)
Spatial Transformer Network YOLO Model for Agricultural Object Detection (2024-07-31)
Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning (2024-07-22)
X-Recon: Learning-based Patient-specific High-Resolution CT Reconstruction from Orthogonal X-Ray Images (2024-07-22)
Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns for Traffic Speed Prediction (2024-06-24)
Infinite 3D Landmarks: Improving Continuous 2D Facial Landmark Detection (2024-05-30)
Vision-Language Modeling with Regularized Spatial Transformer Networks for All Weather Crosswind Landing of Aircraft (2024-05-09)
Efficient and Scalable Chinese Vector Font Generation via Component Composition (2024-04-10)