Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Spatial Transformer

Computer Vision | Introduced 2015 | 169 papers

Description

A Spatial Transformer is an image model block that explicitly allows spatial manipulation of data within a convolutional neural network (CNN). It lets a CNN spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process. Unlike pooling layers, whose receptive fields are fixed and local, the spatial transformer module is a dynamic mechanism: it produces an appropriate transformation for each input sample and applies it to an image or feature map. The transformation is performed on the entire feature map (non-locally) and can include scaling, cropping, and rotation, as well as non-rigid deformations.

The architecture is shown in the accompanying figure. The input feature map $U$ is passed to a localisation network, which regresses the transformation parameters $\theta$. The regular spatial grid $G$ over $V$ is transformed to the sampling grid $T_{\theta}(G)$, which is applied to $U$, producing the warped output feature map $V$. The combination of the localisation network and sampling mechanism defines a spatial transformer.
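The grid and sampling steps can be illustrated with a minimal sketch in plain Python. It rests on several assumptions that the description above does not fix: the transformation is restricted to a 2x3 affine matrix, coordinates are normalised to [-1, 1] with the corner pixels at the extremes, sampling is bilinear, and locations outside $U$ read as zero. The function names (`affine_grid`, `bilinear_sample`, `spatial_transformer`) are hypothetical, and `theta` is supplied by hand here, whereas in a real module the localisation network would regress it from $U$.

```python
import math


def affine_grid(theta, out_h, out_w):
    """Grid generator: map each output location of V, in normalised
    [-1, 1] coordinates, through the 2x3 affine matrix theta to a
    source location (x_s, y_s) in U. This plays the role of T_theta(G)."""
    grid = []
    for i in range(out_h):
        y_t = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x_t = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


def bilinear_sample(U, grid):
    """Sampler: read U at every grid location with bilinear interpolation.
    Locations outside U contribute zero (an assumed padding choice)."""
    in_h, in_w = len(U), len(U[0])

    def pixel(r, c):
        return U[r][c] if 0 <= r < in_h and 0 <= c < in_w else 0.0

    V = []
    for row in grid:
        out_row = []
        for x_s, y_s in row:
            # Convert normalised coordinates to pixel coordinates in U.
            x = (x_s + 1.0) * (in_w - 1) / 2.0
            y = (y_s + 1.0) * (in_h - 1) / 2.0
            x0, y0 = math.floor(x), math.floor(y)
            dx, dy = x - x0, y - y0
            # Weighted average of the four neighbouring pixels.
            value = (pixel(y0, x0) * (1 - dx) * (1 - dy)
                     + pixel(y0, x0 + 1) * dx * (1 - dy)
                     + pixel(y0 + 1, x0) * (1 - dx) * dy
                     + pixel(y0 + 1, x0 + 1) * dx * dy)
            out_row.append(value)
        V.append(out_row)
    return V


def spatial_transformer(U, theta, out_h, out_w):
    """Warp U into an out_h x out_w map V using the affine parameters theta."""
    return bilinear_sample(U, affine_grid(theta, out_h, out_w))


U = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0]]
# Scale by 0.5 and shift by 0.5: the grid covers only the bottom-right of U.
crop = [[0.5, 0.0, 0.5],
        [0.0, 0.5, 0.5]]

V_same = spatial_transformer(U, identity, 3, 3)
V_crop = spatial_transformer(U, crop, 2, 2)
```

With the identity parameters the output reproduces $U$, while the second matrix crops the bottom-right 2x2 region. Because each output value is a weighted sum of input pixels with weights that depend smoothly on `theta`, the same computation written in an autodiff framework lets gradients reach the localisation network.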

Papers Using This Method

FOAM: A General Frequency-Optimized Anti-Overlapping Framework for Overlapping Object Perception (2025-06-16)

GuidedMorph: Two-Stage Deformable Registration for Breast MRI (2025-05-19)

EmoNeXt: an Adapted ConvNeXt for Facial Emotion Recognition (2025-01-14)

Neural encoding with affine feature response transforms (2025-01-07)

A novel deep learning approach for facial emotion recognition: application to detecting emotional responses in elderly individuals with Alzheimer’s disease (2024-12-30)

Fixing the Perspective: A Critical Examination of Zero-1-to-3 (2024-11-24)

ESC-MISR: Enhancing Spatial Correlations for Multi-Image Super-Resolution in Remote Sensing (2024-11-07)

Spatial Transformers for Radio Map Estimation (2024-11-02)

Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction (2024-10-24)

Disambiguating Monocular Reconstruction of 3D Clothed Human with Spatial-Temporal Transformer (2024-10-21)

MSDNet: Multi-Scale Decoder for Few-Shot Semantic Segmentation via Transformer-Guided Prototyping (2024-09-17)

Automatic facial axes standardization of 3D fetal ultrasound images (2024-09-04)

Improved 3D Whole Heart Geometry from Sparse CMR Slices (2024-08-14)

Spatial Transformer Network YOLO Model for Agricultural Object Detection (2024-07-31)

Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning (2024-07-22)

X-Recon: Learning-based Patient-specific High-Resolution CT Reconstruction from Orthogonal X-Ray Images (2024-07-22)

Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns for Traffic Speed Prediction (2024-06-24)

Infinite 3D Landmarks: Improving Continuous 2D Facial Landmark Detection (2024-05-30)

Vision-Language Modeling with Regularized Spatial Transformer Networks for All Weather Crosswind Landing of Aircraft (2024-05-09)

Efficient and Scalable Chinese Vector Font Generation via Component Composition (2024-04-10)