Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Non-Local Operation

Computer Vision · Introduced 2000 · 181 papers
Source Paper

Description

A Non-Local Operation is a component for capturing long-range dependencies with deep neural networks. It is a generalization of the classical non-local mean operation in computer vision. Intuitively, a non-local operation computes the response at a position as a weighted sum of the features at all positions in the input feature maps. The set of positions can be in space, time, or spacetime, implying that these operations are applicable to image, sequence, and video problems.

Following the non-local mean operation, a generic non-local operation for deep neural networks is defined as:

$$y_{i} = \frac{1}{\mathcal{C}\left(x\right)}\sum_{\forall{j}}f\left(x_{i}, x_{j}\right)g\left(x_{j}\right)$$

Here $i$ is the index of an output position (in space, time, or spacetime) whose response is to be computed, and $j$ is the index that enumerates all possible positions. $x$ is the input signal (image, sequence, video; often their features) and $y$ is the output signal of the same size as $x$. A pairwise function $f$ computes a scalar (representing a relationship such as affinity) between $i$ and all $j$. The unary function $g$ computes a representation of the input signal at position $j$. The response is normalized by a factor $\mathcal{C}(x)$.
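As a rough illustration (not from this page), the generic operation above can be sketched in NumPy. The function name `non_local`, the Gaussian affinity $f(x_i, x_j) = e^{x_i^\top x_j}$, the identity choice of $g$, and the normalization $\mathcal{C}(x) = \sum_j f(x_i, x_j)$ are all illustrative assumptions:

```python
import numpy as np

def non_local(x, f, g):
    """Generic non-local operation (hypothetical sketch):
    y_i = (1 / C(x)) * sum_j f(x_i, x_j) * g(x_j),
    with C(x) taken here as sum_j f(x_i, x_j)."""
    n = x.shape[0]                       # number of positions
    gx = g(x)                            # unary representation g(x_j) for all j
    y = np.zeros_like(gx)
    for i in range(n):
        # pairwise affinities between position i and all positions j
        weights = np.array([f(x[i], x[j]) for j in range(n)])
        C = weights.sum()                # normalization factor C(x)
        y[i] = (weights[:, None] * gx).sum(axis=0) / C
    return y

# Example: Gaussian affinity, identity embedding for g
x = np.random.randn(5, 8)                # 5 positions, 8-dim features
y = non_local(x, f=lambda a, b: np.exp(a @ b), g=lambda z: z)
print(y.shape)                           # (5, 8): same size as the input
```

Note that every output position attends to every input position, which is exactly what distinguishes this from the local neighborhood of a convolution.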

The non-local behavior is due to the fact that all positions ($\forall j$) are considered in the operation. As a comparison, a convolutional operation sums up the weighted input in a local neighborhood (e.g., $i-1 \leq j \leq i+1$ in a 1D case with kernel size 3), and a recurrent operation at time $i$ is often based only on the current and the latest time steps (e.g., $j = i$ or $j = i-1$).

The non-local operation also differs from a fully-connected (fc) layer. The equation above computes responses based on relationships between different locations, whereas fc uses learned weights. In other words, the relationship between $x_j$ and $x_i$ is not a function of the input data in fc, unlike in non-local layers. Furthermore, the formulation above supports inputs of variable size and maintains the corresponding size in the output. In contrast, an fc layer requires a fixed-size input/output and loses positional correspondence (e.g., that from $x_i$ to $y_i$ at position $i$).

A non-local operation is a flexible building block and can easily be used together with convolutional/recurrent layers. It can be added into the earlier part of deep neural networks, unlike fc layers, which are often used at the end. This allows us to build a richer hierarchy that combines both non-local and local information.

In terms of parameterisation, $g$ is usually parameterised as a linear embedding of the form $g(x_j) = W_g x_j$, where $W_g$ is a weight matrix to be learned. This is implemented as, e.g., a 1×1 convolution in space or a 1×1×1 convolution in spacetime. For $f$, an affinity function is used; the source paper considers several forms, including Gaussian, embedded Gaussian, dot product, and concatenation.
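To make the parameterisation concrete, here is a hedged NumPy sketch of a non-local block with the embedded-Gaussian affinity, where softmax over $j$ plays the role of $f/\mathcal{C}(x)$. The matrices `W_theta`, `W_phi`, `W_g`, and `W_out` stand in for the 1×1 convolutions (per-position linear embeddings); the names, dimensions, and weight scaling are illustrative assumptions, and the residual connection follows the non-local block of the source paper:

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, W_theta, W_phi, W_g, W_out):
    """Embedded-Gaussian non-local block on a flattened feature map.
    x: (N, C) features at N positions; each W_* acts like a 1x1 conv,
    i.e. the same linear map applied independently at every position."""
    theta, phi, g = x @ W_theta, x @ W_phi, x @ W_g    # linear embeddings
    attn = softmax(theta @ phi.T, axis=-1)             # f normalized by C(x): softmax over j
    y = attn @ g                                       # weighted sum of g(x_j) over all j
    return x + y @ W_out                               # residual connection around the block

# Example: 16 positions, 32 channels, 16-dim embeddings (illustrative sizes)
N, C, Ce = 16, 32, 16
rng = np.random.default_rng(0)
x = rng.standard_normal((N, C))
W_theta, W_phi, W_g = (rng.standard_normal((C, Ce)) * 0.1 for _ in range(3))
W_out = rng.standard_normal((Ce, C)) * 0.1
z = non_local_block(x, W_theta, W_phi, W_g, W_out)
print(z.shape)  # (16, 32): output matches the input size
```

Because the block preserves the input shape, it can be dropped between existing convolutional or recurrent layers without changing the rest of the architecture.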

Papers Using This Method

- Robust Lane Detection with Wavelet-Enhanced Context Modeling and Adaptive Sampling (2025-03-24)
- ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks (2024-11-06)
- Unsupervised Panoptic Interpretation of Latent Spaces in GANs Using Space-Filling Vector Quantization (2024-10-27)
- Enhancing Tree Type Detection in Forest Fire Risk Assessment: Multi-Stage Approach and Color Encoding with Forest Fire Risk Evaluation Framework for UAV Imagery (2024-07-27)
- A Scalable Quantum Non-local Neural Network for Image Classification (2024-07-26)
- Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models (2024-05-24)
- Deep Learning-Based CSI Feedback for XL-MIMO Systems in the Near-Field Domain (2024-05-15)
- Vision-based Food Nutrition Estimation via RGB-D Fusion Network (2023-10-25)
- Accurate and lightweight dehazing via multi-receptive-field non-local network and novel contrastive regularization (2023-09-28)
- On quantifying and improving realism of images generated with diffusion (2023-09-26)
- Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows (2023-09-21)
- A Strategic Framework for Optimal Decisions in Football 1-vs-1 Shot-Taking Situations: An Integrated Approach of Machine Learning, Theory-Based Modeling, and Game Theory (2023-07-27)
- Pyrus Base: An Open Source Python Framework for the RoboCup 2D Soccer Simulation (2023-07-22)
- Diffusion Models Beat GANs on Image Classification (2023-07-17)
- Diversity is Strength: Mastering Football Full Game with Interactive Reinforcement Learning of Multiple AIs (2023-06-28)
- Rosetta Neurons: Mining the Common Units in a Model Zoo (2023-06-15)
- Toward more accurate and generalizable brain deformation estimators for traumatic brain injury detection with unsupervised domain adaptation (2023-06-08)
- FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator (2023-06-07)
- Action valuation of on- and off-ball soccer players based on multi-agent deep reinforcement learning (2023-05-29)
- Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL? (2023-05-27)