Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Non-Local Operation

Computer Vision · Introduced 2000 · 181 papers
Source Paper

Description

A Non-Local Operation is a component for capturing long-range dependencies with deep neural networks. It is a generalization of the classical non-local mean operation in computer vision. Intuitively, a non-local operation computes the response at a position as a weighted sum of the features at all positions in the input feature maps. The set of positions can be in space, time, or spacetime, implying that these operations are applicable to image, sequence, and video problems.

Following the non-local mean operation, a generic non-local operation for deep neural networks is defined as:

$$y_{i} = \frac{1}{\mathcal{C}\left(x\right)}\sum_{\forall{j}}f\left(x_{i}, x_{j}\right)g\left(x_{j}\right)$$

Here $i$ is the index of an output position (in space, time, or spacetime) whose response is to be computed, and $j$ is the index that enumerates all possible positions. $x$ is the input signal (image, sequence, video; often their features) and $y$ is the output signal of the same size as $x$. A pairwise function $f$ computes a scalar (representing a relationship such as affinity) between $i$ and all $j$. The unary function $g$ computes a representation of the input signal at position $j$. The response is normalized by a factor $\mathcal{C}(x)$.
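As a rough illustration (not from this page), the generic operation above can be sketched in NumPy. The function name `non_local`, the Gaussian affinity $f(x_i, x_j) = e^{x_i^\top x_j}$, the identity choice of $g$, and the normalization $\mathcal{C}(x) = \sum_j f(x_i, x_j)$ are all illustrative assumptions:

```python
import numpy as np

def non_local(x, f, g):
    """Generic non-local operation (hypothetical sketch):
    y_i = (1 / C(x)) * sum_j f(x_i, x_j) * g(x_j),
    with C(x) taken here as sum_j f(x_i, x_j)."""
    n = x.shape[0]                       # number of positions
    gx = g(x)                            # unary representation g(x_j) for all j
    y = np.zeros_like(gx)
    for i in range(n):
        # pairwise affinities between position i and all positions j
        weights = np.array([f(x[i], x[j]) for j in range(n)])
        C = weights.sum()                # normalization factor C(x)
        y[i] = (weights[:, None] * gx).sum(axis=0) / C
    return y

# Example: Gaussian affinity, identity embedding for g
x = np.random.randn(5, 8)                # 5 positions, 8-dim features
y = non_local(x, f=lambda a, b: np.exp(a @ b), g=lambda z: z)
print(y.shape)                           # (5, 8): same size as the input
```

Note that every output position attends to every input position, which is exactly what distinguishes this from the local neighborhood of a convolution.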

The non-local behavior is due to the fact that all positions ($\forall j$) are considered in the operation. As a comparison, a convolutional operation sums up the weighted input in a local neighborhood (e.g., $i-1 \leq j \leq i+1$ in a 1D case with kernel size 3), and a recurrent operation at time $i$ is often based only on the current and the latest time steps (e.g., $j = i$ or $j = i-1$).

The non-local operation also differs from a fully-connected (fc) layer. The equation above computes responses based on relationships between different locations, whereas fc uses learned weights. In other words, the relationship between $x_j$ and $x_i$ is not a function of the input data in fc, unlike in non-local layers. Furthermore, the formulation above supports inputs of variable size and maintains the corresponding size in the output. In contrast, an fc layer requires a fixed-size input/output and loses positional correspondence (e.g., that from $x_i$ to $y_i$ at position $i$).

A non-local operation is a flexible building block and can easily be used together with convolutional/recurrent layers. It can be added into the earlier part of deep neural networks, unlike fc layers, which are often used at the end. This allows us to build a richer hierarchy that combines both non-local and local information.

In terms of parameterisation, $g$ is usually parameterised as a linear embedding of the form $g(x_j) = W_g x_j$, where $W_g$ is a weight matrix to be learned. This is implemented as, e.g., a 1×1 convolution in space or a 1×1×1 convolution in spacetime. For $f$, an affinity function is used; the source paper considers several forms, including Gaussian, embedded Gaussian, dot product, and concatenation.
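To make the parameterisation concrete, here is a hedged NumPy sketch of a non-local block with the embedded-Gaussian affinity, where softmax over $j$ plays the role of $f/\mathcal{C}(x)$. The matrices `W_theta`, `W_phi`, `W_g`, and `W_out` stand in for the 1×1 convolutions (per-position linear embeddings); the names, dimensions, and weight scaling are illustrative assumptions, and the residual connection follows the non-local block of the source paper:

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, W_theta, W_phi, W_g, W_out):
    """Embedded-Gaussian non-local block on a flattened feature map.
    x: (N, C) features at N positions; each W_* acts like a 1x1 conv,
    i.e. the same linear map applied independently at every position."""
    theta, phi, g = x @ W_theta, x @ W_phi, x @ W_g    # linear embeddings
    attn = softmax(theta @ phi.T, axis=-1)             # f normalized by C(x): softmax over j
    y = attn @ g                                       # weighted sum of g(x_j) over all j
    return x + y @ W_out                               # residual connection around the block

# Example: 16 positions, 32 channels, 16-dim embeddings (illustrative sizes)
N, C, Ce = 16, 32, 16
rng = np.random.default_rng(0)
x = rng.standard_normal((N, C))
W_theta, W_phi, W_g = (rng.standard_normal((C, Ce)) * 0.1 for _ in range(3))
W_out = rng.standard_normal((Ce, C)) * 0.1
z = non_local_block(x, W_theta, W_phi, W_g, W_out)
print(z.shape)  # (16, 32): output matches the input size
```

Because the block preserves the input shape, it can be dropped between existing convolutional or recurrent layers without changing the rest of the architecture.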

Papers Using This Method

- Robust Lane Detection with Wavelet-Enhanced Context Modeling and Adaptive Sampling (2025-03-24)
- ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks (2024-11-06)
- Unsupervised Panoptic Interpretation of Latent Spaces in GANs Using Space-Filling Vector Quantization (2024-10-27)
- Enhancing Tree Type Detection in Forest Fire Risk Assessment: Multi-Stage Approach and Color Encoding with Forest Fire Risk Evaluation Framework for UAV Imagery (2024-07-27)
- A Scalable Quantum Non-local Neural Network for Image Classification (2024-07-26)
- Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models (2024-05-24)
- Deep Learning-Based CSI Feedback for XL-MIMO Systems in the Near-Field Domain (2024-05-15)
- Vision-based Food Nutrition Estimation via RGB-D Fusion Network (2023-10-25)
- Accurate and lightweight dehazing via multi-receptive-field non-local network and novel contrastive regularization (2023-09-28)
- On quantifying and improving realism of images generated with diffusion (2023-09-26)
- Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows (2023-09-21)
- A Strategic Framework for Optimal Decisions in Football 1-vs-1 Shot-Taking Situations: An Integrated Approach of Machine Learning, Theory-Based Modeling, and Game Theory (2023-07-27)
- Pyrus Base: An Open Source Python Framework for the RoboCup 2D Soccer Simulation (2023-07-22)
- Diffusion Models Beat GANs on Image Classification (2023-07-17)
- Diversity is Strength: Mastering Football Full Game with Interactive Reinforcement Learning of Multiple AIs (2023-06-28)
- Rosetta Neurons: Mining the Common Units in a Model Zoo (2023-06-15)
- Toward more accurate and generalizable brain deformation estimators for traumatic brain injury detection with unsupervised domain adaptation (2023-06-08)
- FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator (2023-06-07)
- Action valuation of on- and off-ball soccer players based on multi-agent deep reinforcement learning (2023-05-29)
- Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL? (2023-05-27)