Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Non-local Neural Networks

Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He

2017-11-21 · CVPR 2018

Tasks: Text-To-SQL, Action Classification, Pose Estimation, Keypoint Detection, Instance Segmentation, Video Classification, Action Recognition, Object Detection

Abstract

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our non-local models can compete with or outperform current competition winners on both Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code is available at https://github.com/facebookresearch/video-nonlocal-net .
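To make "a weighted sum of the features at all positions" concrete, here is a minimal NumPy sketch of one non-local block, assuming the embedded-Gaussian form of the pairwise function. The generic operation is y_i = (1/C(x)) Σ_j f(x_i, x_j) g(x_j), and the block adds a residual, z_i = W_z y_i + x_i. With f(x_i, x_j) = exp(θ(x_i)·φ(x_j)) and C(x) = Σ_j f(x_i, x_j), the normalisation becomes a softmax over j. The feature map is assumed to be already flattened to shape (N, C), one row per position (N = T·H·W for video). The function name `non_local_block`, the weight names `w_theta`, `w_phi`, `w_g`, `w_z`, the toy sizes, and the random weights are illustrative assumptions, not the code in the linked repository.

```python
import numpy as np

def softmax(a, axis=-1):
    # Subtract the max along `axis` for numerical stability before exponentiating.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_theta, w_phi, w_g, w_z):
    """Embedded-Gaussian non-local block on a flattened feature map (hypothetical helper).

    x: (N, C) features, one row per spatio-temporal position.
    w_theta, w_phi, w_g: (C, C_inner) linear embeddings (1x1x1 convolutions in a CNN).
    w_z: (C_inner, C) output projection so the residual sum has matching shape.
    """
    theta = x @ w_theta                      # (N, C_inner)
    phi = x @ w_phi                          # (N, C_inner)
    g = x @ w_g                              # (N, C_inner)
    # f(x_i, x_j) = exp(theta_i . phi_j); dividing by C(x) = sum_j f is a row-wise softmax.
    attn = softmax(theta @ phi.T, axis=1)    # (N, N), each row sums to 1
    y = attn @ g                             # y_i: weighted sum of g(x_j) over ALL positions j
    return y @ w_z + x                       # residual connection: z_i = W_z y_i + x_i

rng = np.random.default_rng(0)
T, H, W, C, C_inner = 2, 4, 4, 8, 4          # toy sizes, chosen arbitrarily
x = rng.standard_normal((T * H * W, C))
# Random weights for w_theta, w_phi, w_g (C x C_inner) and w_z (C_inner x C).
params = [rng.standard_normal(s) * 0.1 for s in [(C, C_inner)] * 3 + [(C_inner, C)]]
z = non_local_block(x, *params)
print(z.shape)  # (32, 8)
```

Two properties are worth noting. First, the (N, N) matrix `attn` is what makes the block non-local: the output at position 0 depends on every other position, but memory and compute grow quadratically with N. Second, setting `w_z` to zeros makes the block an identity mapping (z == x). The paper achieves a similar effect by zero-initialising the scale of a BatchNorm after W_z, so the block can be inserted into a pre-trained network without changing its initial behaviour.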

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | Toyota Smarthome | CS | 53.6 | I3D + Non Local |
| Video | Toyota Smarthome | CV1 | 34.3 | I3D + Non Local |
| Video | Toyota Smarthome | CV2 | 43.9 | I3D + Non Local |
| Video | Kinetics-400 | Acc@1 | 77.7 | I3D + NL |
| Video | Kinetics-400 | Acc@5 | 93.3 | I3D + NL |
| Action Recognition / Activity Recognition | Something-Something V1 | Top-1 Accuracy | 44.4 | NL I3D |
| Pose Estimation | COCO (Common Objects in Context) | Validation AP | 66.5 | Mask R-CNN + NL blocks (4 in head, 1 in backbone) |
| Object Detection / 2D Object Detection | COCO minival | AP50 | 67.8 | Mask R-CNN (ResNeXt-152 + 1 NL) |
| Object Detection / 2D Object Detection | COCO minival | AP75 | 48.9 | Mask R-CNN (ResNeXt-152 + 1 NL) |
| Object Detection / 2D Object Detection | COCO minival | box AP | 45 | Mask R-CNN (ResNeXt-152 + 1 NL) |
| Object Detection / 2D Object Detection | COCO minival | AP50 | 63.1 | Mask R-CNN (ResNet-101 + 1 NL) |
| Object Detection / 2D Object Detection | COCO minival | AP75 | 44.5 | Mask R-CNN (ResNet-101 + 1 NL) |
| Object Detection / 2D Object Detection | COCO minival | box AP | 40.8 | Mask R-CNN (ResNet-101 + 1 NL) |
| Object Detection / 2D Object Detection | COCO minival | AP50 | 61.1 | Mask R-CNN (ResNet-50 + 1 NL) |
| Object Detection / 2D Object Detection | COCO minival | AP75 | 41.9 | Mask R-CNN (ResNet-50 + 1 NL) |
| Object Detection / 2D Object Detection | COCO minival | box AP | 39 | Mask R-CNN (ResNet-50 + 1 NL) |
| Instance Segmentation | COCO minival | mask AP | 40.3 | Mask R-CNN (ResNeXt-152 + 1 NL) |
| Instance Segmentation | COCO minival | mask AP | 37.1 | Mask R-CNN (ResNet-101 + 1 NL) |
| Instance Segmentation | COCO minival | mask AP | 35.5 | Mask R-CNN (ResNet-50 + 1 NL) |

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)