Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Region-based Non-local Operation for Video Classification

Guoxi Huang, Adrian G. Bors

2020-07-17
Action Classification, Video Classification, General Classification, Action Recognition, Classification, Action Recognition In Videos

Abstract

Convolutional Neural Networks (CNNs) model long-range dependencies by deeply stacking convolution operations with small window sizes, which makes optimization difficult. This paper presents region-based non-local (RNL) operations as a family of self-attention mechanisms that can directly capture long-range dependencies without using a deep stack of local operations. Given an intermediate feature map, our method recalibrates the feature at a position by aggregating information from the neighboring regions of all positions. By combining a channel attention module with the proposed RNL, we design an attention chain that can be integrated into off-the-shelf CNNs for end-to-end training. We evaluate our method on two video classification benchmarks. In our experiments, our method outperforms other attention mechanisms, and we achieve state-of-the-art performance on the Something-Something V1 dataset.
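The aggregation described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a feature map of shape (T, H, W, C), summarizes each position's neighborhood with a plain mean over a cubic window, and uses a scaled dot-product softmax as the pairwise relation. The function names `region_summary` and `rnl_sketch` and the `radius` parameter are hypothetical, and the learned projections and the channel attention module mentioned in the abstract are omitted.

```python
import numpy as np

def region_summary(x, radius=1):
    """Mean over a (2*radius+1)^3 neighborhood of every position.

    x has shape (T, H, W, C). Edge padding is an assumption of this sketch.
    """
    pad = [(radius, radius)] * 3 + [(0, 0)]
    xp = np.pad(x, pad, mode="edge")
    T, H, W, _ = x.shape
    k = 2 * radius + 1
    out = np.zeros(x.shape, dtype=float)
    # Sum the k^3 shifted copies of the padded map, then average.
    for dt in range(k):
        for dh in range(k):
            for dw in range(k):
                out += xp[dt:dt + T, dh:dh + H, dw:dw + W]
    return out / k**3

def rnl_sketch(x, radius=1):
    """Recalibrate each position from all positions, comparing region summaries."""
    T, H, W, C = x.shape
    regions = region_summary(x, radius).reshape(-1, C)  # (N, C), N = T*H*W
    values = x.reshape(-1, C)                           # (N, C)
    # Pairwise relation between region summaries (assumed: scaled dot product).
    logits = regions @ regions.T / np.sqrt(C)
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)       # softmax over all positions
    aggregated = weights @ values                       # (N, C)
    # Residual connection, as is common for non-local blocks.
    return x + aggregated.reshape(T, H, W, C)
```

With `radius=0` the region summary reduces to the feature at the position itself, so the sketch falls back to an ordinary position-to-position non-local operation; a larger `radius` makes each pairwise weight depend on the surrounding region rather than on a single feature vector.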

Results

Task                  Dataset                 Metric          Value  Model
Video                 Kinetics-400            Acc@1           77.4   RNL+TSM Ensemble(ResNet50, 8 + 16 frames)
Activity Recognition  Something-Something V1  Top 1 Accuracy  54.1   RNL+TSM Ensemble(R50+R101, ImageNet pretrained)
Activity Recognition  Something-Something V1  Top 5 Accuracy  82.2   RNL+TSM Ensemble(R50+R101, ImageNet pretrained)
Activity Recognition  Something-Something V1  Top 1 Accuracy  52.7   RNL+TSM Ensemble(ResNet50, ImageNet pretrained)
Activity Recognition  Something-Something V1  Top 5 Accuracy  81.5   RNL+TSM Ensemble(ResNet50, ImageNet pretrained)
Action Recognition    Something-Something V1  Top 1 Accuracy  54.1   RNL+TSM Ensemble(R50+R101, ImageNet pretrained)
Action Recognition    Something-Something V1  Top 5 Accuracy  82.2   RNL+TSM Ensemble(R50+R101, ImageNet pretrained)
Action Recognition    Something-Something V1  Top 1 Accuracy  52.7   RNL+TSM Ensemble(ResNet50, ImageNet pretrained)
Action Recognition    Something-Something V1  Top 5 Accuracy  81.5   RNL+TSM Ensemble(ResNet50, ImageNet pretrained)
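
For readers unfamiliar with the metrics in the results, Top 1 and Top 5 accuracy are conventionally the percentage of samples whose true label is among the 1 or 5 highest-scoring classes. The sketch below shows that standard definition; the function name `top_k_accuracy` and the toy scores are illustrative assumptions and are unrelated to the evaluation code behind the reported numbers.

```python
import numpy as np

def top_k_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k highest scores.

    scores: (num_samples, num_classes) class scores.
    labels: (num_samples,) integer class indices.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Indices of the k largest scores per sample.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = (top_k == labels[:, None]).any(axis=1)
    return 100.0 * hits.mean()

# Toy example with three samples and three classes (made-up numbers).
toy_scores = [[0.1, 0.7, 0.2],
              [0.5, 0.3, 0.2],
              [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 0]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

Top 5 accuracy can never be lower than Top 1 accuracy on the same predictions, which is consistent with the paired values reported for each model.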

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Safeguarding Federated Learning-based Road Condition Classification (2025-07-16)
AI-Enhanced Pediatric Pneumonia Detection: A CNN-Based Approach Using Data Augmentation and Generative Adversarial Networks (GANs) (2025-07-13)
Fuzzy Classification Aggregation for a Continuum of Agents (2025-07-06)
Hybrid-View Attention for csPCa Classification in TRUS (2025-07-04)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)