Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition

Yuqian Fu, Li Zhang, Junke Wang, Yanwei Fu, Yu-Gang Jiang

2020-10-20 · Meta-Learning · Video Recognition · Few-Shot Action Recognition · Action Recognition · Temporal Action Localization

Paper · PDF · Code (official)

Abstract

Humans can easily recognize actions from only a few examples, while existing video recognition models still rely heavily on large-scale labeled data. This observation has motivated increasing interest in few-shot video action recognition, which aims to learn new actions from only a few labeled samples. In this paper, we propose a depth-guided Adaptive Meta-Fusion Network for few-shot video recognition, termed AMeFu-Net. Concretely, we tackle the few-shot recognition problem from three aspects: first, we alleviate the extreme data scarcity by introducing depth information as a carrier of the scene, which brings extra visual cues to our model; second, we fuse the representation of each original RGB clip with multiple non-strictly corresponding depth clips sampled by our temporal asynchronization augmentation mechanism, which synthesizes new instances at the feature level; third, we propose a novel Depth Guided Adaptive Instance Normalization (DGAdaIN) fusion module to fuse the two modality streams efficiently. Additionally, to better mimic the few-shot recognition process, our model is trained in a meta-learning fashion. Extensive experiments on several action recognition benchmarks demonstrate the effectiveness of our model.
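The two mechanisms the abstract names can be sketched in a few lines. The paper's actual DGAdaIN module operates on convolutional feature maps with learned MLPs, so the 1-D features, scalar depth summary, and the weight names (`w_gamma`, `w_beta`, etc.) below are hypothetical simplifications: the depth feature predicts the affine scale and shift applied to the instance-normalized RGB feature, and the temporal asynchronization step picks a depth clip whose start is randomly offset from the RGB clip.

```python
import math
import random

def instance_norm(x, eps=1e-5):
    """Normalize a 1-D feature to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def dg_adain(rgb_feat, depth_feat, w_gamma, b_gamma, w_beta, b_beta):
    """Depth Guided AdaIN (sketch): the depth feature predicts the
    affine parameters that re-style the normalized RGB feature."""
    d = sum(depth_feat) / len(depth_feat)  # toy summary of the depth clip
    gamma = w_gamma * d + b_gamma          # scale predicted from depth
    beta = w_beta * d + b_beta             # shift predicted from depth
    return [gamma * v + beta for v in instance_norm(rgb_feat)]

def async_depth_indices(rgb_start, clip_len, num_frames, max_shift=4):
    """Temporal asynchronization (sketch): sample a depth clip whose
    start is randomly shifted relative to the RGB clip, so the fused
    pair forms a new feature-level instance."""
    shift = random.randint(-max_shift, max_shift)
    start = min(max(rgb_start + shift, 0), num_frames - clip_len)
    return list(range(start, start + clip_len))
```

Because the shift is resampled on every pass, one RGB clip is paired with many slightly different depth clips during training, which is what the abstract means by synthesizing new instances at the feature level.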

Results

Task | Dataset | Metric | Value | Model
Activity Recognition | HMDB51 | 1:1 Accuracy | 75.5 | AMeFu-Net
Activity Recognition | Kinetics-100 | Accuracy | 86.8 | AMeFu-Net
Activity Recognition | UCF101 | 1:1 Accuracy | 95.5 | AMeFu-Net
Action Recognition | HMDB51 | 1:1 Accuracy | 75.5 | AMeFu-Net
Action Recognition | Kinetics-100 | Accuracy | 86.8 | AMeFu-Net
Action Recognition | UCF101 | 1:1 Accuracy | 95.5 | AMeFu-Net

Related Papers

- A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
- Are encoders able to learn landmarkers for warm-starting of Hyperparameter Optimization? (2025-07-16)
- Imbalanced Regression Pipeline Recommendation (2025-07-16)
- CLID-MU: Cross-Layer Information Divergence Based Meta Update Strategy for Learning with Noisy Labels (2025-07-16)
- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- Mixture of Experts in Large Language Models (2025-07-15)
- Iceberg: Enhancing HLS Modeling with Synthetic Data (2025-07-14)
- Meta-Reinforcement Learning for Fast and Data-Efficient Spectrum Allocation in Dynamic Wireless Networks (2025-07-13)