
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Learning Implicit Temporal Alignment for Few-shot Video Classification

Songyang Zhang, Jiale Zhou, Xuming He

2021-05-11 · Few-Shot Learning · Video Classification · Classification · Action Recognition In Videos

Abstract

Few-shot video classification aims to learn new video categories from only a few labeled examples, alleviating the burden of costly annotation in real-world applications. However, it is particularly challenging to learn a class-invariant spatial-temporal representation in such a setting. To address this, we propose a novel matching-based few-shot learning strategy for video sequences. Our main idea is to introduce an implicit temporal alignment for a video pair, capable of estimating the similarity between the two videos in an accurate and robust manner. Moreover, we design an effective context encoding module to incorporate spatial and feature channel context, resulting in better modeling of intra-class variations. To train our model, we develop a multi-task loss for learning video matching, leading to video features with better generalization. Extensive experiments on two challenging benchmarks show that our method outperforms prior art by a sizable margin on Something-Something V2 and achieves competitive results on Kinetics.
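To make the matching idea in the abstract concrete, here is a minimal sketch of few-shot classification with a soft, attention-based temporal alignment between a query video and each support video. The abstract does not specify the alignment mechanism, so the scaled dot-product attention, the cosine scoring, and the names `aligned_similarity` and `classify_query` are all assumptions made for illustration; this is not the authors' ITANet and it omits the context encoding module and the multi-task loss.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def aligned_similarity(query, support):
    """Similarity between two videos given per-frame features.

    query:   (Tq, D) array of frame features
    support: (Ts, D) array of frame features

    Assumption: alignment is a soft attention of each query frame over
    all support frames. Frame order is ignored, which is a simplification.
    """
    d = query.shape[1]
    attn = softmax(query @ support.T / np.sqrt(d), axis=1)  # (Tq, Ts)
    aligned = attn @ support                                # (Tq, D)
    q = query / (np.linalg.norm(query, axis=1, keepdims=True) + 1e-8)
    a = aligned / (np.linalg.norm(aligned, axis=1, keepdims=True) + 1e-8)
    # Mean cosine similarity between each query frame and its aligned frame.
    return float((q * a).sum(axis=1).mean())


def classify_query(query, support_set):
    """Pick the class whose support videos match the query best.

    support_set: dict mapping class label -> list of (T, D) arrays
    (one array per shot). Scores are averaged over the shots of a class.
    """
    scores = {
        label: float(np.mean([aligned_similarity(query, s) for s in videos]))
        for label, videos in support_set.items()
    }
    return max(scores, key=scores.get)


# Toy 2-way 1-shot episode with random features (illustrative data only).
rng = np.random.default_rng(0)
video_a = rng.standard_normal((8, 64))
video_b = rng.standard_normal((8, 64))
shifted_query = np.roll(video_a, 2, axis=0)  # same frames, shifted in time
print(classify_query(shifted_query, {"a": [video_a], "b": [video_b]}))
```

Because the attention weights are computed per frame pair, the score tolerates temporal shifts between the two videos, which is the property a matching-based method needs when actions start at different times.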

Results

The same results are listed under three tasks: Activity Recognition, Action Recognition, and Action Recognition In Videos.

| Dataset | Metric | Value (%) | Model |
|---|---|---|---|
| FS-Something-Something V2-Full | Top-1 Accuracy (5-Way-1-Shot) | 49.2 | ITANet |
| FS-Something-Something V2-Full | Top-1 Accuracy (5-Way-5-Shot) | 62.3 | ITANet |
| FS-Something-Something V2-Full | Top-1 Accuracy (5-Way-1-Shot) | 42.8 | OTAM[3]++ |
| FS-Something-Something V2-Full | Top-1 Accuracy (5-Way-5-Shot) | 52.3 | OTAM[3]++ |
| FS-Something-Something V2-Small | Top-1 Accuracy (5-Way-1-Shot) | 39.8 | ITANet |
| FS-Something-Something V2-Small | Top-1 Accuracy (5-Way-5-Shot) | 53.7 | ITANet |
| FS-Something-Something V2-Small | Top-1 Accuracy (5-Way-1-Shot) | 36.2 | CMN[35] |
| FS-Something-Something V2-Small | Top-1 Accuracy (5-Way-5-Shot) | 48.8 | CMN[35] |
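
The abstract speaks of a "sizable margin" on Something-Something V2; the gap can be read directly off the results listed above. The short sketch below computes the difference between ITANet and the listed baseline for each split and setting. The numbers are copied from the results; the variable and function names (`reported`, `margins`) are hypothetical.

```python
# (split, setting, ITANet accuracy, baseline name, baseline accuracy),
# all values taken from the results listed above.
reported = [
    ("Full", "5-Way-1-Shot", 49.2, "OTAM[3]++", 42.8),
    ("Full", "5-Way-5-Shot", 62.3, "OTAM[3]++", 52.3),
    ("Small", "5-Way-1-Shot", 39.8, "CMN[35]", 36.2),
    ("Small", "5-Way-5-Shot", 53.7, "CMN[35]", 48.8),
]


def margins(rows):
    """Return (split, setting, baseline, margin in accuracy points)."""
    return [
        (split, setting, baseline, round(itanet - base_acc, 1))
        for split, setting, itanet, baseline, base_acc in rows
    ]


for split, setting, baseline, margin in margins(reported):
    print(f"V2-{split} {setting}: ITANet vs {baseline}: +{margin} points")
```

On these figures the gap over the listed baseline ranges from 3.6 to 10.0 accuracy points, with the largest gap in the 5-Way-5-Shot setting on the Full split.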

Related Papers

GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Safeguarding Federated Learning-based Road Condition Classification (2025-07-16)
AI-Enhanced Pediatric Pneumonia Detection: A CNN-Based Approach Using Data Augmentation and Generative Adversarial Networks (GANs) (2025-07-13)
Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection (2025-07-10)
An Enhanced Privacy-preserving Federated Few-shot Learning Framework for Respiratory Disease Diagnosis (2025-07-10)
Few-Shot Learning by Explicit Physics Integration: An Application to Groundwater Heat Transport (2025-07-08)