Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

LIGAR: Lightweight General-purpose Action Recognition

Evgeny Izutov

2021-08-30 | Gesture Recognition | Video Understanding | Action Recognition | Hand-Gesture Recognition

Abstract

The growing number of practical tasks in video understanding poses the challenge of designing a universal solution that is widely accessible and suitable for demanding edge-oriented inference. In this paper we focus on designing a network architecture and a training pipeline to tackle these challenges. Our architecture takes the best from previous ones and succeeds not only in appearance-based action recognition tasks but also in motion-based problems. Furthermore, we formulate the induced label noise problem and propose the Adaptive Clip Selection (ACS) framework to deal with it. Together, these make the LIGAR framework a general-purpose action recognition solution. We also report an extensive analysis on general and gesture datasets that shows an excellent trade-off between performance and accuracy in comparison to state-of-the-art solutions. Training code is available at: https://github.com/openvinotoolkit/training_extensions. For efficient edge-oriented inference, all trained models can be exported to the OpenVINO format.

Results

Task                 | Dataset                      | Metric          | Value | Model
Activity Recognition | Jester (Gesture Recognition) | Val             | 95.56 | X3D MobileNet-V3 LGD-GC
Activity Recognition | UCF101                       | 3-fold Accuracy | 94.85 | X3D MobileNet-V3 LGD-GC
Action Recognition   | Jester (Gesture Recognition) | Val             | 95.56 | X3D MobileNet-V3 LGD-GC
Action Recognition   | UCF101                       | 3-fold Accuracy | 94.85 | X3D MobileNet-V3 LGD-GC

Related Papers

Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation (2025-09-04)
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks (2025-07-15)
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments (2025-07-14)
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI (2025-07-14)
Beyond Appearance: Geometric Cues for Robust Video Instance Segmentation (2025-07-08)
Omni-Video: Democratizing Unified Video Understanding and Generation (2025-07-08)