Actor-agnostic Multi-label Action Recognition with Multi-modal Query

Anindya Mondal, Sauradip Nag, Joaquin M Prada, Xiatian Zhu, Anjan Dutta

2023-07-20

Action Classification, Animal Action Recognition, Zero-Shot Action Recognition, Action Recognition, Action Recognition In Videos

Abstract

Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to represent the action classes better. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms the prior arts of actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%. Code is made available at https://github.com/mondalanindya/MSQNet.
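The abstract contrasts single-label classification with recognising several actions that occur at once. The following is a minimal sketch of that distinction only, not of MSQNet itself: the class names, the logits, and the 0.5 threshold are made-up values for illustration, and the function names are hypothetical.

```python
import math

def softmax(logits):
    # Normalise logits into one distribution; classes compete with each other.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(v):
    # Independent per-class probability; classes do not compete.
    return 1.0 / (1.0 + math.exp(-v))

def single_label(class_names, logits):
    # Single-label setting: return only the most probable class.
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best]

def multi_label(class_names, logits, threshold=0.5):
    # Multi-label setting: return every class whose own probability
    # clears the (assumed) threshold.
    return [n for n, v in zip(class_names, logits) if sigmoid(v) >= threshold]

# Hypothetical per-class logits for one video clip.
class_names = ["walking", "eating", "swimming"]
logits = [2.0, 1.5, -3.0]

single = single_label(class_names, logits)
multi = multi_label(class_names, logits)
print(single)
print(multi)
```

With these illustrative logits, the single-label rule keeps only one action, while the multi-label rule can keep several concurrent ones.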

Results

Task	Dataset	Metric	Value	Model
Activity Recognition	Hockey	Accuracy	3.05	MSQNet
Activity Recognition	HMDB51	Accuracy	93.25	MSQNet
Activity Recognition	Charades	mAP	47.57	MSQNet
Activity Recognition	THUMOS14	Accuracy	83.16	MSQNet
Activity Recognition	Animal Kingdom	mAP	73.1	MSQNet
Action Recognition	Hockey	Accuracy	3.05	MSQNet
Action Recognition	HMDB51	Accuracy	93.25	MSQNet
Action Recognition	Charades	mAP	47.57	MSQNet
Action Recognition	THUMOS14	Accuracy	83.16	MSQNet
Action Recognition	Animal Kingdom	mAP	73.1	MSQNet
Zero-Shot Action Recognition	Charades	mAP	35.59	MSQNet
Zero-Shot Action Recognition	HMDB51	Accuracy	69.43	MSQNet
Zero-Shot Action Recognition	THUMOS14	Accuracy	75.33	MSQNet
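
Several rows above report mAP (mean average precision), the usual metric for multi-label recognition. As a rough sketch of how such a number can be computed, the code below averages per-class average precision over the classes that have at least one positive example. Benchmarks differ in the exact AP variant they use, so this should be read as one common formulation rather than the evaluation protocol behind the table; the scores and labels are invented for the example.

```python
def average_precision(scores, labels):
    # Rank items by descending score and average the precision measured
    # at the rank of each positive item.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(score_matrix, label_matrix):
    # Rows are videos, columns are action classes.
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        col_scores = [row[c] for row in score_matrix]
        col_labels = [row[c] for row in label_matrix]
        if any(col_labels):  # skip classes with no positive example
            aps.append(average_precision(col_scores, col_labels))
    return sum(aps) / len(aps) if aps else 0.0

# Invented example: 3 videos, 2 action classes.
scores = [[0.9, 0.2],
          [0.8, 0.7],
          [0.1, 0.6]]
labels = [[1, 0],
          [0, 1],
          [1, 1]]

map_value = mean_average_precision(scores, labels)
print(round(100 * map_value, 2))  # reported as a percentage, like the table
```

The result is a fraction between 0 and 1; the table's mAP values appear to be the same quantity expressed as a percentage.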
