Hyperbolic Audio-visual Zero-shot Learning

Jie Hong, Zeeshan Hayder, Junlin Han, Pengfei Fang, Mehrtash Harandi, Lars Petersson

2023-08-24 | ICCV 2023 | GZSL Video Classification | Zero-Shot Learning

Abstract

Audio-visual zero-shot learning aims to classify samples consisting of a pair of corresponding audio and video sequences from classes that are not present during training. An analysis of the audio-visual data reveals a large degree of hyperbolicity, indicating the potential benefit of using a hyperbolic transformation to achieve curvature-aware geometric learning, with the aim of exploring more complex hierarchical data structures for this task. The proposed approach employs a novel loss function that incorporates cross-modality alignment between video and audio features in hyperbolic space. Additionally, we explore the use of multiple adaptive curvatures for hyperbolic projections. Experimental results on this challenging task demonstrate that our proposed hyperbolic approach for zero-shot learning outperforms the state-of-the-art (SOTA) method on three datasets, VGGSound-GZSL, UCF-GZSL, and ActivityNet-GZSL, achieving harmonic mean (HM) improvements of around 3.0%, 7.0%, and 5.3%, respectively.
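To make the idea of a hyperbolic projection with a chosen curvature concrete, here is a minimal sketch of the standard Poincaré-ball exponential map at the origin and the corresponding geodesic distance. It is not the paper's implementation: the function names (`expmap0`, `mobius_add`, `poincare_dist`) and the example vectors are illustrative assumptions, and the paper's alignment loss and its way of adapting curvatures are not reproduced here.

```python
import math

def expmap0(v, c=1.0):
    """Map a Euclidean vector v (tangent at the origin) onto the
    Poincare ball of curvature -c. Assumes c > 0."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]

def mobius_add(x, y, c=1.0):
    """Mobius addition of two points on the Poincare ball."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    denom = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * a + (1 - c * x2) * b) / denom
            for a, b in zip(x, y)]

def poincare_dist(x, y, c=1.0):
    """Geodesic distance between two points on the Poincare ball."""
    sqrt_c = math.sqrt(c)
    diff = mobius_add([-a for a in x], y, c)
    norm = math.sqrt(sum(d * d for d in diff))
    # Clamp so atanh stays finite for points numerically on the boundary.
    norm = min(norm, (1 - 1e-9) / sqrt_c)
    return 2 / sqrt_c * math.atanh(sqrt_c * norm)

# Hypothetical audio and video feature vectors (illustrative values only).
audio_feat = [0.3, 0.4]
video_feat = [0.1, -0.2]

# Project the same features with several curvatures, loosely mirroring the
# "multiple curvatures" idea; here the curvatures are fixed, not learned.
for c in (0.5, 1.0, 2.0):
    a_h = expmap0(audio_feat, c)
    v_h = expmap0(video_feat, c)
    print(c, poincare_dist(a_h, v_h, c))
```

In a trainable model the curvature would typically be a learnable parameter and the distance would feed into an alignment loss; both are left out of this sketch.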

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Zero-Shot Learning | ActivityNet-GZSL (main) | HM | 12.65 | Hyper-multiple |
| Zero-Shot Learning | ActivityNet-GZSL (main) | ZSL | 9.5 | Hyper-multiple |
| Zero-Shot Learning | VGGSound-GZSL (cls) | HM | 8.67 | Hyper-multiple |
| Zero-Shot Learning | VGGSound-GZSL (cls) | ZSL | 7.31 | Hyper-multiple |
| Zero-Shot Learning | ActivityNet-GZSL (cls) | HM | 15.25 | Hyper-multiple |
| Zero-Shot Learning | ActivityNet-GZSL (cls) | ZSL | 10.39 | Hyper-multiple |
| Zero-Shot Learning | VGGSound-GZSL (main) | HM | 9.32 | Hyper-multiple |
| Zero-Shot Learning | VGGSound-GZSL (main) | ZSL | 7.97 | Hyper-multiple |
| Zero-Shot Learning | UCF-GZSL (cls) | HM | 48.3 | Hyper-multiple |
| Zero-Shot Learning | UCF-GZSL (cls) | ZSL | 52.11 | Hyper-multiple |
| Zero-Shot Learning | UCF-GZSL (main) | HM | 29.32 | Hyper-multiple |
| Zero-Shot Learning | UCF-GZSL (main) | ZSL | 22.24 | Hyper-multiple |
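
The HM metric in generalized zero-shot learning is conventionally the harmonic mean of the accuracy on seen classes (S) and on unseen classes (U). The sketch below shows that calculation; the function name and the input accuracies are hypothetical, since this page reports only the resulting HM and ZSL values and not the underlying S and U scores.

```python
def harmonic_mean(seen_acc, unseen_acc):
    """Harmonic mean of seen-class and unseen-class accuracy (both in %)."""
    if seen_acc + unseen_acc == 0:
        return 0.0
    return 2 * seen_acc * unseen_acc / (seen_acc + unseen_acc)

# Hypothetical accuracies, not taken from the paper.
print(round(harmonic_mean(20.0, 10.0), 2))
```

Because the harmonic mean is dominated by the smaller of the two values, a model cannot score well on HM by doing well on seen classes alone.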
