Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Play It Back: Iterative Attention for Audio Recognition

Alexandros Stergiou, Dima Damen

2022-10-20 · Audio Classification

Abstract

A key function of auditory cognition is the association of characteristic sounds with their corresponding semantics over time. Humans attempting to discriminate between fine-grained audio categories often replay the same discriminative sounds to increase their prediction confidence. We propose an end-to-end attention-based architecture that, through selective repetition, attends over the most discriminative sounds across the audio sequence. Our model initially uses the full audio sequence and iteratively refines the temporal segments replayed based on slot attention. At each playback, the selected segments are replayed using a smaller hop length, which represents higher-resolution features within these segments. We show that our method can consistently achieve state-of-the-art performance across three audio-classification benchmarks: AudioSet, VGG-Sound, and EPIC-KITCHENS-100.
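To make the hop-length idea in the abstract concrete, here is a minimal sketch of how a smaller hop yields more frames per second of audio. It does not implement slot attention or the authors' model. The function names, the halving of the hop at each playback, and the `keep_fraction` parameter are illustrative assumptions, not values from the paper.

```python
def num_frames(num_samples: int, win_length: int, hop_length: int) -> int:
    """Number of frames from a sliding window with no padding."""
    if num_samples < win_length:
        return 0
    return 1 + (num_samples - win_length) // hop_length


def playback_schedule(num_samples, win_length, base_hop, keep_fraction, num_playbacks):
    """Toy schedule: each playback keeps a fraction of the audio and halves the hop.

    Both the halving factor and keep_fraction are assumptions for illustration.
    Returns a list of (step, samples, hop, frames) tuples.
    """
    schedule = []
    samples, hop = num_samples, base_hop
    for step in range(num_playbacks):
        frames = num_frames(samples, win_length, hop)
        schedule.append((step, samples, hop, frames))
        samples = int(samples * keep_fraction)  # shorter segment is replayed
        hop = max(1, hop // 2)                  # at a finer temporal resolution
    return schedule


if __name__ == "__main__":
    # 10 s of 16 kHz audio, 25 ms window (400 samples), 10 ms base hop (160 samples).
    for step, samples, hop, frames in playback_schedule(160_000, 400, 160, 0.5, 3):
        print(f"playback {step}: {samples} samples, hop {hop}, {frames} frames")
```

Under these assumed settings the frame count stays roughly constant across playbacks while each frame covers a shorter stretch of audio, which is one way to read "higher-resolution features within these segments".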

Results

Task                  Dataset            Metric          Value  Model
Audio Classification  EPIC-KITCHENS-100  Top-1 Action    15.9   PlayItBackX3
Audio Classification  EPIC-KITCHENS-100  Top-1 Noun      23.1   PlayItBackX3
Audio Classification  EPIC-KITCHENS-100  Top-1 Verb      47     PlayItBackX3
Audio Classification  EPIC-KITCHENS-100  Top-5 Action    29.2   PlayItBackX3
Audio Classification  EPIC-KITCHENS-100  Top-5 Noun      45.1   PlayItBackX3
Audio Classification  EPIC-KITCHENS-100  Top-5 Verb      78.7   PlayItBackX3
Audio Classification  AudioSet           Test mAP        0.477  PlayItBackX3
Audio Classification  VGGSound           AUC             97.8   PlayItBackX3
Audio Classification  VGGSound           Mean AP         56.1   PlayItBackX3
Audio Classification  VGGSound           Top 1 Accuracy  53.7   PlayItBackX3
Audio Classification  VGGSound           Top 5 Accuracy  79.2   PlayItBackX3
Audio Classification  VGGSound           d-prime         2.846  PlayItBackX3

The same twelve results are also listed under the parent task Classification.
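
The d-prime and AUC figures reported for VGGSound can be related through a conversion commonly used in audio tagging evaluation, d' = sqrt(2) * Phi^{-1}(AUC), where Phi^{-1} is the inverse standard normal CDF. The sketch below assumes this is the conversion behind the listed value; the page itself does not say how d-prime was computed, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def d_prime_from_auc(auc: float) -> float:
    """Convert an AUC in (0, 1) to d-prime, assuming equal-variance Gaussians."""
    if not 0.0 < auc < 1.0:
        raise ValueError("auc must lie strictly between 0 and 1")
    return sqrt(2.0) * NormalDist().inv_cdf(auc)


if __name__ == "__main__":
    # The table lists AUC as a percentage (97.8), so convert to a fraction first.
    print(round(d_prime_from_auc(97.8 / 100.0), 3))
```

With the rounded AUC of 97.8 this gives a value close to, but not exactly, the listed 2.846; the small gap is what rounding the AUC to one decimal place would be expected to produce.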

Related Papers

Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons (2025-06-24)
Fully Few-shot Class-incremental Audio Classification Using Multi-level Embedding Extractor and Ridge Regression Classifier (2025-06-23)
Adaptive Differential Denoising for Respiratory Sounds Classification (2025-06-03)
Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds (2025-05-29)
Patient-Aware Feature Alignment for Robust Lung Sound Classification: Cohesion-Separation and Global Alignment Losses (2025-05-28)
4,500 Seconds: Small Data Training Approaches for Deep UAV Audio Classification (2025-05-21)