Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use

Jan Schlüter, Gerald Gutenbrunner

2022-07-12
Tasks: Audio Classification, Pitch Classification, Spoken language identification, Classification, Instrument Recognition

Abstract

In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy Normalization (PCEN), has shown promising results, but is computationally expensive. With inhomogeneous convolution kernel sizes and strides, and by replacing PCEN with better parallelizable operations, we can reach similar results more efficiently. In experiments on six audio classification tasks, our frontend matches the accuracy of LEAF at 3% of the cost, but both fail to consistently outperform a fixed mel filterbank. The quest for learnable audio frontends is not solved.
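To make the PCEN step mentioned in the abstract concrete, here is a minimal NumPy sketch of the commonly cited PCEN formulation: a first-order recursive smoother over time, followed by gain normalization and root compression. It is not the authors' implementation. The function name `pcen` and the default parameter values are assumptions chosen for illustration, and in LEAF these parameters are learned rather than fixed. The loop shows that each smoothed frame depends on the previous one, which is one plausible reason the operation parallelizes poorly.

```python
import numpy as np

def pcen(energy, s=0.04, alpha=0.96, delta=2.0, r=0.5, eps=1e-12):
    """Per-Channel Energy Normalization on a (frames, channels) array.

    Hypothetical helper for illustration; parameter defaults are assumptions.
    """
    energy = np.asarray(energy, dtype=np.float64)
    smoothed = np.empty_like(energy)
    m = energy[0]  # initialize the smoother with the first frame
    for t in range(energy.shape[0]):
        # Recursive smoothing: M[t] = (1 - s) * M[t-1] + s * E[t]
        m = (1.0 - s) * m + s * energy[t]
        smoothed[t] = m
    # Normalize by the smoothed energy, then apply root compression.
    return (energy / (eps + smoothed) ** alpha + delta) ** r - delta ** r
```

For a constant input the smoother equals the input, so the output is the same in every frame; an all-zero input maps to zero.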

Results

Task                           Dataset          Metric    Value  Model
Dialogue                       VoxForge         Accuracy  91.5   LEAF
Dialogue                       VoxForge         Accuracy  86.6   EfficientLEAF
Dialogue                       VoxForge         Accuracy  85.6   melspect
Spoken Language Understanding  VoxForge         Accuracy  91.5   LEAF
Spoken Language Understanding  VoxForge         Accuracy  86.6   EfficientLEAF
Spoken Language Understanding  VoxForge         Accuracy  85.6   melspect
Audio Classification           Speech Commands  Accuracy  95.2   EfficientLEAF
Audio Classification           Speech Commands  Accuracy  95.1   LEAF
Audio Classification           Speech Commands  Accuracy  95.1   melspect
Audio Classification           CREMA-D          Accuracy  60.2   EfficientLEAF
Audio Classification           CREMA-D          Accuracy  58.8   melspect
Audio Classification           CREMA-D          Accuracy  50.2   LEAF
Audio Classification           BirdCLEF 2021    Accuracy  72.2   EfficientLEAF (8s)
Audio Classification           BirdCLEF 2021    Accuracy  42.9   EfficientLEAF
Audio Classification           BirdCLEF 2021    Accuracy  42.3   LEAF
Audio Classification           BirdCLEF 2021    Accuracy  39.9   melspect
Dialogue Understanding         VoxForge         Accuracy  91.5   LEAF
Dialogue Understanding         VoxForge         Accuracy  86.6   EfficientLEAF
Dialogue Understanding         VoxForge         Accuracy  85.6   melspect
Classification                 Speech Commands  Accuracy  95.2   EfficientLEAF
Classification                 Speech Commands  Accuracy  95.1   LEAF
Classification                 Speech Commands  Accuracy  95.1   melspect
Classification                 CREMA-D          Accuracy  60.2   EfficientLEAF
Classification                 CREMA-D          Accuracy  58.8   melspect
Classification                 CREMA-D          Accuracy  50.2   LEAF
Classification                 BirdCLEF 2021    Accuracy  72.2   EfficientLEAF (8s)
Classification                 BirdCLEF 2021    Accuracy  42.9   EfficientLEAF
Classification                 BirdCLEF 2021    Accuracy  42.3   LEAF
Classification                 BirdCLEF 2021    Accuracy  39.9   melspect
Instrument Recognition         NSynth           Accuracy  72.1   melspect
Instrument Recognition         NSynth           Accuracy  71.7   EfficientLEAF
Instrument Recognition         NSynth           Accuracy  69.2   LEAF
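
The accuracies listed above can be checked against the abstract's claim that neither learnable frontend consistently beats the mel filterbank. The short Python sketch below copies one value per dataset and model from the results and computes each frontend's margin over `melspect`. The names `results`, `margin_over_mel`, and `best` are hypothetical helpers for illustration, and only the five datasets shown on this page are included.

```python
# Accuracy values copied from the results listed above (one entry per dataset and model).
results = {
    "VoxForge": {"LEAF": 91.5, "EfficientLEAF": 86.6, "melspect": 85.6},
    "Speech Commands": {"EfficientLEAF": 95.2, "LEAF": 95.1, "melspect": 95.1},
    "CREMA-D": {"EfficientLEAF": 60.2, "melspect": 58.8, "LEAF": 50.2},
    "BirdCLEF 2021": {
        "EfficientLEAF (8s)": 72.2,
        "EfficientLEAF": 42.9,
        "LEAF": 42.3,
        "melspect": 39.9,
    },
    "NSynth": {"melspect": 72.1, "EfficientLEAF": 71.7, "LEAF": 69.2},
}

def margin_over_mel(results, model):
    """Accuracy difference (model minus melspect) per dataset, rounded to 0.1."""
    return {
        dataset: round(scores[model] - scores["melspect"], 1)
        for dataset, scores in results.items()
        if model in scores
    }

# Best-scoring entry per dataset among the listed models.
best = {dataset: max(scores, key=scores.get) for dataset, scores in results.items()}

for model in ("LEAF", "EfficientLEAF"):
    print(model, margin_over_mel(results, model))
print(best)
```

A negative margin marks a dataset where the fixed mel filterbank scores higher than the learnable frontend.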
