Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance

Jaeyeon Kim, Minjeon Jeon, JaeYoon Jung, Sang Hoon Woo, Jinjoo Lee

2024-09-02 | Reranking, Audio captioning, Language Modelling

Abstract

In this work, we aim to analyze and optimize the EnCLAP framework, a state-of-the-art model in automated audio captioning. We investigate the impact of modifying the acoustic encoder components, explore pretraining with different dataset scales, and study the effectiveness of a reranking scheme. Through extensive experimentation and quantitative analysis of generated captions, we develop EnCLAP++, an enhanced version that significantly surpasses the original.

Results

Task              Dataset    Metric  Value  Model
Audio captioning  AudioCaps  CIDEr   0.823  EnCLAP++-large
Audio captioning  AudioCaps  FENSE   0.665  EnCLAP++-large
Audio captioning  AudioCaps  METEOR  0.269  EnCLAP++-large
Audio captioning  AudioCaps  SPICE   0.197  EnCLAP++-large
Audio captioning  AudioCaps  SPIDEr  0.51   EnCLAP++-large
Audio captioning  AudioCaps  CIDEr   0.815  EnCLAP++-base
Audio captioning  AudioCaps  FENSE   0.661  EnCLAP++-base
Audio captioning  AudioCaps  METEOR  0.257  EnCLAP++-base
Audio captioning  AudioCaps  SPICE   0.188  EnCLAP++-base
Audio captioning  AudioCaps  SPIDEr  0.501  EnCLAP++-base
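
The SPIDEr values reported above appear consistent with the metric's usual definition as the arithmetic mean of CIDEr and SPICE. This page does not state that definition, so the short Python check below assumes it. The helper name `spider` and the `reported` dictionary are illustrative only and are not taken from the EnCLAP++ code.

```python
# Sketch: check the reported SPIDEr scores against the assumed
# definition SPIDEr = (CIDEr + SPICE) / 2.

def spider(cider: float, spice: float) -> float:
    """Return the mean of CIDEr and SPICE (assumed SPIDEr definition)."""
    return (cider + spice) / 2


# Values copied from the results listed above: (CIDEr, SPICE, reported SPIDEr).
reported = {
    "EnCLAP++-large": (0.823, 0.197, 0.51),
    "EnCLAP++-base": (0.815, 0.188, 0.501),
}

for model, (cider, spice, reported_spider) in reported.items():
    computed = spider(cider, spice)
    # The page reports at most three decimals, so allow a 0.001 tolerance.
    agrees = abs(computed - reported_spider) < 1e-3
    print(f"{model}: computed={computed:.4f} reported={reported_spider} agrees={agrees}")
```

Under this assumption both rows agree with the reported values to within the rounding shown on the page.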

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing (2025-07-16)