
Co-Separating Sounds of Visual Objects

Ruohan Gao, Kristen Grauman

2019-04-16 | ICCV 2019 10 | Tasks: Denoising, Audio Denoising, Audio Source Separation

Abstract

Learning how objects sound from video is challenging, since they often heavily overlap in a single audio channel. Current methods for visually-guided audio source separation sidestep the issue by training with artificially mixed video clips, but this puts unwieldy restrictions on training data collection and may even prevent learning the properties of "true" mixed sounds. We introduce a co-separation training paradigm that permits learning object-level sounds from unlabeled multi-source videos. Our novel training objective requires that the deep neural network's separated audio for similar-looking objects be consistently identifiable, while simultaneously reproducing accurate video-level audio tracks for each source training pair. Our approach disentangles sounds in realistic test videos, even in cases where an object was not observed individually during training. We obtain state-of-the-art results on visually-guided audio source separation and audio denoising for the MUSIC, AudioSet, and AV-Bench datasets.
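
The two parts of the training objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The mask-based formulation, the L1 distance, the cross-entropy term, and every name here (`co_separation_loss`, `consistency_loss`, `object_masks`, `video_ids`, `target_masks`) are assumptions made for illustration only.

```python
import numpy as np

def co_separation_loss(object_masks, video_ids, target_masks):
    """Assumed form of the 'reproduce video-level audio' term.

    object_masks: array (num_objects, F, T), one predicted spectrogram mask per detected object.
    video_ids:    list of length num_objects, the source video of each object.
    target_masks: dict video_id -> array (F, T), the mask that recovers that video's audio
                  from the mixture.
    """
    loss = 0.0
    for vid, target in target_masks.items():
        # Sum the predicted masks of every object detected in this video.
        idx = [i for i, v in enumerate(video_ids) if v == vid]
        combined = object_masks[idx].sum(axis=0)
        # L1 distance to the video-level target (the distance choice is an assumption).
        loss += np.abs(combined - target).mean()
    return loss / len(target_masks)

def consistency_loss(logits, labels):
    """Assumed form of the 'consistently identifiable' term: mean cross-entropy
    between a classifier's logits on each separated sound and the category of the
    visual object that guided the separation."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

In a training loop, the two terms would be added together, possibly with a weighting factor, which is likewise an assumption here.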

Results

Task                     Dataset                  Metric  Value  Model
Audio Denoising          AV-Bench - Violin Yanni  NSDR    8.53   Co-Separation
Audio Denoising          AV-Bench - Wooden Horse  NSDR    14.5   Co-Separation
Audio Denoising          AV-Bench - Guitar Solo   NSDR    11.9   Co-Separation
Audio Source Separation  AudioSet                 SAR     13     Co-Separation
Audio Source Separation  AudioSet                 SDR     4.26   Co-Separation
Audio Source Separation  AudioSet                 SIR     7.07   Co-Separation
Audio Source Separation  MUSIC (multi-source)     SAR     11.3   Co-Separation
Audio Source Separation  MUSIC (multi-source)     SIR     13.8   Co-Separation
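
For readers unfamiliar with the metrics in the table, the sketch below shows a simplified signal-to-distortion ratio (SDR) and its normalized variant (NSDR), both in decibels. It assumes the plain energy-ratio definition of SDR and treats NSDR as the SDR gain over using the unprocessed mixture as the estimate. It is not necessarily the evaluation code behind the numbers above, and it does not cover SIR or SAR, which require the full BSS Eval decomposition. The function names and the `eps` parameter are illustrative.

```python
import numpy as np

def sdr(estimate, reference, eps=1e-12):
    """Simplified SDR in dB: reference energy over residual energy."""
    residual = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(residual ** 2) + eps))

def nsdr(estimate, reference, mixture):
    """SDR of the estimate minus SDR of the raw mixture (assumed NSDR definition)."""
    return sdr(estimate, reference) - sdr(mixture, reference)

# Toy signals: a clean source, an interfering noise signal, and their mixture.
reference = np.array([1.0, 0.0, -1.0, 0.0])
noise = np.array([0.0, 1.0, 0.0, -1.0])
mixture = reference + noise
estimate = 0.9 * reference  # an imperfect separation result

print(sdr(estimate, reference))
print(nsdr(estimate, reference, mixture))
```

Higher values are better for both quantities. A positive NSDR means the separated signal is closer to the reference than the unprocessed mixture is.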

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing (2025-07-15)
AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air (2025-07-15)
Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models (2025-07-15)
A statistical physics framework for optimal learning (2025-07-10)
LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models (2025-07-08)