AVSBench

Audio-Visual Segmentation

Audio | Videos | Apache-2.0 license | Introduced 2023-01-30

AVSBench is a benchmark for pixel-level audio-visual segmentation that provides ground-truth labels for sounding objects. The dataset is divided into three subsets: the Single-source and Multi-sources subsets, which together form AVSBench-object, and the Semantic-labels subset, which forms AVSBench-semantic. Accordingly, three settings are studied:

  1. semi-supervised audio-visual segmentation with a single sound source

  2. fully-supervised audio-visual segmentation with multiple sound sources

  3. fully-supervised audio-visual semantic segmentation
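The subsets and settings above can be written down as a small lookup table, which is handy when selecting a configuration in an experiment script. The sketch below is only illustrative: the class `AVSSetting`, the tuple `SETTINGS`, and the helper `settings_for` are hypothetical names, not part of any official AVSBench tooling, and the one-to-one pairing of subsets with settings is assumed from the order in which the description lists them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AVSSetting:
    """One evaluation setting, as listed in the description (hypothetical type)."""
    subset: str       # subset name as written in the description
    parent: str       # "AVSBench-object" or "AVSBench-semantic"
    supervision: str  # "semi-supervised" or "fully-supervised"
    task: str         # task wording taken from the numbered list


# Assumption: settings 1-3 correspond to the three subsets in listed order.
SETTINGS = (
    AVSSetting("Single-source", "AVSBench-object", "semi-supervised",
               "audio-visual segmentation with a single sound source"),
    AVSSetting("Multi-sources", "AVSBench-object", "fully-supervised",
               "audio-visual segmentation with multiple sound sources"),
    AVSSetting("Semantic-labels", "AVSBench-semantic", "fully-supervised",
               "audio-visual semantic segmentation"),
)


def settings_for(parent: str) -> list[AVSSetting]:
    """Return the settings whose subset belongs to the given parent dataset."""
    return [s for s in SETTINGS if s.parent == parent]


if __name__ == "__main__":
    for s in SETTINGS:
        print(f"{s.parent} / {s.subset}: {s.supervision} {s.task}")
```

Calling `settings_for("AVSBench-object")` returns the two object-level settings, while `settings_for("AVSBench-semantic")` returns only the semantic one.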

Source: Audio-Visual Segmentation with Semantics