TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Filler Word Detection and Classification: A Dataset and Be...

Filler Word Detection and Classification: A Dataset and Benchmark

Ge Zhu, Juan-Pablo Caceres, Justin Salamon

2022-03-28Keyword SpottingSound Event Localization and DetectionClassification
PaperPDFCode(official)

Abstract

Filler words such as `uh' or `um' are sounds or words people use to signal they are pausing to think. Finding and removing filler words from recordings is a common and tedious task in media editing. Automatically detecting and classifying filler words could greatly aid in this task, but few studies have been published on this problem to date. A key reason is the absence of a dataset with annotated filler words for model training and evaluation. In this work, we present a novel speech dataset, PodcastFillers, with 35K annotated filler words and 50K annotations of other sounds that commonly occur in podcasts such as breaths, laughter, and word repetitions. We propose a pipeline that leverages VAD and ASR to detect filler candidates and a classifier to distinguish between filler word types. We evaluate our proposed pipeline on PodcastFillers, compare to several baselines, and present a detailed ablation study. In particular, we evaluate the importance of using ASR and how it compares to a transcription-free approach resembling keyword spotting. We show that our pipeline obtains state-of-the-art results, and that leveraging ASR strongly outperforms a keyword spotting approach. We make PodcastFillers publicly available, in the hope that our work serves as a benchmark for future research.

Results

TaskDatasetMetricValueModel
Sound Event Localization and DetectionPodcastFillersevent-based F1 score92.8AVC-FillerNet
Sound Event Localization and DetectionPodcastFillersevent-based F1 score71VC-FillerNet

Related Papers

Adversarial attacks to image classification systems using evolutionary algorithms2025-07-17Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation2025-07-16Safeguarding Federated Learning-based Road Condition Classification2025-07-16AI-Enhanced Pediatric Pneumonia Detection: A CNN-Based Approach Using Data Augmentation and Generative Adversarial Networks (GANs)2025-07-13Spatial and Semantic Embedding Integration for Stereo Sound Event Localization and Detection in Regular Videos2025-07-07Fuzzy Classification Aggregation for a Continuum of Agents2025-07-06Hybrid-View Attention for csPCa Classification in TRUS2025-07-04Devising a solution to the problems of Cancer awareness in Telangana2025-06-26