Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

FastAudio: A Learnable Audio Front-End for Spoof Speech Detection

Quchen Fu, Zhongwei Teng, Jules White, Maria Powell, Douglas C. Schmidt

2021-09-06 | Voice Anti-spoofing, Speaker Identification, Speaker Verification

Abstract

Voice assistants, such as smart speakers, have exploded in popularity. It is currently estimated that the smart speaker adoption rate has exceeded 35% in the US adult population. Manufacturers have integrated speaker identification technology, which attempts to determine the identity of the person speaking, to provide personalized services to different members of the same family. Speaker identification can also play an important role in controlling how the smart speaker is used. For example, it is not critical to correctly identify the user when playing music. However, when reading the user's email out loud, it is critical to verify that the speaker making the request is the authorized user. Speaker verification systems, which authenticate the speaker identity, are therefore needed as a gatekeeper to protect against various spoofing attacks that aim to impersonate the enrolled user. This paper compares popular learnable front-ends that learn representations of audio through joint (end-to-end) training with downstream tasks. We categorize the front-ends by defining two generic architectures and then analyze the filtering stages of both types in terms of learning constraints. We propose replacing fixed filterbanks with a learnable layer that can better adapt to anti-spoofing tasks. The proposed FastAudio front-end is then tested with two popular back-ends to measure the performance on the LA track of the ASVspoof 2019 dataset. The FastAudio front-end achieves a relative improvement of 27% when compared with fixed front-ends, outperforming all other learnable front-ends on this task.
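To make the idea of "replacing fixed filterbanks with a learnable layer" concrete, here is a minimal sketch in plain Python. It is not the FastAudio implementation. The abstract does not specify the filter shape or parameterization, so the symmetric triangular filters, the (center, bandwidth) parameters, and all function names below are assumptions made for illustration. In a fixed front-end these parameters stay at their mel-scale initial values; in a learnable front-end they would be updated by gradient descent together with the back-end.

```python
import math

def hz_to_mel(hz):
    # Common HTK-style mel formula.
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def init_mel_params(n_filters, f_min, f_max):
    """Hypothetical helper: centers and bandwidths (Hz), evenly spaced in mel.

    These two lists are what a learnable layer would treat as trainable
    parameters (assumption, not taken from the paper).
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    edges = [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]
    centers = edges[1:-1]
    bandwidths = [edges[i + 2] - edges[i] for i in range(n_filters)]
    return centers, bandwidths

def triangular_filterbank(centers, bandwidths, freqs):
    """Build one symmetric triangle per (center, bandwidth) pair.

    Each row holds the filter's weight at every frequency in `freqs`:
    1.0 at the center, falling linearly to 0.0 at center +/- bandwidth / 2.
    """
    return [
        [max(0.0, 1.0 - 2.0 * abs(f - c) / b) for f in freqs]
        for c, b in zip(centers, bandwidths)
    ]

def apply_filterbank(bank, power_spectrum):
    """Log filterbank energies for a single frame of a power spectrum."""
    return [
        math.log(sum(w * p for w, p in zip(row, power_spectrum)) + 1e-6)
        for row in bank
    ]

# Usage: 4 filters over 0-8000 Hz, applied to a flat 64-bin spectrum.
freqs = [i * 8000.0 / 63 for i in range(64)]
centers, bandwidths = init_mel_params(4, 0.0, 8000.0)
bank = triangular_filterbank(centers, bandwidths, freqs)
features = apply_filterbank(bank, [1.0] * 64)
```

Keeping the triangular shape while letting only the centers and bandwidths move is one example of a learning constraint on the filtering stage: the layer has far fewer free parameters than an unconstrained weight matrix of the same size.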

Results

Task                 Dataset       Metric  Value  Model
Voice Anti-spoofing  ASVspoof2019  EER     1.54   FastAudio
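
EER (equal error rate) is the operating point at which the false acceptance rate (spoofed audio accepted) equals the false rejection rate (bona fide audio rejected); lower is better. In anti-spoofing work it is usually reported as a percentage, although the table above does not state the unit. The sketch below shows one simple way to approximate it from detector scores by sweeping thresholds. The function name and the toy scores are illustrative assumptions, not data from the paper, and official evaluation tools may interpolate differently.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER as a fraction in [0, 1].

    Assumes higher scores mean "more likely bona fide". Sweeps every
    observed score as a threshold and returns the mean of FAR and FRR
    at the threshold where the two rates are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # A trial is accepted when its score is >= t.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Toy scores (made up for illustration only).
bonafide = [0.9, 0.8, 0.7, 0.3]
spoof = [0.1, 0.2, 0.4, 0.6]
eer = equal_error_rate(bonafide, spoof)
```

With real evaluation sets the scores are rarely tied exactly at the crossing point, which is why this sketch averages the two rates at the closest threshold instead of requiring them to be equal.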
