Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

End-to-End Spectro-Temporal Graph Attention Networks for Speaker Verification Anti-Spoofing and Speech Deepfake Detection

Hemlata Tak, Jee-weon Jung, Jose Patino, Madhu Kamble, Massimiliano Todisco, Nicholas Evans

Date: 2021-07-27
Tags: Speaker Verification, DeepFake Detection, Audio Deepfake Detection, Face Swapping, Graph Attention

Abstract

Artefacts that serve to distinguish bona fide speech from spoofed or deepfake speech are known to reside in specific sub-bands and temporal segments. Various approaches can be used to capture and model such artefacts; however, none works well across a spectrum of diverse spoofing attacks. Reliable detection then often depends upon the fusion of multiple detection systems, each tuned to detect different forms of attack. In this paper we show that better performance can be achieved when the fusion is performed within the model itself and when the representation is learned automatically from raw waveform inputs. The principal contribution is a spectro-temporal graph attention network (GAT) which learns the relationship between cues spanning different sub-bands and temporal intervals. Using a model-level graph fusion of spectral (S) and temporal (T) sub-graphs and a graph pooling strategy to improve discrimination, the proposed RawGAT-ST model achieves an equal error rate of 1.06% for the ASVspoof 2019 logical access database. This is one of the best results reported to date and is reproducible using an open source implementation.
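
The abstract describes a graph attention network that learns relationships between cues in different sub-bands and temporal intervals. As a rough illustration of the general mechanism only, the sketch below implements a single-head graph attention layer in its standard textbook form, using NumPy. It is not the RawGAT-ST implementation: the function name, argument names, LeakyReLU slope, and the additive attention scoring are all assumptions made for this example, and the paper's own attention and pooling details may differ.

```python
import numpy as np

def graph_attention_layer(node_feats, adjacency, weight, attn_vec, slope=0.2):
    """Hypothetical single-head graph attention layer (standard GAT form).

    node_feats: (N, F_in) node features, e.g. one node per sub-band or time segment
    adjacency:  (N, N) nonzero where node j is a neighbour of node i;
                every node needs at least one neighbour (e.g. a self-loop)
    weight:     (F_in, F_out) shared linear projection
    attn_vec:   (2 * F_out,) attention parameters
    """
    h = node_feats @ weight                          # project every node: (N, F_out)
    f_out = h.shape[1]
    # a^T [h_i || h_j] decomposes into a source term plus a target term.
    src = h @ attn_vec[:f_out]                       # (N,)
    dst = h @ attn_vec[f_out:]                       # (N,)
    scores = src[:, None] + dst[None, :]             # (N, N) raw pairwise scores
    scores = np.where(scores > 0, scores, slope * scores)   # LeakyReLU
    scores = np.where(adjacency > 0, scores, -np.inf)       # mask non-neighbours
    scores = scores - scores.max(axis=1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)    # softmax over each node's neighbours
    return attn @ h, attn                            # aggregated features, attention map

# Toy usage: 4 nodes with 3 features each, fully connected graph.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
a = rng.normal(size=4)
out, attn = graph_attention_layer(x, np.ones((4, 4)), W, a)
```

Each row of the returned attention map sums to one, so every output node is a weighted average of its neighbours' projected features. That weighting is what lets a model of this kind emphasise the sub-bands or segments that carry the most discriminative cues.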

Results

Task                                 Dataset        Metric    Value  Model
3D Reconstruction                    ASVspoof 2021  21DF EER  23.26  RawGAT-ST
3D Reconstruction                    ASVspoof 2021  21LA EER  10.25  RawGAT-ST
Speaker Verification                 ASVspoof 2021  21DF EER  23.26  RawGAT-ST
Speaker Verification                 ASVspoof 2021  21LA EER  10.25  RawGAT-ST
3D                                   ASVspoof 2021  21DF EER  23.26  RawGAT-ST
3D                                   ASVspoof 2021  21LA EER  10.25  RawGAT-ST
DeepFake Detection                   ASVspoof 2021  21DF EER  23.26  RawGAT-ST
DeepFake Detection                   ASVspoof 2021  21LA EER  10.25  RawGAT-ST
3D Shape Reconstruction from Videos  ASVspoof 2021  21DF EER  23.26  RawGAT-ST
3D Shape Reconstruction from Videos  ASVspoof 2021  21LA EER  10.25  RawGAT-ST
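
The results above are reported as equal error rates (EER). For readers unfamiliar with the metric, the sketch below shows one common way to estimate it from detection scores: sweep a decision threshold and take the point where the false acceptance rate and false rejection rate are closest. The function name and the convention that higher scores mean "more likely bona fide" are assumptions for this example. Official ASVspoof scoring tools may interpolate between thresholds differently, so values can differ slightly.

```python
import numpy as np

def equal_error_rate(bona_fide_scores, spoof_scores):
    """Estimate (EER, threshold); assumes higher score = more likely bona fide."""
    bona = np.asarray(bona_fide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Candidate thresholds: every distinct observed score, in ascending order.
    thresholds = np.unique(np.concatenate([bona, spoof]))
    # False rejection rate: bona fide trials scored below the threshold.
    frr = np.array([(bona < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest to equal.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

# Toy usage with made-up scores (not data from the paper).
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.65])
print(f"EER = {100 * eer:.2f}% at threshold {threshold}")
```

A lower EER means fewer errors at the operating point where false acceptances and false rejections are balanced.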

Related Papers

SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks (2025-07-17)
MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM (2025-07-16)
Catching Bid-rigging Cartels with Graph Attention Neural Networks (2025-07-16)
Wavelet-Enhanced Neural ODE and Graph Attention for Interpretable Energy Forecasting (2025-07-14)
CorrDetail: Visual Detail Enhanced Self-Correction for Face Forgery Detection (2025-07-07)
Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection (2025-07-03)
Following the Clues: Experiments on Person Re-ID using Cross-Modal Intelligence (2025-07-02)
DDL: A Dataset for Interpretable Deepfake Detection and Localization in Real-World Scenarios (2025-06-29)