Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Combining EfficientNet and Vision Transformers for Video Deepfake Detection

Davide Coccomini, Nicola Messina, Claudio Gennaro, Fabrizio Falchi

Published: 2021-07-06
Tasks: DeepFake Detection, Face Swapping
Links: Paper · PDF · Code (official)

Abstract

Deepfakes are the result of digital manipulation to forge realistic yet fake imagery. With the astonishing advances in deep generative models, fake images and videos are nowadays obtained using Variational Autoencoders (VAEs) or Generative Adversarial Networks (GANs). These technologies are becoming more accessible and accurate, resulting in fake videos that are very difficult to detect. Traditionally, Convolutional Neural Networks (CNNs) have been used to perform video deepfake detection, with the best results obtained using methods based on EfficientNet B7. In this study, we focus on video deepfake detection on faces, given that generation methods are becoming extremely accurate at producing realistic human faces. Specifically, we combine various types of Vision Transformers with a convolutional EfficientNet B0 used as a feature extractor, obtaining results comparable to some very recent methods that use Vision Transformers. Unlike the state-of-the-art approaches, we use neither distillation nor ensemble methods. Furthermore, we present a straightforward inference procedure based on a simple voting scheme for handling multiple faces in the same video shot. The best model achieved an AUC of 0.951 and an F1 score of 88.0%, very close to the state of the art on the DeepFake Detection Challenge (DFDC).
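The abstract mentions a simple voting scheme for aggregating per-face predictions into a single video-level label. The paper does not spell out the exact rule here, so the following is a minimal sketch under stated assumptions: each detected face track votes fake if its mean fake probability exceeds a threshold, and the video is labeled fake if any face votes fake. The function name, the threshold value, and the any-face rule are all illustrative assumptions, not the authors' exact procedure.

```python
def video_prediction(face_probs, threshold=0.55):
    """Aggregate per-face fake probabilities into one video-level label.

    face_probs: list of lists, one inner list of per-frame fake
    probabilities for each detected face track in the video shot.
    Returns True (fake) if any face track's mean probability exceeds
    the threshold. Hypothetical sketch; not the paper's exact rule.
    """
    votes = []
    for probs in face_probs:
        mean_p = sum(probs) / len(probs)  # average the track's frame scores
        votes.append(mean_p > threshold)  # each face casts one vote
    return any(votes)  # a single fake face marks the whole video as fake
```

For example, a video with one clearly manipulated face and one genuine face would be flagged as fake, since the manipulated face alone carries the vote.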

Results

Task               | Dataset | Metric | Value | Model
DeepFake Detection | DFDC    | AUC    | 0.951 | Cross Efficient Vision Transformer
DeepFake Detection | DFDC    | AUC    | 0.919 | Efficient Vision Transformer

(The source page repeated these two results under several spurious task tags, e.g. "3D Reconstruction"; only the DeepFake Detection entries are retained.)

Related Papers

- SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks (2025-07-17)
- MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM (2025-07-16)
- CorrDetail: Visual Detail Enhanced Self-Correction for Face Forgery Detection (2025-07-07)
- Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection (2025-07-03)
- DDL: A Dataset for Interpretable Deepfake Detection and Localization in Real-World Scenarios (2025-06-29)
- Post-training for Deepfake Speech Detection (2025-06-26)
- Pay Less Attention to Deceptive Artifacts: Robust Detection of Compressed Deepfakes on Online Social Networks (2025-06-25)
- IndieFake Dataset: A Benchmark Dataset for Audio Deepfake Detection (2025-06-23)