
Data sourced from the PWC Archive (CC-BY-SA 4.0).

X3D: Expanding Architectures for Efficient Video Recognition

Christoph Feichtenhofer

2020-04-09 · CVPR 2020
Tasks: Image Classification, Action Classification, Feature Selection, Video Recognition, Video Classification, General Classification

Abstract

This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes: space, time, width and depth. Inspired by feature selection methods in machine learning, a simple stepwise network expansion approach is employed that expands a single axis in each step, so that a good accuracy-to-complexity trade-off is achieved. To expand X3D to a specific target complexity, we perform progressive forward expansion followed by backward contraction. X3D achieves state-of-the-art performance while requiring 4.8x fewer multiply-adds and 5.5x fewer parameters than previous work at similar accuracy. Our most surprising finding is that networks with high spatiotemporal resolution can perform well while being extremely light in terms of network width and parameters. We report competitive accuracy at unprecedented efficiency on video classification and detection benchmarks. Code will be available at: https://github.com/facebookresearch/SlowFast
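The stepwise expansion described in the abstract can be sketched as a greedy loop that grows one axis per step, followed by a contraction that pulls the result back onto a target complexity. This is a minimal illustration under stated assumptions, not the X3D implementation. The following are all hypothetical choices for this sketch: the cost model (multiply-adds linear in frames and depth, quadratic in spatial side length and channel width), the expansion factor of 1.5, the "best accuracy gain per unit of added cost" selection rule, and `toy_accuracy`, which is a made-up stand-in for measuring each candidate's real accuracy. Backward contraction is simplified here to shrinking only the last-expanded axis.

```python
import math
from dataclasses import dataclass, replace


# Hypothetical per-axis multipliers relative to a tiny base network.
@dataclass(frozen=True)
class Config:
    time: float = 1.0   # number of input frames
    space: float = 1.0  # spatial side length
    width: float = 1.0  # channel width
    depth: float = 1.0  # number of layers


# Assumed cost exponents: multiply-adds grow linearly with frames and depth,
# quadratically with spatial side length and with channel width.
EXPONENT = {"time": 1, "space": 2, "width": 2, "depth": 1}


def cost(c: Config) -> float:
    return math.prod(getattr(c, a) ** p for a, p in EXPONENT.items())


def toy_accuracy(c: Config) -> float:
    # Hypothetical stand-in for training and evaluating a candidate:
    # concave in every axis, so each expansion has diminishing returns.
    weights = {"time": 3.0, "space": 4.0, "width": 2.0, "depth": 3.5}
    return sum(w * math.log1p(getattr(c, a)) for a, w in weights.items())


def scaled(c: Config, axis: str, factor: float) -> Config:
    return replace(c, **{axis: getattr(c, axis) * factor})


def gain_per_cost(cur: Config, axis: str, factor: float) -> float:
    # Assumed selection rule: accuracy gained per unit of added cost.
    cand = scaled(cur, axis, factor)
    return (toy_accuracy(cand) - toy_accuracy(cur)) / (cost(cand) - cost(cur))


def forward_expand(base: Config, target: float, factor: float = 1.5):
    """Expand a single axis per step until the cost reaches the target."""
    steps = []  # (axis expanded, config after the step)
    cur = base
    while cost(cur) < target:
        axis = max(EXPONENT, key=lambda a: gain_per_cost(cur, a, factor))
        cur = scaled(cur, axis, factor)
        steps.append((axis, cur))
    return steps


def backward_contract(c: Config, axis: str, target: float) -> Config:
    """Shrink the last-expanded axis so the cost lands on the target."""
    if cost(c) <= target:
        return c
    shrink = (target / cost(c)) ** (1.0 / EXPONENT[axis])
    return scaled(c, axis, shrink)


def search(target: float, base: Config = Config()) -> Config:
    steps = forward_expand(base, target)
    if not steps:
        return base  # base network already meets the target
    last_axis, overshoot = steps[-1]
    return backward_contract(overshoot, last_axis, target)


final = search(target=20.0)
print(final, round(cost(final), 6))
```

Because the cost is a product of per-axis powers in this toy model, the contraction can solve for the exact shrink factor in closed form. With a real network the cost would be measured, and the contraction would need to be searched as well.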

Results

| Task | Dataset | Model | Acc@1 (%) | Acc@5 (%) |
|---|---|---|---|---|
| Video | Kinetics-400 | X3D-XXL | 80.4 | 94.6 |
| Video | Kinetics-400 | X3D-XL | 79.1 | 93.9 |
| Video | Kinetics-400 | X3D-L | 77.5 | 92.9 |
| Video | Kinetics-400 | X3D-M | 76 | 92.3 |
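Acc@1 and Acc@5 are top-1 and top-5 accuracy: the percentage of test videos whose ground-truth class is among the 1 or 5 highest-scoring of the 400 Kinetics classes. The sketch below shows the standard definition. It assumes one score vector per video; how per-video scores are produced (for example by aggregating several clips) is outside its scope. The scores and labels are made up.

```python
def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


# Toy example: 4 videos, 3 classes (made-up numbers).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
labels = [1, 1, 2, 2]
print(top_k_accuracy(scores, labels, 1))  # 50.0
print(top_k_accuracy(scores, labels, 2))  # 75.0
```

Acc@5 is never lower than Acc@1 for the same predictions, because a label in the top 1 is also in the top 5. The table above shows the same pattern.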

Related Papers

- Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
- Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
- Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
- Federated Learning for Commercial Image Sources (2025-07-17)
- MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
- mNARX+: A surrogate model for complex dynamical systems using manifold-NARX and automatic feature selection (2025-07-17)
- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking (2025-07-15)