
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


SlowFast Networks for Video Recognition

Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He

Date: 2018-12-10
Venue: ICCV 2019 10
Tasks: Action Detection, Action Classification, Video Recognition, General Classification, Action Recognition, Action Recognition In Videos

Abstract

We present SlowFast networks for video recognition. Our model involves (i) a Slow pathway, operating at low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating at high frame rate, to capture motion at fine temporal resolution. The Fast pathway can be made very lightweight by reducing its channel capacity, yet can learn useful temporal information for video recognition. Our models achieve strong performance for both action classification and detection in video, and large improvements are pin-pointed as contributions by our SlowFast concept. We report state-of-the-art accuracy on major video recognition benchmarks, Kinetics, Charades and AVA. Code has been made available at: https://github.com/facebookresearch/SlowFast
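The two-pathway idea in the abstract can be illustrated with a minimal frame-sampling sketch. It is not the authors' implementation. The function names, the speed ratio `alpha=8`, the channel ratio `beta=1/8`, and the 64-frame clip are illustrative assumptions not stated on this page. It also assumes that model names such as "4x16" read as (Slow-pathway frames) x (temporal stride).

```python
def slowfast_frame_indices(num_frames, slow_stride=16, alpha=8):
    """Return (slow, fast) lists of frame indices sampled from one clip.

    slow_stride: temporal stride of the Slow pathway (assumed value).
    alpha: how many times denser the Fast pathway samples (assumed value).
    """
    if slow_stride % alpha != 0:
        raise ValueError("slow_stride must be divisible by alpha")
    fast_stride = slow_stride // alpha
    slow = list(range(0, num_frames, slow_stride))  # low frame rate
    fast = list(range(0, num_frames, fast_stride))  # high frame rate
    return slow, fast


def fast_channels(slow_channels, beta=1 / 8):
    """Channel width of the lightweight Fast pathway (beta is assumed)."""
    return max(1, int(slow_channels * beta))


slow, fast = slowfast_frame_indices(num_frames=64)
print(len(slow), len(fast))  # 4 32
print(fast_channels(64))     # 8
```

Under these assumptions the Fast pathway sees eight times as many frames as the Slow pathway but carries one eighth of the channels, which is what keeps it lightweight.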

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | Charades | MAP | 45.2 | SlowFast (Kinetics-600 pretraining, NL) |
| Video | Charades | MAP | 42.5 | SlowFast (Kinetics-400 pretraining, NL) |
| Video | Charades | MAP | 42.1 | SlowFast (Kinetics-600 pretraining) |
| Video | Kinetics-400 | Acc@1 | 79.8 | SlowFast 16x8 (ResNet-101 + NL) |
| Video | Kinetics-400 | Acc@1 | 78.9 | SlowFast 16x8 (ResNet-101) |
| Video | Kinetics-400 | Acc@5 | 93.5 | SlowFast 16x8 (ResNet-101) |
| Video | Kinetics-400 | Acc@1 | 77.9 | SlowFast 8x8 (ResNet-101) |
| Video | Kinetics-400 | Acc@5 | 93.2 | SlowFast 8x8 (ResNet-101) |
| Video | Kinetics-400 | Acc@1 | 77 | SlowFast 8x8 (ResNet-50) |
| Video | Kinetics-400 | Acc@5 | 92.6 | SlowFast 8x8 (ResNet-50) |
| Video | Kinetics-400 | Acc@1 | 75.6 | SlowFast 4x16 (ResNet-50) |
| Video | Kinetics-400 | Acc@5 | 92.1 | SlowFast 4x16 (ResNet-50) |
| Video | Kinetics-400 | Acc@5 | 93.9 | SlowFast 16x8 (ResNet-101 + NL) |
| Video | Kinetics-600 | Top-1 Accuracy | 81.8 | SlowFast 16x8 (ResNet-101 + NL) |
| Video | Kinetics-600 | Top-5 Accuracy | 95.1 | SlowFast 16x8 (ResNet-101 + NL) |
| Video | Kinetics-600 | Top-1 Accuracy | 81.1 | SlowFast 16x8 (ResNet-101) |
| Video | Kinetics-600 | Top-5 Accuracy | 95.1 | SlowFast 16x8 (ResNet-101) |
| Video | Kinetics-600 | Top-1 Accuracy | 80.4 | SlowFast 8x8 (ResNet-101) |
| Video | Kinetics-600 | Top-5 Accuracy | 94.8 | SlowFast 8x8 (ResNet-101) |
| Video | Kinetics-600 | Top-1 Accuracy | 79.9 | SlowFast 8x8 (ResNet-50) |
| Video | Kinetics-600 | Top-5 Accuracy | 94.5 | SlowFast 8x8 (ResNet-50) |
| Video | Kinetics-600 | Top-1 Accuracy | 78.8 | SlowFast 4x16 (ResNet-50) |
| Video | Kinetics-600 | Top-5 Accuracy | 94 | SlowFast 4x16 (ResNet-50) |
| Activity Recognition | Diving-48 | Accuracy | 77.6 | SlowFast |
| Activity Recognition | AVA v2.1 | mAP (Val) | 28.3 | SlowFast++ (Kinetics-600 pretraining, NL) |
| Activity Recognition | AVA v2.1 | mAP (Val) | 27.3 | SlowFast (Kinetics-600 pretraining, NL) |
| Activity Recognition | AVA v2.1 | mAP (Val) | 26.8 | SlowFast (Kinetics-600 pretraining) |
| Activity Recognition | AVA v2.1 | mAP (Val) | 26.3 | SlowFast (Kinetics-400 pretraining) |
| Activity Recognition | Something-Something V2 | Top-1 Accuracy | 61.7 | SlowFast |
| Activity Recognition | H2O (2 Hands and Objects) | Actions Top-1 | 77.69 | SlowFast |
| Activity Recognition | AVA v2.2 | mAP | 27.5 | SlowFast, 16x8 R101+NL (Kinetics-600 pretraining) |
| Activity Recognition | AVA v2.2 | mAP | 27.1 | SlowFast, 8x8 R101+NL (Kinetics-600 pretraining) |
| Activity Recognition | AVA v2.2 | mAP | 23.8 | SlowFast, 8x8, R101 (Kinetics-400 pretraining) |
| Activity Recognition | AVA v2.2 | mAP | 21.9 | SlowFast, 4x16, R50 (Kinetics-400 pretraining) |
| Action Recognition | Diving-48 | Accuracy | 77.6 | SlowFast |
| Action Recognition | AVA v2.1 | mAP (Val) | 28.3 | SlowFast++ (Kinetics-600 pretraining, NL) |
| Action Recognition | AVA v2.1 | mAP (Val) | 27.3 | SlowFast (Kinetics-600 pretraining, NL) |
| Action Recognition | AVA v2.1 | mAP (Val) | 26.8 | SlowFast (Kinetics-600 pretraining) |
| Action Recognition | AVA v2.1 | mAP (Val) | 26.3 | SlowFast (Kinetics-400 pretraining) |
| Action Recognition | Something-Something V2 | Top-1 Accuracy | 61.7 | SlowFast |
| Action Recognition | H2O (2 Hands and Objects) | Actions Top-1 | 77.69 | SlowFast |
| Action Recognition | AVA v2.2 | mAP | 27.5 | SlowFast, 16x8 R101+NL (Kinetics-600 pretraining) |
| Action Recognition | AVA v2.2 | mAP | 27.1 | SlowFast, 8x8 R101+NL (Kinetics-600 pretraining) |
| Action Recognition | AVA v2.2 | mAP | 23.8 | SlowFast, 8x8, R101 (Kinetics-400 pretraining) |
| Action Recognition | AVA v2.2 | mAP | 21.9 | SlowFast, 4x16, R50 (Kinetics-400 pretraining) |

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment (2025-06-25)
MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans (2025-06-25)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)