Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Video Classification with Channel-Separated Convolutional Networks

Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli

Date: 2019-04-04
Venue: ICCV 2019 10
Tasks: Image Classification, Action Classification, Video Classification, General Classification, Action Recognition
Abstract

Group convolution has been shown to offer great computational savings in various 2D convolutional architectures for image classification. It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter the most in 3D group convolutional networks; and 3) what are good computation/accuracy trade-offs with 3D group convolutional networks. This paper studies the effects of different design choices in 3D group convolutional networks for video classification. We empirically demonstrate that the amount of channel interactions plays an important role in the accuracy of 3D group convolutional networks. Our experiments suggest two main findings. First, it is a good practice to factorize 3D convolutions by separating channel interactions and spatiotemporal interactions as this leads to improved accuracy and lower computational cost. Second, 3D channel-separated convolutions provide a form of regularization, yielding lower training accuracy but higher test accuracy compared to 3D convolutions. These two empirical findings lead us to design an architecture -- Channel-Separated Convolutional Network (CSN) -- which is simple, efficient, yet accurate. On Sports1M, Kinetics, and Something-Something, our CSNs are comparable with or better than the state-of-the-art while being 2-3 times more efficient.
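
To make the first finding concrete, here is a minimal sketch that counts the weights of a single full 3D convolution and compares them with a factorization that separates channel interactions from spatiotemporal interactions. It assumes the factorization is a 1x1x1 convolution (channel mixing only) followed by a 3x3x3 depthwise convolution (one spatiotemporal filter per channel); the abstract does not spell out the exact layer layout, so that pairing, the helper name `conv3d_weights`, and the width of 64 channels are all illustrative assumptions rather than the authors' configuration.

```python
"""Weight counts: full 3D convolution vs. a channel-separated factorization."""


def conv3d_weights(c_in, c_out, kernel=(3, 3, 3), groups=1):
    """Number of weights (bias excluded) in a 3D convolution with grouping."""
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("groups must divide both c_in and c_out")
    kt, kh, kw = kernel
    # Each output channel is connected to only c_in / groups input channels.
    return (c_in // groups) * c_out * kt * kh * kw


CHANNELS = 64  # hypothetical layer width, chosen only for illustration

# Full 3x3x3 convolution: channel and spatiotemporal interactions together.
full = conv3d_weights(CHANNELS, CHANNELS)

# Assumed factorization: 1x1x1 conv for channel interactions ...
pointwise = conv3d_weights(CHANNELS, CHANNELS, kernel=(1, 1, 1))
# ... plus a depthwise 3x3x3 conv (groups == channels) for spatiotemporal ones.
depthwise = conv3d_weights(CHANNELS, CHANNELS, groups=CHANNELS)
separated = pointwise + depthwise

print(full, separated, round(full / separated, 1))
```

For 64 channels this gives 110,592 weights for the full convolution against 5,824 for the separated pair. That is a per-layer figure only; it is not comparable to the whole-network efficiency gain of 2-3 times reported in the abstract, since a real network contains many layers that are not factorized this way.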

Results

Task | Dataset | Metric | Value | Model
--- | --- | --- | --- | ---
Video | Kinetics-400 | Acc@1 | 82.6 | ir-CSN-152 (IG-65M pretraining)
Video | Kinetics-400 | Acc@1 | 82.5 | ip-CSN-152 (IG-65M pretraining)
Video | Kinetics-400 | Acc@5 | 95.3 | ip-CSN-152 (IG-65M pretraining)
Video | Kinetics-400 | Acc@1 | 81.3 | R[2+1]D-152 (IG-65M pretraining)
Video | Kinetics-400 | Acc@5 | 95.1 | R[2+1]D-152 (IG-65M pretraining)
Video | Kinetics-400 | Acc@1 | 79.2 | ip-CSN-152 (Sports-1M pretraining)
Video | Kinetics-400 | Acc@5 | 93.8 | ip-CSN-152 (Sports-1M pretraining)
Video | Kinetics-400 | Acc@1 | 77.8 | ip-CSN-152
Video | Kinetics-400 | Acc@5 | 92.8 | ip-CSN-152
Activity Recognition | Sports-1M | Video hit@1 | 75.5 | ip-CSN-152 (RGB)
Activity Recognition | Sports-1M | Video hit@5 | 92.8 | ip-CSN-152 (RGB)
Activity Recognition | Sports-1M | Video hit@1 | 74.9 | ip-CSN-101 (RGB)
Activity Recognition | Sports-1M | Video hit@5 | 92.6 | ip-CSN-101 (RGB)
Activity Recognition | Something-Something V1 | Top 1 Accuracy | 53.3 | ip-CSN-152 (IG-65M pretraining)
Activity Recognition | Something-Something V1 | Top 1 Accuracy | 52.1 | ir-CSN-152 (IG-65M pretraining)
Activity Recognition | Something-Something V1 | Top 1 Accuracy | 51.6 | R(2+1)D-152 (IG-65M pretraining)
Activity Recognition | Something-Something V1 | Top 1 Accuracy | 49.3 | ir-CSN-152
Activity Recognition | Something-Something V1 | Top 1 Accuracy | 48.4 | ir-CSN-101
Action Recognition | Sports-1M | Video hit@1 | 75.5 | ip-CSN-152 (RGB)
Action Recognition | Sports-1M | Video hit@5 | 92.8 | ip-CSN-152 (RGB)
Action Recognition | Sports-1M | Video hit@1 | 74.9 | ip-CSN-101 (RGB)
Action Recognition | Sports-1M | Video hit@5 | 92.6 | ip-CSN-101 (RGB)
Action Recognition | Something-Something V1 | Top 1 Accuracy | 53.3 | ip-CSN-152 (IG-65M pretraining)
Action Recognition | Something-Something V1 | Top 1 Accuracy | 52.1 | ir-CSN-152 (IG-65M pretraining)
Action Recognition | Something-Something V1 | Top 1 Accuracy | 51.6 | R(2+1)D-152 (IG-65M pretraining)
Action Recognition | Something-Something V1 | Top 1 Accuracy | 49.3 | ir-CSN-152
Action Recognition | Something-Something V1 | Top 1 Accuracy | 48.4 | ir-CSN-101

Related Papers

Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking (2025-07-15)
Transferring Styles for Reduced Texture Bias and Improved Robustness in Semantic Segmentation Networks (2025-07-14)