Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Video on Kinetics-700

Metric: Top-1 Accuracy (higher is better)
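As a minimal sketch of the metric, assume each model produces one score per class for every video and that accuracy is reported as a percentage. Top-1 accuracy is then the share of videos whose highest-scoring class matches the ground-truth label. The function name `top1_accuracy` and the input layout are illustrative assumptions, not taken from any listed paper's evaluation code. Published numbers may also average predictions over several clips or crops per video, which this sketch omits.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the percentage of examples whose highest-scoring class equals the label.

    scores: one row of per-class scores per video (hypothetical layout).
    labels: the ground-truth class index for each video.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one example is required")

    correct = 0
    for row, label in zip(scores, labels):
        # argmax over classes; ties resolve to the lowest class index
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    # Four toy videos, two classes; three of the four predictions are correct.
    toy_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
    toy_labels = [1, 0, 0, 0]
    print(top1_accuracy(toy_scores, toy_labels))
```

Because a higher value is better, leaderboard rows are sorted by this number in descending order.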


Results

| # | Model | Top-1 Accuracy (%) | Extra Data | Paper | Date | Code |
|---|-------|--------------------|------------|-------|------|------|
| 1 | InternVideo2-6B | 85.9 | Yes | InternVideo2: Scaling Foundation Models for Mult... | 2024-03-22 | Code |
| 2 | InternVideo2-1B | 85.4 | Yes | InternVideo2: Scaling Foundation Models for Mult... | 2024-03-22 | Code |
| 3 | InternVideo-T | 84 | Yes | InternVideo: General Video Foundation Models via... | 2022-12-06 | Code |
| 4 | TubeViT-L | 83.8 | No | Rethinking Video ViTs: Sparse Video Tubes for Jo... | 2022-12-06 | Code |
| 5 | UMT-L (ViT-L/16) | 83.6 | Yes | Unmasked Teacher: Towards Training-Efficient Vid... | 2023-03-28 | Code |
| 6 | MTV-H (WTS 60M) | 83.4 | Yes | Multiview Transformers for Video Recognition | 2022-01-12 | Code |
| 7 | UniFormerV2-L | 82.7 | Yes | - | - | Code |
| 8 | CoCa (finetuned) | 82.7 | Yes | CoCa: Contrastive Captioners are Image-Text Foun... | 2022-05-04 | Code |
| 9 | CoCa (frozen) | 81.1 | Yes | CoCa: Contrastive Captioners are Image-Text Foun... | 2022-05-04 | Code |
| 10 | Hiera-H (no extra data) | 81.1 | No | Hiera: A Hierarchical Vision Transformer without... | 2023-06-01 | Code |
| 11 | MaskFeat (no extra data, MViT-L) | 80.4 | No | Masked Feature Prediction for Self-Supervised Vi... | 2021-12-16 | Code |
| 12 | mPLUG-2 | 80.4 | Yes | mPLUG-2: A Modularized Multi-modal Foundation Mo... | 2023-02-01 | Code |
| 13 | AIM (CLIP ViT-L/14, 32x224) | 80.4 | Yes | AIM: Adapting Image Models for Efficient Video A... | 2023-02-06 | Code |
| 14 | CoVeR (JFT-3B) | 79.8 | Yes | Co-training Transformer with Videos and Images I... | 2021-12-14 | - |
| 15 | MViTv2-L (ImageNet-21k pretrain) | 79.4 | Yes | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code |
| 16 | MoViNet-A6 | 79.4 | No | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code |
| 17 | CoVeR (JFT-300M) | 78.5 | Yes | Co-training Transformer with Videos and Images I... | 2021-12-14 | - |
| 18 | MViTv2-B | 76.6 | No | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code |
| 19 | MoViNet-A6 | 72.3 | No | MoViNets: Mobile Video Networks for Efficient Vi... | 2021-03-21 | Code |
| 20 | MoViNet-A5 | 71.7 | No | MoViNets: Mobile Video Networks for Efficient Vi... | 2021-03-21 | Code |
| 21 | En-VidTr-L | 70.8 | No | VidTr: Video Transformer Without Convolutions | 2021-04-23 | - |
| 22 | MoViNet-A4 | 70.7 | No | MoViNets: Mobile Video Networks for Efficient Vi... | 2021-03-21 | Code |
| 23 | VidTr-L | 70.2 | No | VidTr: Video Transformer Without Convolutions | 2021-04-23 | - |
| 24 | VidTr-M | 69.5 | No | VidTr: Video Transformer Without Convolutions | 2021-04-23 | - |
| 25 | MoViNet-A3 | 68 | No | MoViNets: Mobile Video Networks for Efficient Vi... | 2021-03-21 | Code |
| 26 | VidTr-S | 67.3 | No | VidTr: Video Transformer Without Convolutions | 2021-04-23 | - |
| 27 | MoViNet-A2 | 66.7 | No | MoViNets: Mobile Video Networks for Efficient Vi... | 2021-03-21 | Code |
| 28 | MoViNet-A1 | 63.5 | No | MoViNets: Mobile Video Networks for Efficient Vi... | 2021-03-21 | Code |
| 29 | MoViNet-A0 | 58.5 | No | MoViNets: Mobile Video Networks for Efficient Vi... | 2021-03-21 | Code |
| 30 | SRTG r3d-101 | 56.46 | No | Learn to cycle: Time-consistent feature discover... | 2020-06-15 | Code |
| 31 | SRTG r(2+1)d-50 | 54.17 | No | Learn to cycle: Time-consistent feature discover... | 2020-06-15 | Code |
| 32 | SRTG r3d-50 | 53.52 | No | Learn to cycle: Time-consistent feature discover... | 2020-06-15 | Code |
| 33 | SEER (RegNet10B) | 51.9 | Yes | Vision Models Are More Robust And Fair When Pret... | 2022-02-16 | Code |
| 34 | SRTG r(2+1)d-34 | 49.43 | No | Learn to cycle: Time-consistent feature discover... | 2020-06-15 | Code |
| 35 | SRTG r3d-34 | 49.15 | No | Learn to cycle: Time-consistent feature discover... | 2020-06-15 | Code |