A Short Note about Kinetics-600
Joao Carreira, Eric Noland, Andras Banki-Horvath, Chloe Hillier, Andrew Zisserman
2018-08-03
Action Classification
Abstract
We describe an extension of the DeepMind Kinetics human action dataset from 400 classes, each with at least 400 video clips, to 600 classes, each with at least 600 video clips. In order to scale up the dataset we changed the data collection process so that it uses multiple queries per class, some of them in a language other than English (Portuguese). This paper details the changes between the two versions of the dataset and includes a comprehensive set of statistics of the new version as well as baseline results using the I3D neural network architecture. The paper is a companion to the release of the ground truth labels for the public test set.
Results
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | Kinetics-600 | Top-1 Accuracy | 73.6 | I3D (RGB) |
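For readers unfamiliar with the metric in the table, top-1 accuracy is the percentage of test clips whose highest-scoring predicted class matches the ground truth label. The following is a minimal sketch of that computation on toy data. The function name `top_k_accuracy` and the example scores are illustrative assumptions; this is not the evaluation code used for the paper.

```python
def top_k_accuracy(scores, labels, k=1):
    """Percentage of examples whose true label is among the k highest-scoring classes.

    scores: list of per-class score lists, one list per example (hypothetical model outputs)
    labels: list of integer ground truth class indices
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy data: 4 clips, 3 classes (made up for illustration only).
scores = [
    [0.10, 0.70, 0.20],  # top prediction: class 1
    [0.50, 0.30, 0.20],  # top prediction: class 0
    [0.20, 0.30, 0.50],  # top prediction: class 2
    [0.40, 0.35, 0.25],  # top prediction: class 0
]
labels = [1, 0, 1, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.1f}  top-2: {top2:.1f}")
```

On a real benchmark the scores would come from the model's per-clip class predictions and the labels from the released ground truth file.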
Related Papers
- SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis (2025-06-09)
- From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos (2025-06-05)
- Spatio-Temporal Joint Density Driven Learning for Skeleton-Based Action Recognition (2025-05-29)
- SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding (2025-05-22)
- Mouse Lockbox Dataset: Behavior Recognition for Mice Solving Lockboxes (2025-05-21)
- Domain Adaptation of VLM for Soccer Video Understanding (2025-05-20)
- OwlSight: A Robust Illumination Adaptation Framework for Dark Video Human Action Recognition (2025-03-30)
- CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition (2025-03-30)