Video on Kinetics-Sounds
Metric: Top 1 Accuracy (higher is better)
Results
| # | Model | Top 1 Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | CA2ST(B/16) | 93.3 | No | CA^2ST: Cross-Attention in Audio, Space, and Tim... | 2025-03-30 | - |
| 2 | CAVA(B/16) | 92.9 | No | CA^2ST: Cross-Attention in Audio, Space, and Tim... | 2025-03-30 | - |
| 3 | Mirasol3B | 90.1 | No | Mirasol3B: A Multimodal Autoregressive model for... | 2023-11-09 | - |
| 4 | MBT (AV) | 85 | No | Attention Bottlenecks for Multimodal Fusion | 2021-06-30 | Code |