Video on Kinetics-Sounds

Metric: Top 1 Accuracy (higher is better)
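
For readers unfamiliar with the metric, top-1 accuracy is the percentage of test clips whose single highest-scoring predicted class matches the ground-truth label. The following is a minimal sketch of that calculation, not the evaluation code used by any of the listed papers. The function name `top1_accuracy` and the input layout (one list of class scores per clip, integer class labels) are assumptions made for illustration.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the percentage of samples whose highest-scoring class equals the label.

    `scores` holds one row of per-class scores per sample (hypothetical layout);
    `labels` holds the ground-truth class index for each sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the model's top-1 prediction.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy example with 4 samples and 2 classes; 3 of the 4 predictions are correct.
example_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
example_labels = [1, 0, 0, 0]
print(top1_accuracy(example_scores, example_labels))
```

Published numbers may also depend on details this sketch leaves out, such as how many clips or views are sampled per video and how their scores are averaged.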

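| # | Model | Top 1 Accuracy (%) | Extra Data | Paper | Date | Code |
|---|-------|--------------------|------------|-------|------|------|
| 1 | CA2ST(B/16) | 93.3 | No | CA^2ST: Cross-Attention in Audio, Space, and Tim... | 2025-03-30 | - |
| 2 | CAVA(B/16) | 92.9 | No | CA^2ST: Cross-Attention in Audio, Space, and Tim... | 2025-03-30 | - |
| 3 | Mirasol3B | 90.1 | No | Mirasol3B: A Multimodal Autoregressive model for... | 2023-11-09 | - |
| 4 | MBT (AV) | 85 | No | Attention Bottlenecks for Multimodal Fusion | 2021-06-30 | Code |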