Audio Classification on VGGSound
Metric: Top 1 Accuracy (higher is better)
Results
| # | Model | Top-1 Accuracy (%) | Extra Data | Paper | Date | Code |
|---|-------|--------------------|------------|-------|------|------|
| 1 | Mirasol3B | 69.8 | No | Mirasol3B: A Multimodal Autoregressive model for... | 2023-11-09 | - |
| 2 | CA2ST(B/16) | 68.3 | No | CA^2ST: Cross-Attention in Audio, Space, and Tim... | 2025-03-30 | - |
| 3 | ONE-PEACE (Audio-Visual) | 68.2 | Yes | ONE-PEACE: Exploring One General Representation ... | 2023-05-18 | Yes |
| 4 | CAVA(B/16) | 68.2 | No | CA^2ST: Cross-Attention in Audio, Space, and Tim... | 2025-03-30 | - |
| 5 | MAViL | 67.1 | Yes | - | - | - |
| 6 | EquiAV | 67.1 | Yes | EquiAV: Leveraging Equivariance for Audio-Visual... | 2024-03-14 | Yes |
| 7 | MMT (Audio-Visual) | 66.2 | No | - | - | - |
| 8 | CAV-MAE (Audio-Visual) | 65.9 | Yes | Contrastive Audio-Visual Masked Autoencoder | 2022-10-02 | Yes |
| 9 | UAVM (Audio + Video) | 65.8 | Yes | UAVM: Towards Unifying Audio and Visual Models | 2022-07-29 | Yes |
| 10 | Audiovisual Masked Autoencoder (Audiovisual, Single) | 65.0 | No | Audiovisual Masked Autoencoders | 2022-12-09 | Yes |
| 11 | AVT (Audio-Visual) | 63.9 | No | - | - | - |
| 12 | ONE-PEACE (Audio-Only) | 59.6 | Yes | ONE-PEACE: Exploring One General Representation ... | 2023-05-18 | Yes |
| 13 | CAV-MAE (Audio-Only) | 59.5 | Yes | Contrastive Audio-Visual Masked Autoencoder | 2022-10-02 | Yes |
| 14 | Audiovisual Masked Autoencoder (Audio-only, Single) | 57.2 | No | Audiovisual Masked Autoencoders | 2022-12-09 | Yes |
| 15 | MAST (Audio Only) | 57.0 | No | Multiscale Audio Spectrogram Transformer for Eff... | 2023-03-19 | - |
| 16 | UAVM (Audio Only) | 56.5 | Yes | UAVM: Towards Unifying Audio and Visual Models | 2022-07-29 | Yes |
| 17 | MMT (Video) | 56.1 | No | - | - | - |
| 18 | PlayItBackX3 | 53.7 | No | Play It Back: Iterative Attention for Audio Reco... | 2022-10-20 | Yes |
| 19 | AVT (V) | 53.2 | No | - | - | - |
| 20 | MBT (A) | 52.3 | No | Attention Bottlenecks for Multimodal Fusion | 2021-06-30 | Yes |
| 21 | MBT (V) | 51.2 | No | Attention Bottlenecks for Multimodal Fusion | 2021-06-30 | Yes |
| 22 | UAVM (Video Only) | 49.9 | Yes | UAVM: Towards Unifying Audio and Visual Models | 2022-07-29 | Yes |
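Entries are ranked by top-1 accuracy: the percentage of test clips for which the model's single highest-scoring class equals the ground-truth label. The sketch below shows that computation in plain Python. It is only an illustration under stated assumptions: the helper name `top1_accuracy` and the toy scores are hypothetical, and it is not the evaluation code of any paper listed above.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return top-1 accuracy in percent (hypothetical helper for illustration).

    scores[i][c] is the model's score for class c on clip i;
    labels[i] is the ground-truth class index of clip i.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("need one non-empty score vector per label")
    correct = 0
    for clip_scores, label in zip(scores, labels):
        # Top-1 prediction = index of the highest score (Python's max keeps the first on ties).
        predicted = max(range(len(clip_scores)), key=clip_scores.__getitem__)
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    # Toy example with 3 clips and 4 classes (made-up numbers, not VGGSound data).
    scores = [
        [0.1, 0.7, 0.1, 0.1],  # predicts class 1 (correct)
        [0.4, 0.3, 0.2, 0.1],  # predicts class 0 (label is 2, wrong)
        [0.2, 0.2, 0.5, 0.1],  # predicts class 2 (correct)
    ]
    labels = [1, 2, 2]
    print(top1_accuracy(scores, labels))  # 2 of 3 correct
```

Because only the arg-max class counts, a model gets no credit when the correct label ranks second, which is why "higher is better" here means more clips classified exactly right.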