Audio Classification on AudioSet
Metric: Test mAP (higher is better)
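For readers unfamiliar with the metric: AudioSet is a multi-label task, and mAP there is commonly computed as the per-class average precision averaged over classes. The sketch below illustrates that common definition only; it is not taken from this page, and individual submissions may use different evaluation scripts. The function names `average_precision` and `mean_average_precision` are hypothetical, and ties in scores are broken by input order, which is an assumption of this sketch.

```python
from typing import List, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AP for one class: mean of precision@rank taken at each positive's rank."""
    # Rank examples from highest to lowest score (stable sort, so ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("class has no positive examples")
    return precision_sum / hits


def mean_average_precision(
    y_true: List[Sequence[int]], y_score: List[Sequence[float]]
) -> float:
    """Macro-average of per-class AP, skipping classes with no positives."""
    n_classes = len(y_true[0])
    aps = []
    for c in range(n_classes):
        labels = [row[c] for row in y_true]
        scores = [row[c] for row in y_score]
        if any(labels):
            aps.append(average_precision(labels, scores))
    if not aps:
        raise ValueError("no class has positive examples")
    return sum(aps) / len(aps)


# Toy example: 3 clips, 2 classes (made-up numbers, not leaderboard data).
y_true = [[1, 0], [0, 1], [1, 1]]
y_score = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6]]
toy_map = mean_average_precision(y_true, y_score)
```

In the toy example, class 0 has positives at ranks 1 and 3 (AP = (1 + 2/3) / 2) and class 1 has positives at ranks 1 and 2 (AP = 1), so `toy_map` is about 0.917.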
Results
| # | Model | Test mAP | Extra Data | Badge | Date | Paper | Code |
|---|---|---|---|---|---|---|---|
| 1 | OmniVec2 | 0.558 | Yes | - | - | No paper | - |
| 2 | OmniVec | 0.548 | Yes | SOTA | 2023-11-07 | OmniVec: Learning robust representations with cross modal sharing | - |
| 3 | EquiAV | 0.546 | No | - | 2024-03-14 | EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning | Code |
| 4 | MAViL (Audio-Visual, single) | 0.533 | Yes | - | - | No paper | - |
| 5 | Audiovisual Masked Autoencoder (Audiovisual, Single) | 0.518 | No | SOTA | 2022-12-09 | Audiovisual Masked Autoencoders | Code |
| 6 | CAV-MAE (Audio-Visual) | 0.512 | Yes | SOTA | 2022-10-02 | Contrastive Audio-Visual Masked Autoencoder | Code |
| 7 | BEATs (Audio-only, Ensemble) | 0.506 | No | - | 2022-12-18 | BEATs: Audio Pre-Training with Acoustic Tokenizers | Code |
| 8 | UAVM (Audio + Video) | 0.504 | Yes | SOTA | 2022-07-29 | UAVM: Towards Unifying Audio and Visual Models | Code |
| 9 | SSLAM (Audio-Only, Single) | 0.502 | No | - | 2025-06-13 | SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes | Code |
| 10 | mn40_as (Ensemble) | 0.498 | Yes | - | 2022-11-09 | Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation | Code |
| 11 | ATST-C2F(Single) | 0.497 | No | - | 2023-06-07 | Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks | Code |
| 12 | MBT (AS-500K training + Video) | 0.496 | Yes | SOTA | 2021-06-30 | Attention Bottlenecks for Multimodal Fusion | Code |
| 13 | PaSST (Ensemble) | 0.496 | Yes | - | 2021-10-11 | Efficient Training of Audio Transformers with Patchout | Code |
| 14 | DyMN-L (Audio-Only, Single) | 0.49 | Yes | - | 2023-10-24 | Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models | Code |
| 15 | M2D2 | 0.49 | No | - | 2025-03-28 | M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | Code |
| 16 | HTS-AT (Ensemble) | 0.487 | Yes | - | 2022-02-02 | HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection | Code |
| 17 | BEATs (Audio-only, Single) | 0.486 | No | - | 2022-12-18 | BEATs: Audio Pre-Training with Acoustic Tokenizers | Code |
| 18 | EAT | 0.486 | No | - | 2024-01-07 | EAT: Self-Supervised Pre-Training with Efficient Audio Transformer | Code |
| 19 | DTF-AT (Single) | 0.486 | No | - | - | No paper | Code |
| 20 | AST (Ensemble) | 0.485 | Yes | SOTA | 2021-04-05 | AST: Audio Spectrogram Transformer | Code |
| 21 | M2D-CLAP/0.7 | 0.485 | No | - | 2024-06-04 | M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation | Code |
| 22 | M2D-AS/0.7 | 0.485 | No | - | 2024-04-09 | Masked Modeling Duo: Towards a Universal Audio Pre-training Framework | Code |
| 23 | MAViL (Audio-only, single) | 0.484 | Yes | - | - | No paper | - |
| 24 | mn40_as (Single) | 0.483 | Yes | - | 2022-11-09 | Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation | Code |
| 25 | MAX-AST (Single) | 0.481 | No | - | - | No paper | Code |
| 26 | ATST-Frame | 0.48 | No | - | 2023-06-07 | Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks | Code |
| 27 | M2D/0.7 | 0.479 | No | - | 2024-04-09 | Masked Modeling Duo: Towards a Universal Audio Pre-training Framework | Code |
| 28 | PlayItBackX3 | 0.477 | No | - | 2022-10-20 | Play It Back: Iterative Attention for Audio Recognition | Code |
| 29 | DASS-Medium (Audio-only, single) | 0.476 | No | - | 2024-07-04 | DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners | Code |
| 30 | PSLA (Ensemble) | 0.474 | Yes | SOTA | 2021-02-02 | PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation | Code |
| 31 | DASS-Small (Audio-only, single) | 0.472 | No | - | 2024-07-04 | DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners | Code |
| 32 | PaSST-S (Single) | 0.471 | Yes | - | 2021-10-11 | Efficient Training of Audio Transformers with Patchout | Code |
| 33 | MaskSpec (AS-2M) | 0.471 | No | - | - | No paper | - |
| 34 | CAV-MAE (Audio-Only) | 0.466 | Yes | - | 2022-10-02 | Contrastive Audio-Visual Masked Autoencoder | Code |
| 35 | Audiovisual Masked Autoencoder (Audio-only, Single) | 0.466 | No | - | 2022-12-09 | Audiovisual Masked Autoencoders | Code |
| 36 | AudioVisual Fusion Net | 0.462 | No | SOTA | 2020-05-29 | Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data | - |
| 37 | AST (Single) | 0.459 | Yes | - | 2021-04-05 | AST: Audio Spectrogram Transformer | Code |
| 38 | ERANN-1-6 | 0.45 | No | - | - | No paper | - |
| 39 | Perceiver | 0.449 | No | - | 2021-03-04 | Perceiver: General Perception with Iterative Attention | Code |
| 40 | PSLA (Single) | 0.443 | Yes | - | 2021-02-02 | PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation | Code |
| 41 | PANNs-CNN14 (Single) | 0.431 | No | - | - | No paper | Code |
| 42 | EAT-M | 0.426 | No | - | 2022-04-25 | End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network | Code |
| 43 | Conformer (AS-2M) | 0.411 | No | - | 2021-10-14 | Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks | - |
| 44 | EAT-S | 0.405 | No | - | 2022-04-25 | End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network | Code |
| 45 | WEANet-SUSTAIN | 0.398 | No | - | 2020-06-30 | A Sequential Self Teaching Approach for Improving Generalization in Sound Event Recognition | - |
| 46 | VATT-Base | 0.394 | Yes | - | 2021-04-22 | VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | Code |
| 47 | Multi-Format Contrastive | 0.376 | No | - | 2021-03-11 | Multi-Format Contrastive Learning of Audio Representations | - |
| 48 | MMV | 0.309 | No | - | 2020-06-29 | Self-Supervised MultiModal Versatile Networks | Code |
| 49 | CAV-MAE (Visual-Only) | 0.262 | Yes | - | 2022-10-02 | Contrastive Audio-Visual Masked Autoencoder | Code |
| 50 | L3 | 0.249 | No | SOTA | 2017-05-23 | Look, Listen and Learn | Code |
| 51 | Triplet | 0.244 | No | - | 2017-11-06 | Unsupervised Learning of Semantic Audio Representations | - |