Classification on ESC-50
Metric: Top-1 Accuracy (higher is better)
Results
| # | Model | Top-1 Accuracy (%) | Augmentations | Badge | Paper | Date | Code |
|---|-------|--------------------|---------------|-------|-------|------|------|
| 1 | OmniVec2 | 99.1 | Yes | - | No paper | - | - |
| 2 | InternVideo2 | 98.6 | Yes | SOTA | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 2024-03-22 | Code |
| 3 | M2D2 AS+ | 98.5 | Yes | - | M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | 2025-03-28 | Code |
| 4 | OmniVec | 98.4 | Yes | SOTA | OmniVec: Learning robust representations with cross modal sharing | 2023-11-07 | - |
| 5 | BEATs | 98.1 | Yes | SOTA | BEATs: Audio Pre-Training with Acoustic Tokenizers | 2022-12-18 | Code |
| 6 | mn40_as | 97.45 | Yes | SOTA | Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation | 2022-11-09 | Code |
| 7 | DyMN-L | 97.4 | Yes | - | Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models | 2023-10-24 | Code |
| 8 | M2D-CLAP/0.7 | 97.4 | Yes | - | M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation | 2024-06-04 | Code |
| 9 | M2D-AS/0.7 | 97.2 | Yes | - | Masked Modeling Duo: Towards a Universal Audio Pre-training Framework | 2024-04-09 | Code |
| 10 | HTS-AT | 97 | Yes | SOTA | HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection | 2022-02-02 | Code |
| 11 | EAT-M | 96.3 | Yes | - | End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network | 2022-04-25 | Code |
| 12 | LHGNN | 96.2 | No | - | LHGNN: Local-Higher Order Graph Neural Networks For Audio Classification and Tagging | 2025-01-07 | - |
| 13 | ERANN-2-5 | 96.1 | No | - | No paper | - | - |
| 14 | M2D/0.7 | 96 | Yes | - | Masked Modeling Duo: Towards a Universal Audio Pre-training Framework | 2024-04-09 | Code |
| 15 | EAT | 96 | Yes | - | EAT: Self-Supervised Pre-Training with Efficient Audio Transformer | 2024-01-07 | Code |
| 16 | Audio Spectrogram Transformer | 95.7 | Yes | SOTA | AST: Audio Spectrogram Transformer | 2021-04-05 | Code |
| 17 | EAT-S | 95.25 | Yes | - | End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network | 2022-04-25 | Code |
| 18 | MATPAC (SSL model, linear eval) | 93.5 | No | - | Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning | 2025-02-17 | Code |
| 19 | EAT-S (scratch) | 92.15 | No | - | End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network | 2022-04-25 | Code |
| 20 | SepTr + LeRaC | 91.58 | No | - | Learning Rate Curriculum | 2022-05-18 | Code |
| 21 | SepTr | 91.13 | No | - | SepTr: Separable Transformer for Audio Spectrogram Processing | 2022-03-17 | Code |
| 22 | Multi-Format Contrastive | 90.5 | Yes | SOTA | Multi-Format Contrastive Learning of Audio Representations | 2021-03-11 | - |
| 23 | Multi-Channel Audio Feature with CNN | 89.5 | No | - | No paper | - | - |
| 24 | AVID | 89.2 | No | SOTA | Audio-Visual Instance Discrimination with Cross-Modal Agreement | 2020-04-27 | Code |
| 25 | ACDNet | 87.1 | No | - | Environmental Sound Classification on the Edge: A Pipeline for Deep Acoustic Networks on Extremely Resource-Constrained Devices | 2021-03-05 | Code |
| 26 | XDC | 85.4 | No | SOTA | Self-Supervised Learning by Cross-Modal Audio-Video Clustering | 2019-11-28 | Code |
| 27 | XDC | 84.8 | No | - | Self-Supervised Learning by Cross-Modal Audio-Video Clustering | 2019-11-28 | Code |
| 28 | AVTS | 82.3 | No | SOTA | Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization | 2018-06-30 | - |
| 29 | L3 | 79.3 | No | SOTA | Look, Listen and Learn | 2017-05-23 | Code |