Classification on ESC-50
Metric: Accuracy (5-fold) (higher is better)
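As a rough illustration of how a 5-fold accuracy figure like the ones below is typically obtained, the sketch here averages per-fold accuracies over held-out folds. It assumes the metric is the unweighted mean of the five fold accuracies; the page does not state the exact averaging rule. The function names `fold_accuracy` and `five_fold_accuracy` are hypothetical, not part of any leaderboard tooling.

```python
from statistics import mean


def fold_accuracy(y_true, y_pred):
    """Percentage of clips in one held-out fold that were classified correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("labels and predictions must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def five_fold_accuracy(folds):
    """Unweighted mean of per-fold accuracies (assumed averaging rule).

    `folds` is a list of (y_true, y_pred) pairs, one pair per held-out fold.
    """
    return mean(fold_accuracy(t, p) for t, p in folds)
```

Each pair passed in would come from a model trained on the other four folds and evaluated on the held-out one, so that every clip is scored exactly once.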
Results
| # | Model | Accuracy (5-fold) | Augmentations | Badge | Paper | Date | Code |
|---|-------|-------------------|---------------|-------|-------|------|------|
| 1 | OmniVec2 | 99.1 | Yes | | - | - | - |
| 2 | InternVideo2 | 98.6 | Yes | SOTA | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 2024-03-22 | Code |
| 3 | M2D2 AS+ | 98.5 | Yes | | M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | 2025-03-28 | Code |
| 4 | OmniVec | 98.4 | Yes | SOTA | OmniVec: Learning robust representations with cross modal sharing | 2023-11-07 | - |
| 5 | BEATs | 98.1 | Yes | SOTA | BEATs: Audio Pre-Training with Acoustic Tokenizers | 2022-12-18 | Code |
| 6 | mn40_as | 97.45 | Yes | SOTA | Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation | 2022-11-09 | Code |
| 7 | DyMN-L | 97.4 | Yes | | Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models | 2023-10-24 | Code |
| 8 | M2D-CLAP/0.7 | 97.4 | Yes | | M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation | 2024-06-04 | Code |
| 9 | M2D-AS/0.7 | 97.2 | Yes | | Masked Modeling Duo: Towards a Universal Audio Pre-training Framework | 2024-04-09 | Code |
| 10 | HTS-AT | 97 | Yes | SOTA | HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection | 2022-02-02 | Code |
| 11 | EAT-M | 96.3 | Yes | | End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network | 2022-04-25 | Code |
| 12 | ERANN-2-5 | 96.1 | No | | - | - | - |
| 13 | M2D/0.7 | 96 | Yes | | Masked Modeling Duo: Towards a Universal Audio Pre-training Framework | 2024-04-09 | Code |
| 14 | EAT | 96 | Yes | | EAT: Self-Supervised Pre-Training with Efficient Audio Transformer | 2024-01-07 | Code |
| 15 | Audio Spectrogram Transformer | 95.7 | Yes | SOTA | AST: Audio Spectrogram Transformer | 2021-04-05 | Code |
| 16 | EAT-S | 95.25 | Yes | | End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network | 2022-04-25 | Code |
| 17 | MATPAC (SSL model, linear eval) | 93.5 | No | | Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning | 2025-02-17 | Code |
| 18 | EAT-S (scratch) | 92.15 | No | | End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network | 2022-04-25 | Code |
| 19 | SepTr + LeRaC | 91.58 | No | | Learning Rate Curriculum | 2022-05-18 | Code |
| 20 | Multi-Channel Audio Feature with CNN | 89.5 | No | | - | - | - |
| 21 | ACDNet | 87.1 | No | SOTA | Environmental Sound Classification on the Edge: A Pipeline for Deep Acoustic Networks on Extremely Resource-Constrained Devices | 2021-03-05 | Code |