Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Audio Classification on AudioSet

Metric: Test mAP (mean average precision; higher is better)
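The page does not spell out the evaluation protocol behind "Test mAP". A common convention in AudioSet papers works as follows: compute average precision (AP) separately for each sound class on the evaluation clips, then take the unweighted mean over classes. The sketch below assumes that convention and uses non-interpolated AP, meaning the mean of precision@k over the ranks k at which positive clips appear. The function names `average_precision` and `mean_average_precision` are hypothetical, chosen for illustration only. The toy scores and labels are invented and are not AudioSet data.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for a single class (illustrative sketch).

    AP = mean over positive clips of precision@k, where k is the rank of that
    positive clip when all clips are sorted by descending score.
    """
    # Sort clip indices by descending score; Python's sort is stable, so tied
    # scores keep their original order (an assumption of this sketch).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    if hits == 0:
        raise ValueError("class has no positive clips; AP is undefined")
    return precision_sum / hits


def mean_average_precision(score_matrix, label_matrix) -> float:
    """Macro-averaged AP: per-class AP, then the unweighted mean over classes.

    score_matrix[i][c] is the model's score for clip i and class c;
    label_matrix[i][c] is 1 if clip i is tagged with class c, else 0.
    """
    n_classes = len(score_matrix[0])
    aps = []
    for c in range(n_classes):
        col_scores = [row[c] for row in score_matrix]
        col_labels = [row[c] for row in label_matrix]
        if not any(col_labels):
            # Assumption: classes with no positive clips are skipped.
            continue
        aps.append(average_precision(col_scores, col_labels))
    if not aps:
        raise ValueError("no class has any positive clip")
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Toy example: 3 clips, 2 classes (made-up numbers).
    scores = [[0.9, 0.1], [0.8, 0.9], [0.7, 0.2]]
    labels = [[1, 0], [0, 1], [1, 0]]
    print(round(mean_average_precision(scores, labels), 4))  # → 0.9167
```

In this toy example, class 0 has AP (1 + 2/3) / 2 ≈ 0.833 and class 1 has AP 1.0, so the mean is about 0.917. Library implementations, such as scikit-learn's `average_precision_score`, treat tied scores differently from this stable-sort tie-breaking. Results may therefore differ slightly when scores tie.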

Results

| # | Model | Test mAP | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | OmniVec2 | 0.558 | Yes | - | - | - |
| 2 | OmniVec | 0.548 | Yes | OmniVec: Learning robust representations with cr... | 2023-11-07 | - |
| 3 | EquiAV | 0.546 | No | EquiAV: Leveraging Equivariance for Audio-Visual... | 2024-03-14 | Code |
| 4 | MAViL (Audio-Visual, single) | 0.533 | Yes | - | - | - |
| 5 | Audiovisual Masked Autoencoder (Audiovisual, Single) | 0.518 | No | Audiovisual Masked Autoencoders | 2022-12-09 | Code |
| 6 | CAV-MAE (Audio-Visual) | 0.512 | Yes | Contrastive Audio-Visual Masked Autoencoder | 2022-10-02 | Code |
| 7 | BEATs (Audio-only, Ensemble) | 0.506 | No | BEATs: Audio Pre-Training with Acoustic Tokenizers | 2022-12-18 | Code |
| 8 | UAVM (Audio + Video) | 0.504 | Yes | UAVM: Towards Unifying Audio and Visual Models | 2022-07-29 | Code |
| 9 | SSLAM (Audio-Only, Single) | 0.502 | No | SSLAM: Enhancing Self-Supervised Models with Aud... | 2025-06-13 | Code |
| 10 | mn40_as (Ensemble) | 0.498 | Yes | Efficient Large-scale Audio Tagging via Transfor... | 2022-11-09 | Code |
| 11 | ATST-C2F(Single) | 0.497 | No | Self-supervised Audio Teacher-Student Transforme... | 2023-06-07 | Code |
| 12 | MBT (AS-500K training + Video) | 0.496 | Yes | Attention Bottlenecks for Multimodal Fusion | 2021-06-30 | Code |
| 13 | PaSST (Ensemble) | 0.496 | Yes | Efficient Training of Audio Transformers with Pa... | 2021-10-11 | Code |
| 14 | DyMN-L (Audio-Only, Single) | 0.49 | Yes | Dynamic Convolutional Neural Networks as Efficie... | 2023-10-24 | Code |
| 15 | M2D2 | 0.49 | No | M2D2: Exploring General-purpose Audio-Language R... | 2025-03-28 | Code |
| 16 | HTS-AT (Ensemble) | 0.487 | Yes | HTS-AT: A Hierarchical Token-Semantic Audio Tran... | 2022-02-02 | Code |
| 17 | BEATs (Audio-only, Single) | 0.486 | No | BEATs: Audio Pre-Training with Acoustic Tokenizers | 2022-12-18 | Code |
| 18 | EAT | 0.486 | No | EAT: Self-Supervised Pre-Training with Efficient... | 2024-01-07 | Code |
| 19 | DTF-AT (Single) | 0.486 | No | - | - | Code |
| 20 | AST (Ensemble) | 0.485 | Yes | AST: Audio Spectrogram Transformer | 2021-04-05 | Code |
| 21 | M2D-CLAP/0.7 | 0.485 | No | M2D-CLAP: Masked Modeling Duo Meets CLAP for Lea... | 2024-06-04 | Code |
| 22 | M2D-AS/0.7 | 0.485 | No | Masked Modeling Duo: Towards a Universal Audio P... | 2024-04-09 | Code |
| 23 | MAViL (Audio-only, single) | 0.484 | Yes | - | - | - |
| 24 | mn40_as (Single) | 0.483 | Yes | Efficient Large-scale Audio Tagging via Transfor... | 2022-11-09 | Code |
| 25 | MAX-AST (Single) | 0.481 | No | - | - | Code |
| 26 | ATST-Frame | 0.48 | No | Self-supervised Audio Teacher-Student Transforme... | 2023-06-07 | Code |
| 27 | M2D/0.7 | 0.479 | No | Masked Modeling Duo: Towards a Universal Audio P... | 2024-04-09 | Code |
| 28 | PlayItBackX3 | 0.477 | No | Play It Back: Iterative Attention for Audio Reco... | 2022-10-20 | Code |
| 29 | DASS-Medium (Audio-only, single) | 0.476 | No | DASS: Distilled Audio State Space Models Are Str... | 2024-07-04 | Code |
| 30 | PSLA (Ensemble) | 0.474 | Yes | PSLA: Improving Audio Tagging with Pretraining, ... | 2021-02-02 | Code |
| 31 | DASS-Small (Audio-only, single) | 0.472 | No | DASS: Distilled Audio State Space Models Are Str... | 2024-07-04 | Code |
| 32 | PaSST-S (Single) | 0.471 | Yes | Efficient Training of Audio Transformers with Pa... | 2021-10-11 | Code |
| 33 | MaskSpec (AS-2M) | 0.471 | No | - | - | - |
| 34 | CAV-MAE (Audio-Only) | 0.466 | Yes | Contrastive Audio-Visual Masked Autoencoder | 2022-10-02 | Code |
| 35 | Audiovisual Masked Autoencoder (Audio-only, Single) | 0.466 | No | Audiovisual Masked Autoencoders | 2022-12-09 | Code |
| 36 | AudioVisual Fusion Net | 0.462 | No | Large Scale Audiovisual Learning of Sounds with ... | 2020-05-29 | - |
| 37 | AST (Single) | 0.459 | Yes | AST: Audio Spectrogram Transformer | 2021-04-05 | Code |
| 38 | ERANN-1-6 | 0.45 | No | - | - | - |
| 39 | Perceiver | 0.449 | No | Perceiver: General Perception with Iterative Att... | 2021-03-04 | Code |
| 40 | PSLA (Single) | 0.443 | Yes | PSLA: Improving Audio Tagging with Pretraining, ... | 2021-02-02 | Code |
| 41 | PANNs-CNN14 (Single) | 0.431 | No | - | - | Code |
| 42 | EAT-M | 0.426 | No | End-to-End Audio Strikes Back: Boosting Augmenta... | 2022-04-25 | Code |
| 43 | Conformer (AS-2M) | 0.411 | No | Conformer-Based Self-Supervised Learning for Non... | 2021-10-14 | - |
| 44 | EAT-S | 0.405 | No | End-to-End Audio Strikes Back: Boosting Augmenta... | 2022-04-25 | Code |
| 45 | WEANet-SUSTAIN | 0.398 | No | A Sequential Self Teaching Approach for Improvin... | 2020-06-30 | - |
| 46 | VATT-Base | 0.394 | Yes | VATT: Transformers for Multimodal Self-Supervise... | 2021-04-22 | Code |
| 47 | Multi-Format Contrastive | 0.376 | No | Multi-Format Contrastive Learning of Audio Repre... | 2021-03-11 | - |
| 48 | MMV | 0.309 | No | Self-Supervised MultiModal Versatile Networks | 2020-06-29 | Code |
| 49 | CAV-MAE (Visual-Only) | 0.262 | Yes | Contrastive Audio-Visual Masked Autoencoder | 2022-10-02 | Code |
| 50 | L3 | 0.249 | No | Look, Listen and Learn | 2017-05-23 | Code |
| 51 | Triplet | 0.244 | No | Unsupervised Learning of Semantic Audio Represen... | 2017-11-06 | - |