Metric: mean average precision (higher is better)
| # | Model | Mean average precision | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | CAV-MAE (Audio-Visual) | 0.512 | Yes | Contrastive Audio-Visual Masked Autoencoder | 2022-10-02 | Code |
| 2 | mn40_as (Ensemble) | 0.498 | Yes | Efficient Large-scale Audio Tagging via Transfor... | 2022-11-09 | Code |
| 3 | PaSST | 0.496 | Yes | Efficient Training of Audio Transformers with Pa... | 2021-10-11 | Code |
| 4 | DyMN-L (Audio-Only, Single) | 0.49 | Yes | Dynamic Convolutional Neural Networks as Efficie... | 2023-10-24 | Code |
| 5 | Audio Spectrogram Transformer | 0.485 | Yes | AST: Audio Spectrogram Transformer | 2021-04-05 | Code |
| 6 | mn40_as (Single) | 0.483 | Yes | Efficient Large-scale Audio Tagging via Transfor... | 2022-11-09 | Code |
| 7 | PSLA | 0.474 | Yes | PSLA: Improving Audio Tagging with Pretraining, ... | 2021-02-02 | Code |
| 8 | ST-SED | 0.467 | Yes | Zero-shot Audio Source Separation through Query-... | 2021-12-15 | Code |
| 9 | CAV-MAE (Audio-Only) | 0.466 | Yes | Contrastive Audio-Visual Masked Autoencoder | 2022-10-02 | Code |
| 10 | ERANN-1-6 | 0.45 | No | - | - | - |
| 11 | CNN14 | 0.431 | No | - | - | Code |
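
For readers unfamiliar with the metric, the following is a minimal sketch of how mean average precision is commonly computed for multi-label tagging: average precision is computed per class from the ranked scores, then averaged over classes. The function names and the toy data are hypothetical, and this is not necessarily the exact evaluation protocol behind the numbers in the table (tie handling and class filtering can differ between implementations).

```python
def average_precision(labels, scores):
    """AP for one class: mean of precision@k taken at each positive item."""
    # Rank items by descending score; ties keep their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank
    # AP is undefined for a class with no positives.
    return precision_sum / hits if hits else float("nan")


def mean_average_precision(label_matrix, score_matrix):
    """Unweighted mean of per-class AP, skipping classes with no positives."""
    num_classes = len(label_matrix[0])
    aps = []
    for c in range(num_classes):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        ap = average_precision(labels, scores)
        if ap == ap:  # NaN is the only value not equal to itself
            aps.append(ap)
    return sum(aps) / len(aps)


# Hypothetical toy data: 4 clips, 2 classes (rows are clips, columns are classes).
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.4, 0.6], [0.1, 0.3]]
toy_map = mean_average_precision(toy_labels, toy_scores)
print(f"toy mAP: {toy_map:.3f}")
```

Because the mean is taken over classes rather than over clips, rare classes weigh as much as frequent ones in the final score.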