Metric: Mean AP (higher is better)
| # | Model | Mean AP | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | EquiAV | 42.4 | No | EquiAV: Leveraging Equivariance for Audio-Visual... | 2024-03-14 | Code |
| 2 | SSLAM | 40.9 | No | SSLAM: Enhancing Self-Supervised Models with Aud... | 2025-06-13 | Code |
| 3 | EAT | 40.3 | No | EAT: Self-Supervised Pre-Training with Efficient... | 2024-01-07 | Code |
| 4 | BEATs | 38.9 | No | BEATs: Audio Pre-Training with Acoustic Tokenizers | 2022-12-18 | Code |
| 5 | ATST Base | 37.4 | No | ATST: Audio Representation Learning with Teacher... | 2022-04-26 | Code |
| 6 | SSAST-PATCH | 31.0 | No | SSAST: Self-Supervised Audio Spectrogram Transfo... | 2021-10-19 | Code |
| 7 | SSAST-FRAME | 29.2 | No | SSAST: Self-Supervised Audio Spectrogram Transfo... | 2021-10-19 | Code |
| 8 | Conformer | 27.6 | No | Conformer-Based Self-Supervised Learning for Non... | 2021-10-14 | - |