Metric: mAP (mean average precision; higher is better)
| Rank | Model | mAP | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | DenseAV | 48.7 | No | Separating the "Chirp" from the "Chat": Self-sup... | 2024-06-09 | Code |
| 2 | DenseAV | 32.7 | No | Separating the "Chirp" from the "Chat": Self-sup... | 2024-06-09 | Code |
| 3 | DAVENet | 32.2 | No | Jointly Discovering Visual Objects and Spoken Wo... | 2018-04-04 | - |
| 4 | CAVMAE | 27.2 | No | Contrastive Audio-Visual Masked Autoencoder | 2022-10-02 | Code |
| 5 | CAVMAE | 26.0 | No | Contrastive Audio-Visual Masked Autoencoder | 2022-10-02 | Code |
| 6 | ImageBIND | 20.2 | No | ImageBind: One Embedding Space To Bind Them All | 2023-05-09 | Code |
| 7 | ImageBIND | 19.7 | No | ImageBind: One Embedding Space To Bind Them All | 2023-05-09 | Code |
| 8 | DAVENet | 16.8 | No | Jointly Discovering Visual Objects and Spoken Wo... | 2018-04-04 | - |