Metric: Top 5 Accuracy (higher is better)
| # | Model | Top 5 Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | MMT (Audio-Visual) | 85.7 | No | - | - | - |
| 2 | MBT (AV) | 85.6 | No | Attention Bottlenecks for Multimodal Fusion | 2021-06-30 | Code |
| 3 | AVT (Audio-Visual) | 85 | No | - | - | - |
| 4 | MAST (Audio Only) | 81.3 | No | Multiscale Audio Spectrogram Transformer for Eff... | 2023-03-19 | - |
| 5 | PlayItBackX3 | 79.2 | No | Play It Back: Iterative Attention for Audio Reco... | 2022-10-20 | Code |
| 6 | MBT (A) | 78.1 | No | Attention Bottlenecks for Multimodal Fusion | 2021-06-30 | Code |
| 7 | MMT (Video) | 77.9 | No | - | - | - |
| 8 | AVT (V) | 74.8 | No | - | - | - |
| 9 | MBT (V) | 72.6 | No | Attention Bottlenecks for Multimodal Fusion | 2021-06-30 | Code |