Zero-Shot Video Retrieval on YouCook2

Metric: text-to-video Mean Rank (lower is better)

#  Model     text-to-video Mean Rank  Extra Data  Paper                                                Date        Code
1  VATT-MBS  13                       No          VATT: Transformers for Multimodal Self-Supervise...  2021-04-22  Code
2  MIL-NCE   10                       No          End-to-End Learning of Visual Representations fr...  2019-12-13  Code
3  TACo      8                        No          TACo: Token-aware Cascade Contrastive Learning f...  2021-08-23  -