TeachCLIP (ViT-B/16)

Reported on 6 benchmarks across 2 tasks

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision: 6 results

Video on MSR-VTT-1kA
text-to-video R@1
48
best: 62.9 (HunYuan_tvr (huge))
Video on MSR-VTT-1kA
text-to-video R@10
83.5
best: 90.8 (HunYuan_tvr (huge))
Video on MSR-VTT-1kA
text-to-video R@5
75.9
best: 84.5 (HunYuan_tvr (huge))
Video Retrieval on MSR-VTT-1kA
text-to-video R@1
48
best: 62.9 (HunYuan_tvr (huge))
Video Retrieval on MSR-VTT-1kA
text-to-video R@10
83.5
best: 90.8 (HunYuan_tvr (huge))
Video Retrieval on MSR-VTT-1kA
text-to-video R@5
75.9
best: 84.5 (HunYuan_tvr (huge))