TeachCLIP (ViT-B/16)
Reported on 6 benchmarks across 2 tasks
Note: results are matched by exact model name. Different papers may use the same name for different model variants.
Computer Vision: 6 results
- text-to-video R@1: 48 (best: 62.9, HunYuan_tvr (huge))
- text-to-video R@10: 83.5 (best: 90.8, HunYuan_tvr (huge))
- text-to-video R@5: 75.9 (best: 84.5, HunYuan_tvr (huge))
- text-to-video R@1: 48 (best: 62.9, HunYuan_tvr (huge))
- text-to-video R@10: 83.5 (best: 90.8, HunYuan_tvr (huge))
- text-to-video R@5: 75.9 (best: 84.5, HunYuan_tvr (huge))
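The R@1, R@5 and R@10 figures are text-to-video Recall@K scores. Each one is the percentage of text queries whose matching video appears among the top K retrieved videos. The following is a minimal sketch of how such a score is commonly computed. It assumes a square similarity matrix in which the ground-truth video for text query i is video i. The function name `text_to_video_recall_at_k` is hypothetical, and this is not TeachCLIP's own evaluation code.

```python
import numpy as np


def text_to_video_recall_at_k(sim: np.ndarray, ks=(1, 5, 10)) -> dict:
    """Return Recall@K (in percent) for text-to-video retrieval.

    Assumption: sim[i, j] is the similarity between text query i and video j,
    and the ground-truth video for query i is video i (the diagonal).
    """
    # Score of the correct video for each query, shaped (n, 1) for broadcasting.
    gt_scores = np.diag(sim)[:, None]
    # 0-based rank of the correct video: the number of videos scored strictly
    # higher than it (ties are resolved in favour of the correct video).
    ranks = (sim > gt_scores).sum(axis=1)
    # A query counts as a hit at K if its correct video ranks within the top K.
    return {k: 100.0 * float(np.mean(ranks < k)) for k in ks}


# Toy example: 3 text queries x 3 videos.
sim = np.array([
    [0.9, 0.1, 0.2],  # correct video ranked 1st
    [0.8, 0.3, 0.1],  # correct video ranked 2nd
    [0.2, 0.5, 0.4],  # correct video ranked 2nd
])
print(text_to_video_recall_at_k(sim, ks=(1, 2)))
```

In this toy example, only the first query retrieves its video at rank 1, so R@1 is about 33.3. All three queries find their video within the top 2, so R@2 is 100.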