VT-Transformer (MUL)
Reported on 2 benchmarks across 1 task
Note: results are matched by exact model name. Different papers may use the same name for different model variants.
Natural Language Processing (2 results)
- average_precision: 76.96 (best: 84.13, CLIP-Ensemble)
- 67.26