VT-Transformer (CAT)
Reported on 2 benchmarks across 1 task
Note: results are matched by exact model name. Different papers may use the same name for different model variants.
Natural Language Processing: 2 results
- average_precision: 74.91; best: 84.13 (CLIP-Ensemble)
- 66.7; best: 67.26 (VT-Transformer (MUL))