Multimodal(ViT+BERT, Input: Image + Body)
Reported on 2 benchmarks across 2 tasks · 1 paper · 2 SOTA
Note: results are matched by exact model name. Different papers may use the same name for different model variants.
Natural Language Processing · 2 results
- Accuracy · 2021-08-30 · SOTA · 0.9249
- Accuracy · 2021-08-30 · SOTA · 0.9249