Joined Model Multi-tasking
Reported on 2 benchmarks across 1 task
Note: results are matched by exact model name. Different papers may use the same name for different model variants.
Natural Language Processing: 2 results
- 44.82; best: 62.27 (Llama-3.3-70B + CAPO)
- 54.72; best: 97.5 (T5-11B)