Joined Model Multi-tasking

Reported on 2 benchmarks across 1 task

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing: 2 results

Sentiment Analysis on SST-5 Fine-grained classification
Accuracy: 44.82 (best: 62.27, Llama-3.3-70B + CAPO)
Sentiment Analysis on SST-2 Binary classification
Accuracy: 54.72 (best: 97.5, T5-11B)