Llama-3-IT-8B-32k
Reported on 4 benchmarks across 2 tasks · 1 paper
Note: results are matched by exact model name. Different papers may use the same name for different model variants.
Natural Language Processing · 4 results
- AlignScore · 2024-07-31 · 0.1016 · best: 0.1378 (GPT-3.5-Turbo-0613-16k)
- Prometheus-2 Answer Correctness · 2024-07-31 · 3.1673 · best: 3.0408 (GPT-3.5-Turbo-0613-16k)
- Rouge-L · 2024-07-31 · 0.2286 · best: 0.2414 (GPT-3.5-Turbo-0613-16k)
- Macro F1 · 2024-07-31 · 0.2881 · best: 0.4703 (Mistral-IT-v02-7B-32k)