Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Vicuna-13b-v1.5-16k

Reported on 11 benchmarks across 1 task · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 11 results

  • Long-Context Understanding on Ada-LEval (BestAnswer)
    12k · 2023-06-09
    1.4
    best: 52 (GPT-4-Turbo-0125)
    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (arXiv:2306.05685)
  • Long-Context Understanding on Ada-LEval (BestAnswer)
    16k · 2023-06-09
    0.9
    best: 44.5 (GPT-4-Turbo-0125)
    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (arXiv:2306.05685)
  • Long-Context Understanding on Ada-LEval (BestAnswer)
    1k · 2023-06-09
    53.4
    best: 74 (GPT-4-Turbo-1106)
    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (arXiv:2306.05685)
  • Long-Context Understanding on Ada-LEval (BestAnswer)
    2k · 2023-06-09
    29.2
    best: 73.5 (GPT-4-Turbo-1106)
    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (arXiv:2306.05685)
  • Long-Context Understanding on Ada-LEval (BestAnswer)
    4k · 2023-06-09
    13.1
    best: 67.5 (GPT-4-Turbo-1106)
    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (arXiv:2306.05685)
  • Long-Context Understanding on Ada-LEval (BestAnswer)
    6k · 2023-06-09
    4.3
    best: 63 (GPT-4-Turbo-0125)
    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (arXiv:2306.05685)
  • Long-Context Understanding on Ada-LEval (BestAnswer)
    8k · 2023-06-09
    2.2
    best: 56.5 (GPT-4-Turbo-0125)
    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (arXiv:2306.05685)
  • Long-Context Understanding on Ada-LEval (TSort)
    16k · 2023-06-09
    3.1
    best: 5.5 (GPT-4-Turbo-0125)
    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (arXiv:2306.05685)
  • Long-Context Understanding on Ada-LEval (TSort)
    2k · 2023-06-09
    5.4
    best: 18.5 (GPT-4-Turbo-1106)
    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (arXiv:2306.05685)
  • Long-Context Understanding on Ada-LEval (TSort)
    4k · 2023-06-09
    5
    best: 16.5 (GPT-4-Turbo-0125)
    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (arXiv:2306.05685)
  • Long-Context Understanding on Ada-LEval (TSort)
    8k · 2023-06-09
    2.4
    best: 8.5 (GPT-4-Turbo-0125)
    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (arXiv:2306.05685)