FinanceBench

Introduced 2023-11-20

FinanceBench is a benchmark for evaluating how well large language models (LLMs) perform on financial question answering (QA). Its key details are:

  1. What is FinanceBench?

    • FinanceBench is a first-of-its-kind test suite specifically tailored for assessing LLMs' capabilities in answering financial questions.
    • It focuses on open-book financial QA and comprises 10,231 questions about publicly traded companies.
    • Each question comes with corresponding answers and evidence strings.
  2. Why FinanceBench Matters:

    • The questions in FinanceBench are ecologically valid, covering a diverse set of scenarios.
    • They are intentionally designed to be clear-cut and straightforward, serving as a minimum performance standard.
    • The goal is to measure how well LLMs handle the kinds of questions asked about publicly traded companies.
  3. Model Evaluation:

    • Researchers tested 16 state-of-the-art model configurations, including GPT-4-Turbo, Llama2, and Claude2.
    • The evaluation used a sample of 150 cases from FinanceBench. Every answer was reviewed manually, for a total of 2,400 reviewed answers (16 configurations × 150 cases).
    • Notably, existing LLMs have limitations for financial QA. For instance:
      • GPT-4-Turbo, when used with a retrieval system, incorrectly answered or refused to answer 81% of questions.
      • Augmentation techniques, such as a longer context window used to feed in relevant evidence, improved performance but are unrealistic for enterprise settings: they increase latency and cannot accommodate larger financial documents.
    • All examined models exhibit weaknesses, such as hallucinations, which limit their suitability for enterprise use.
  4. Availability:

    • The FinanceBench cases are available open-source for further exploration and research.
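
The case structure and the manual-review tally described above can be sketched in a few lines of Python. This is only an illustrative sketch: the field names, the three-way label set (correct, incorrect, refused), and the helper functions are assumptions made here, not the schema of the released dataset or the authors' evaluation code, and the labels in the demo are toy values, not reported results.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FinanceBenchCase:
    """One benchmark case: a question, its answer, and evidence strings.

    Field names are hypothetical, not the released dataset's schema.
    """
    question: str
    answer: str
    evidence: list[str] = field(default_factory=list)


# Assumed label set a human reviewer assigns to each model answer.
LABELS = ("correct", "incorrect", "refused")


def expected_review_count(n_configurations: int, n_cases: int) -> int:
    """Each model configuration answers every sampled case once."""
    return n_configurations * n_cases


def failure_rate(labels: list[str]) -> float:
    """Share of answers that were either incorrect or refused."""
    counts = Counter(labels)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels to score")
    return (counts["incorrect"] + counts["refused"]) / total


if __name__ == "__main__":
    # 16 configurations x 150 sampled cases = 2,400 manually reviewed answers.
    print(expected_review_count(16, 150))

    # Toy labels for illustration only, not data from the paper.
    toy_labels = ["correct"] * 2 + ["incorrect"] * 5 + ["refused"] * 3
    print(failure_rate(toy_labels))
```

Counting refusals together with incorrect answers mirrors how the 81% figure above is phrased ("incorrectly answered or refused to answer"); a stricter or looser metric would only need a different numerator in `failure_rate`.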

¹: Islam, P., Kannappan, A., Kiela, D., Qian, R., Scherrer, N., & Vidgen, B. (2023). FinanceBench: A New Benchmark for Financial Question Answering. arXiv preprint arXiv:2311.11944.

Source: Conversation with Bing, 3/16/2024
(1) Papers with Code - FinanceBench: A New Benchmark for Financial Question .... https://paperswithcode.com/paper/financebench-a-new-benchmark-for-financial
(2) FinanceBench: A New Benchmark for Financial Question Answering. https://arxiv.org/abs/2311.11944
(3) Papers with Code - Paper tables with annotated results for FinanceBench .... https://paperswithcode.com/paper/financebench-a-new-benchmark-for-financial/review/