FinanceBench
FinanceBench is a benchmark for evaluating the performance of large language models (LLMs) on financial question answering (QA). Key details:
What is FinanceBench?
- FinanceBench is a first-of-its-kind test suite specifically tailored for assessing LLMs' capabilities in answering financial questions.
- It focuses on open-book financial QA and comprises 10,231 questions about publicly traded companies.
- Each question is paired with a reference answer and an evidence string.
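The sketch below shows one way to model a single case and the open-book setup, where the model receives document text alongside the question. It is a minimal illustration under stated assumptions: the names `FinanceBenchCase`, `open_book_prompt`, and `context_contains_evidence`, the extra `company` field, and the prompt wording are all hypothetical, not the official FinanceBench schema or evaluation harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinanceBenchCase:
    """One benchmark case. Field names are hypothetical, not the official schema."""
    company: str   # publicly traded company the question is about (assumed field)
    question: str  # financial question posed to the model
    answer: str    # reference (gold) answer
    evidence: str  # evidence string supporting the answer


def open_book_prompt(case: FinanceBenchCase, context: str) -> str:
    """Build an open-book prompt: document text is supplied with the question.

    The prompt template is an illustrative assumption.
    """
    return (
        f"Answer the question about {case.company} using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {case.question}\nAnswer:"
    )


def context_contains_evidence(case: FinanceBenchCase, context: str) -> bool:
    """Case-insensitive check that the supplied context includes the evidence string."""
    return case.evidence.strip().lower() in context.lower()
```

For example, with purely hypothetical values, `FinanceBenchCase("ExampleCorp", "What was total revenue in FY2022?", "$1.2 billion", "Total revenue was $1.2 billion.")` can be checked against whatever text a retrieval step returned before building the prompt. This helps separate retrieval misses from answering errors.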
Why FinanceBench Matters:
- The questions in FinanceBench are ecologically valid (realistic questions about real companies' financial documents) and cover a diverse set of scenarios.
- They are intentionally designed to be clear-cut and straightforward, serving as a minimum performance standard.
- FinanceBench aims to evaluate how well LLMs handle financial queries, especially those related to publicly traded companies.
Model Evaluation:
- Researchers tested 16 state-of-the-art model configurations, including GPT-4-Turbo, Llama2, and Claude2, used with vector stores and with long-context prompts.
- The evaluation used a sample of 150 FinanceBench cases, and every answer was manually reviewed (16 configurations × 150 cases = 2,400 answers).
- The results show clear limitations of existing LLMs for financial QA. For instance:
  - GPT-4-Turbo, when used with a retrieval system, incorrectly answered or refused to answer 81% of questions.
  - Augmentation techniques, such as using a longer context window to feed in the relevant evidence, improved performance, but they are unrealistic for enterprise settings because they increase latency and cannot support larger financial documents.
- All examined models exhibit weaknesses, such as hallucinations, which limit their suitability for enterprise use.
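The review arithmetic and a headline figure like the 81% failure rate can be sketched as follows. This assumes each reviewed answer receives one of three labels (correct, incorrect, or refused), mirroring the wording of the results above; the `Label` enum and the helper names are hypothetical and are not the paper's evaluation code.

```python
from collections import Counter
from enum import Enum


class Label(Enum):
    """Assumed three-way label assigned to each manually reviewed answer."""
    CORRECT = "correct"
    INCORRECT = "incorrect"
    REFUSED = "refused"


def total_reviews(n_configs: int, n_cases: int) -> int:
    """Every configuration answers every sampled case, so reviews = configs * cases."""
    return n_configs * n_cases


def failure_rate(labels: list[Label]) -> float:
    """Share of reviewed answers that were incorrect or refused."""
    if not labels:
        raise ValueError("no labels to score")
    counts = Counter(labels)
    return (counts[Label.INCORRECT] + counts[Label.REFUSED]) / len(labels)


def failure_rates_by_config(reviews: dict[str, list[Label]]) -> dict[str, float]:
    """Failure rate for each model configuration (e.g. a model plus a retrieval setup)."""
    return {config: failure_rate(labels) for config, labels in reviews.items()}
```

`total_reviews(16, 150)` returns 2400, matching the number of manually reviewed answers. A failure rate of 0.81 corresponds to 81% of a configuration's answers being incorrect or refused. Counting refusals together with wrong answers reflects the view that, for enterprise use, a non-answer is also a failure.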
Availability:
- The 150 evaluated FinanceBench cases are available open-source for further research.
References:
- Islam, P., Kannappan, A., Kiela, D., Qian, R., Scherrer, N., & Vidgen, B. (2023). FinanceBench: A New Benchmark for Financial Question Answering. arXiv preprint arXiv:2311.11944. https://arxiv.org/abs/2311.11944
- Papers with Code entry for FinanceBench. https://paperswithcode.com/paper/financebench-a-new-benchmark-for-financial
- Papers with Code, paper tables with annotated results for FinanceBench. https://paperswithcode.com/paper/financebench-a-new-benchmark-for-financial/review/

Source: Conversation with Bing, 3/16/2024.