Metric: Accuracy (higher is better)
| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, CoT) | 91.2 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 2 | PaLM 2 (few-shot, k=3, Direct) | 74.0 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 3 | BloombergGPT 50B (few-shot, k=3) | 54.8 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 4 | PaLM 540B (few-shot, k=3) | 53.6 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 5 | Chinchilla-70B (few-shot, k=5) | 52.3 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 6 | BLOOM 176B (few-shot, k=3) | 50.0 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 7 | OPT 66B (few-shot, k=3) | 49.6 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 8 | GPT-NeoX 20B (few-shot, k=3) | 45.6 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 9 | Gopher-280B (few-shot, k=5) | 44.1 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |