Papers With Code 2 | ML Benchmarks, SotA Results & Code

The LogiEval dataset is a benchmark suite designed for evaluating the logical reasoning abilities of prompt-based language models, particularly instruct-prompt large language models. Here are some key details about LogiEval:

Purpose and Origin:
- LogiEval was created to assess how well language models perform in tasks that require logical reasoning.
- It is based on the OpenAI Eval library and focuses on evaluating logical reasoning abilities.
- The dataset was developed by researchers to address the need for robust logical reasoning evaluation.
Contents:
- LogiEval contains a set of logical reasoning tasks that challenge models to reason deductively.
- The tasks cover various types of logical reasoning, providing a comprehensive evaluation.
- The dataset includes 8,678 QA instances sourced from expert-written questions.
Usage:
- Researchers and practitioners can use LogiEval to assess the logical reasoning capabilities of different models.
- To utilize LogiEval, one can follow the instructions provided in the repository, including setting up the necessary environment and running evaluations.
Citation:
- If you're interested in using LogiEval or referring to it in your work, you can cite the following paper:
  - Title: "Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4"
  - Authors: Hanmeng Liu, Ruoxi Ning, Zhiyang Teng, Jian Liu, Qiji Zhou, Yue Zhang
  - Year: 2023
  - Link: Read the paper ⁴

In summary, LogiEval provides a valuable resource for assessing logical reasoning abilities in prompt-based language models. Researchers can use it to evaluate and compare different models' performance in logical reasoning tasks.

Source: Conversation with Bing, 3/18/2024 (1) Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4 - arXiv.org. https://arxiv.org/pdf/2304.03439.pdf. (2) GitHub - csitfun/LogiEval: a benchmark suite for testing logical .... https://github.com/csitfun/LogiEval. (3) [2007.08124] LogiQA: A Challenge Dataset for Machine Reading .... https://arxiv.org/abs/2007.08124. (4) [2203.15099] LogicInference: A New Dataset for Teaching Logical .... https://arxiv.org/abs/2203.15099. (5) Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4. https://arxiv.org/abs/2304.03439.