Linguistic Benchmark

Texts

Introduced 2024-05-30

The Linguistic Benchmark (JSON), consisting of 30 questions, was developed to be easy for human adults to answer but challenging for LLMs. It is designed to assess the well-documented limitations of LLMs across domains such as spatial reasoning, linguistic understanding, relational thinking, mathematical reasoning, knowledge of basic scientific concepts, and common sense. The questions probe model performance in these key domains, making the benchmark a useful tool for gauging the current capabilities of LLMs.
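A minimal sketch of how such a JSON question set might be loaded and scored. The field names (`id`, `category`, `question`, `expected`), the sample questions, and the exact-match scoring are all illustrative assumptions, not the published dataset's actual schema or evaluation protocol:

```python
import json

# Hypothetical excerpt mimicking one plausible JSON layout for a
# question benchmark; field names and content are assumptions.
sample = """
[
  {"id": 1, "category": "spatial reasoning",
   "question": "If you face north and turn right twice, which way do you face?",
   "expected": "south"},
  {"id": 2, "category": "common sense",
   "question": "Can you carry water in a sieve?",
   "expected": "no"}
]
"""

questions = json.loads(sample)

def score(answers):
    """Naive exact-match accuracy against the expected answers."""
    correct = sum(
        1 for q in questions
        if answers.get(q["id"], "").strip().lower() == q["expected"]
    )
    return correct / len(questions)

# Toy model answers keyed by question id.
model_answers = {1: "South", 2: "No"}
print(score(model_answers))
```

In practice, free-form LLM answers rarely match a reference string exactly, so real evaluations of such benchmarks typically rely on human grading or a judge model rather than exact-match scoring.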