BSBench: will your LLM find the largest prime number?

K. O. T. Erziev

2025-06-05 · Benchmarking

Abstract

We propose that benchmarking LLMs on questions that have no reasonable answer isn't as silly as it sounds. We also present a benchmark for such testing, along with a method for modifying existing datasets, and find that existing models perform far from perfectly on such questions. Our code and data artifacts are available at https://github.com/L3G5/impossible-bench
