TinyQA Benchmark++

2 benchmarks, 1 paper

An ultra-lightweight evaluation suite and Python package designed to expose critical failures in Large Language Model (LLM) systems within seconds.

Benchmarks