KIEval
KIEval is a framework for dynamic, interactive evaluation of large language models. By moving beyond static benchmarks, it reduces the impact of data contamination and tests whether a model can genuinely understand and apply knowledge rather than recall memorized answers.

How KIEval Works:
Dynamic Interactions: KIEval introduces an "interactor" model that engages the evaluated model in a multi-round dialogue. In each round, the interactor generates a new, deeper question based on the previous response, testing the model's ability to apply knowledge coherently across turns.
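The multi-round loop can be sketched as follows. This is a minimal illustration, not KIEval's actual API: `CandidateModel` and `InteractorModel` are hypothetical stand-ins for what would be LLM calls in a real setup.

```python
class CandidateModel:
    """Stand-in for the model under evaluation (hypothetical interface)."""
    def answer(self, question: str) -> str:
        return f"Answer to: {question}"

class InteractorModel:
    """Stand-in for the 'interactor' that deepens the dialogue each round."""
    def follow_up(self, question: str, answer: str) -> str:
        # In KIEval the interactor is itself an LLM; here we just simulate
        # a deeper probe conditioned on the previous answer.
        return f"Why is that? (re: {answer[:40]})"

def run_dialogue(candidate, interactor, seed_question: str, rounds: int = 3):
    """Run a multi-round dialogue: each round's question builds on the last answer."""
    history = []
    question = seed_question
    for _ in range(rounds):
        answer = candidate.answer(question)
        history.append((question, answer))
        question = interactor.follow_up(question, answer)
    return history

history = run_dialogue(CandidateModel(), InteractorModel(), "What causes tides?")
```

Because each follow-up question is conditioned on the model's previous answer, the transcript cannot be answered from memorized benchmark data alone.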
Evaluation Process: An initial question drawn from a high-quality dataset seeds the dialogue. The "interactor" then generates follow-up questions to probe deeper, and an "evaluator" model assesses each response for relevance, coherence, and logic.
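The scoring step might look like the sketch below, assuming the three criteria named above. The `toy_evaluator` is a placeholder; in practice the evaluator would be an LLM judge, and the exact scoring scale is an assumption here.

```python
def evaluate_transcript(history, evaluator):
    """Score each (question, answer) turn on relevance, coherence, and logic
    (0-1 each, a hypothetical scale), then average into an overall score."""
    criteria = ("relevance", "coherence", "logic")
    per_turn = []
    for question, answer in history:
        scores = {c: evaluator(question, answer, c) for c in criteria}
        per_turn.append(scores)
    overall = sum(sum(s.values()) for s in per_turn) / (len(per_turn) * len(criteria))
    return per_turn, overall

def toy_evaluator(question, answer, criterion):
    # Placeholder judge: gives full marks to any non-empty answer.
    return 1.0 if answer else 0.0

history = [("Q1", "A1"), ("Q2", "A2")]
per_turn, overall = evaluate_transcript(history, toy_evaluator)
# overall == 1.0 for this toy transcript
```

Averaging per-criterion scores across rounds rewards sustained coherence over the whole dialogue rather than a single good first answer.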
Advantages: Because follow-up questions are generated on the fly, memorized benchmark answers offer little advantage; the method therefore reduces the impact of data contamination and evaluates the model's abilities beyond simple pattern matching.