KIEval
KIEval is a framework for dynamic, interactive evaluation of large language models. By moving beyond static benchmarks, it reduces the impact of data contamination and tests whether a model can genuinely understand and apply knowledge rather than recall memorized answers.

How KIEval Works:
Dynamic Interactions: KIEval introduces an "interactor" model that engages the evaluated model in a multi-round dialogue. In each round, the interactor generates a new, deeper question based on the previous response, testing the model's ability to apply knowledge coherently across turns.
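The multi-round loop can be sketched as follows. This is a minimal illustration, not KIEval's actual API: `CandidateModel` and `InteractorModel` are hypothetical stand-ins for what would be LLM calls in a real setup.

```python
class CandidateModel:
    """Stand-in for the model under evaluation (hypothetical interface)."""
    def answer(self, question: str) -> str:
        return f"Answer to: {question}"

class InteractorModel:
    """Stand-in for the 'interactor' that deepens the dialogue each round."""
    def follow_up(self, question: str, answer: str) -> str:
        # In KIEval the interactor is itself an LLM; here we just simulate
        # a deeper probe conditioned on the previous answer.
        return f"Why is that? (re: {answer[:40]})"

def run_dialogue(candidate, interactor, seed_question: str, rounds: int = 3):
    """Run a multi-round dialogue: each round's question builds on the last answer."""
    history = []
    question = seed_question
    for _ in range(rounds):
        answer = candidate.answer(question)
        history.append((question, answer))
        question = interactor.follow_up(question, answer)
    return history

history = run_dialogue(CandidateModel(), InteractorModel(), "What causes tides?")
```

Because each follow-up question is conditioned on the model's previous answer, the transcript cannot be answered from memorized benchmark data alone.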
Evaluation Process: An initial question drawn from a high-quality dataset seeds the dialogue. The "interactor" then generates follow-up questions to probe deeper, and an "evaluator" model assesses each response for relevance, coherence, and logic.
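The scoring step might look like the sketch below, assuming the three criteria named above. The `toy_evaluator` is a placeholder; in practice the evaluator would be an LLM judge, and the exact scoring scale is an assumption here.

```python
def evaluate_transcript(history, evaluator):
    """Score each (question, answer) turn on relevance, coherence, and logic
    (0-1 each, a hypothetical scale), then average into an overall score."""
    criteria = ("relevance", "coherence", "logic")
    per_turn = []
    for question, answer in history:
        scores = {c: evaluator(question, answer, c) for c in criteria}
        per_turn.append(scores)
    overall = sum(sum(s.values()) for s in per_turn) / (len(per_turn) * len(criteria))
    return per_turn, overall

def toy_evaluator(question, answer, criterion):
    # Placeholder judge: gives full marks to any non-empty answer.
    return 1.0 if answer else 0.0

history = [("Q1", "A1"), ("Q2", "A2")]
per_turn, overall = evaluate_transcript(history, toy_evaluator)
# overall == 1.0 for this toy transcript
```

Averaging per-criterion scores across rounds rewards sustained coherence over the whole dialogue rather than a single good first answer.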
Advantages: Because follow-up questions are generated on the fly, memorized benchmark answers offer little advantage; the method therefore reduces the impact of data contamination and evaluates the model's abilities beyond simple pattern matching.