DailyDialog++
Provides, for each dialog context, (i) five relevant responses and (ii) five adversarially crafted irrelevant responses.
Source: Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining
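
The per-context layout described above can be sketched as a small record type. This is only an illustration under stated assumptions: the class name `DialogExample`, its field names, and the placeholder strings are hypothetical and are not taken from the dataset's actual release format. Only the counts (five relevant, five adversarial irrelevant) come from the description.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Five responses of each kind per context, as stated in the description.
RESPONSES_PER_SET = 5


@dataclass
class DialogExample:
    """One dialog context with its two response sets (hypothetical field names)."""

    context: List[str]                           # utterances that make up the context
    relevant_responses: List[str]                # expected: 5 relevant responses
    adversarial_irrelevant_responses: List[str]  # expected: 5 adversarial irrelevant responses

    def is_complete(self) -> bool:
        """Check that both response sets have exactly five entries."""
        return (
            len(self.relevant_responses) == RESPONSES_PER_SET
            and len(self.adversarial_irrelevant_responses) == RESPONSES_PER_SET
        )

    def labeled_pairs(self) -> List[Tuple[str, int]]:
        """Return (response, label) pairs: 1 = relevant, 0 = adversarial irrelevant."""
        pairs = [(r, 1) for r in self.relevant_responses]
        pairs += [(r, 0) for r in self.adversarial_irrelevant_responses]
        return pairs


# Placeholder strings only; this is not real dataset content.
example = DialogExample(
    context=["<utterance 1>", "<utterance 2>"],
    relevant_responses=[f"<relevant {i}>" for i in range(1, 6)],
    adversarial_irrelevant_responses=[f"<adversarial {i}>" for i in range(1, 6)],
)
```

Flattening each context into labeled pairs, as `labeled_pairs` does, is one plausible way to feed such data to a binary relevance scorer; the dataset itself may be distributed in a different shape.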