DailyDialog++

For each dialog context, the dataset provides (i) five relevant responses and (ii) five adversarially crafted irrelevant responses.
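The layout described above can be pictured as one record per context. The following is only a minimal sketch: the class name `DialogExample`, its field names, and the 1/0 labeling are hypothetical and chosen for illustration; they are not the dataset's actual file schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Five responses of each kind per context, as stated in the description.
RESPONSES_PER_KIND = 5


@dataclass
class DialogExample:
    """One context with its two response sets (field names are hypothetical)."""

    context: List[str]                 # utterances that make up the dialog context
    relevant: List[str]                # relevant responses
    adversarial_irrelevant: List[str]  # adversarially crafted irrelevant responses

    def __post_init__(self) -> None:
        # Check that each response set has the expected size.
        for name in ("relevant", "adversarial_irrelevant"):
            count = len(getattr(self, name))
            if count != RESPONSES_PER_KIND:
                raise ValueError(
                    f"{name}: expected {RESPONSES_PER_KIND} responses, got {count}"
                )

    def labeled_pairs(self) -> List[Tuple[List[str], str, int]]:
        """Return (context, response, label) triples; 1 = relevant, 0 = irrelevant."""
        pairs = [(self.context, r, 1) for r in self.relevant]
        pairs += [(self.context, r, 0) for r in self.adversarial_irrelevant]
        return pairs
```

Under these assumptions, each context yields ten labeled context-response pairs, five of each class, which is the shape a response-relevance classifier or evaluation metric would typically consume.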

Source: Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining