e-ViL

Modalities: Images, Text · License: Multiple licenses · Introduced: 2021-05-08

e-ViL is a benchmark for explainable vision-language tasks. It spans three datasets of human-written natural language explanations (NLEs) and provides a unified evaluation framework designed to be reusable in future work.

This benchmark uses the following datasets: e-SNLI-VE, VCR, and VQA-X.