CoDa

The Color Dataset

Texts | Apache License 2.0 | Introduced 2021-10-15

The Color Dataset (CoDa) is a probing dataset to evaluate the representation of visual properties in language models. CoDa consists of color distributions for 521 common objects, which are split into 3 groups: Single, Multi, and Any.

The default configuration of CoDa uses 10 CLIP-style templates (e.g. "A photo of a [object]") and 10 cloze-style templates (e.g. "Everyone knows most [object] are [color].").
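
As a rough illustration of how such templates might be turned into probing prompts, the sketch below fills the two example templates quoted above. The helper names (`fill_template`, `build_prompts`), the `[MASK]` placeholder, and the sample object and color words are assumptions made for this example; they are not part of the CoDa release.

```python
# Minimal sketch of template filling. The two template strings are the
# examples quoted in the text; everything else here is hypothetical.

CLIP_TEMPLATE = "A photo of a [object]"
CLOZE_TEMPLATE = "Everyone knows most [object] are [color]."


def fill_template(template, obj, color=None, mask_token="[MASK]"):
    """Substitute the object name, and either a color or a mask token.

    If `color` is None, the [color] slot is replaced by `mask_token`,
    which is how a masked language model would typically be queried.
    Templates without a [color] slot are returned with only the
    object filled in.
    """
    text = template.replace("[object]", obj)
    slot_value = color if color is not None else mask_token
    return text.replace("[color]", slot_value)


def build_prompts(obj, colors, template=CLOZE_TEMPLATE):
    """Return one filled prompt per candidate color, keyed by color."""
    return {color: fill_template(template, obj, color) for color in colors}


if __name__ == "__main__":
    # Hypothetical object and candidate colors, for illustration only.
    print(fill_template(CLIP_TEMPLATE, "banana"))
    print(fill_template(CLOZE_TEMPLATE, "bananas"))
    for prompt in build_prompts("bananas", ["yellow", "green"]).values():
        print(prompt)
```

Scoring each filled prompt with a language model and normalizing the scores across colors would give a predicted color distribution that could be compared with the dataset's distribution for the same object; the scoring step is left out here because it depends on the model being probed.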

Related Benchmarks

CODAH / Common Sense Reasoning / Accuracy
CODAH / Question Answering / Accuracy