CoDa
The Color Dataset
TextsApache License 2.0Introduced 2021-10-15
The Color Dataset (CoDa) is a probing dataset to evaluate the representation of visual properties in language models. CoDa consists of color distributions for 521 common objects, which are split into 3 groups: Single, Multi, and Any.
The default configuration of CoDa uses 10 CLIP-style templates (e.g. "A photo of a [object]"), and 10 cloze-style templates (e.g. "Everyone knows most [object] are [color]."