IllusionChar_test

Dataset Characteristics

IllusionChar_test is a dataset of 3,300 generated images, each showing a sequence of 3 to 5 random characters. Unlike classification-focused datasets, it is designed for tasks that require reasoning about patterns, sequences, or illusions within those character sequences. All images are synthetically generated; no real-world data is included.
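To make the label space concrete, here is a minimal sketch of how 3,300 random sequences of 3 to 5 characters could be sampled. It is not the authors' generation code. The alphabet (uppercase letters and digits), the uniform length distribution, and the names `ALPHABET`, `sample_sequence`, and `sample_labels` are all assumptions made for illustration; the dataset description does not specify them.

```python
import random
import string

# Assumption: the description does not state which characters are used.
# Uppercase letters and digits are chosen here purely for illustration.
ALPHABET = string.ascii_uppercase + string.digits


def sample_sequence(rng, min_len=3, max_len=5, alphabet=ALPHABET):
    """Draw one random character sequence with a length in [min_len, max_len]."""
    length = rng.randint(min_len, max_len)  # randint is inclusive on both ends
    return "".join(rng.choice(alphabet) for _ in range(length))


def sample_labels(n, seed=0):
    """Draw n sequences from a seeded generator so the result is reproducible."""
    rng = random.Random(seed)
    return [sample_sequence(rng) for _ in range(n)]


# 3,300 matches the sample count stated in the description.
labels = sample_labels(3300)
```

Each sampled string would then serve as the target text that an image generator is asked to embed in a scene.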

Motivations and Content Summary

The images were generated with ControlNet, using captions from four large language models (LLMs). The dataset builds on pareidolia, the tendency to perceive meaningful patterns in ambiguous visual input, and asks models to discern illusions or patterns within character sequences. Because it centers on character sequences rather than class labels, it challenges multimodal models to analyze abstract combinations of symbols and to interpret any illusory content.

Potential Use Cases

Illusory VQA: Questioning models about potential illusions or patterns in the character sequences.
Sequence Reasoning Tasks: Evaluating a model’s ability to interpret and reason about ordered sequences.
Multimodal Model Evaluation: Benchmarking models on abstract and symbolic data with potential illusions.
Synthetic Data Research: Exploring synthetic datasets for challenging machine learning models with abstract reasoning tasks.
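For the sequence reasoning and model evaluation use cases above, one plausible way to score a model is to compare its predicted character sequence with the ground-truth sequence. The sketch below computes exact-match accuracy and a position-wise character accuracy. The metric choice, the case-insensitive comparison, and the function names are assumptions made for illustration; the dataset description does not prescribe an evaluation protocol.

```python
def normalize(text):
    """Strip whitespace and ignore case (an assumption; case may matter in practice)."""
    return text.strip().upper()


def exact_match(pred, target):
    """True when the whole predicted sequence equals the target sequence."""
    return normalize(pred) == normalize(target)


def char_accuracy(pred, target):
    """Fraction of positions that agree, divided by the longer sequence length."""
    p, t = normalize(pred), normalize(target)
    longest = max(len(p), len(t))
    if longest == 0:
        return 1.0  # two empty strings agree trivially
    matches = sum(a == b for a, b in zip(p, t))
    return matches / longest


def evaluate(preds, targets):
    """Average both metrics over paired predictions and targets."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        raise ValueError("nothing to evaluate")
    n = len(preds)
    return {
        "exact_match": sum(exact_match(p, t) for p, t in zip(preds, targets)) / n,
        "char_accuracy": sum(char_accuracy(p, t) for p, t in zip(preds, targets)) / n,
    }


# Hypothetical model outputs and ground-truth sequences, for illustration only.
scores = evaluate(["A7K", "abz9", "QQ"], ["A7K", "ABZ9", "QQX"])
```

Dividing by the longer sequence length penalizes both missing and extra characters, which matters here because sequence length varies from 3 to 5.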