CompMix-IR

Modalities: Graphs, Tabular, Texts
License: cc-by-4.0
Introduced: 2024-10-26

CompMix-IR Dataset Overview:

Characteristics: CompMix-IR is a benchmark dataset for heterogeneous knowledge retrieval. It covers four knowledge types (text, knowledge graphs, tables, and infoboxes) and contains more than 9,400 QA pairs and a corpus of 10 million entries. It supports two retrieval scenarios: retrieving across all knowledge types, or retrieving only specific types according to user instructions.
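
The two retrieval scenarios can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the field names (`id`, `type`, `content`), the toy corpus, the `retrieve` function, and the token-overlap scoring are hypothetical and do not reflect the actual CompMix-IR schema or any retriever evaluated on it.

```python
from typing import Dict, List, Optional, Set

# Toy corpus. The field names and entries are hypothetical,
# not the real CompMix-IR data format.
CORPUS: List[Dict[str, str]] = [
    {"id": "d1", "type": "text",
     "content": "Inception is a 2010 film directed by Christopher Nolan"},
    {"id": "d2", "type": "kg",
     "content": "Inception director Christopher Nolan"},
    {"id": "d3", "type": "table",
     "content": "Film Inception | Year 2010 | Director Christopher Nolan"},
    {"id": "d4", "type": "infobox",
     "content": "Inception release date 2010"},
    {"id": "d5", "type": "text",
     "content": "Paris is the capital of France"},
]


def tokenize(s: str) -> Set[str]:
    """Lowercase the string and split on whitespace, ignoring table separators."""
    return set(s.lower().replace("|", " ").split())


def overlap_score(query: str, content: str) -> int:
    """Count shared tokens; a stand-in for a real retrieval model's score."""
    return len(tokenize(query) & tokenize(content))


def retrieve(
    query: str,
    corpus: List[Dict[str, str]],
    allowed_types: Optional[Set[str]] = None,
    k: int = 3,
) -> List[str]:
    """Return the ids of the top-k entries.

    allowed_types=None searches all knowledge types (scenario 1);
    a set of type names restricts the search to those types (scenario 2).
    """
    candidates = [
        e for e in corpus
        if allowed_types is None or e["type"] in allowed_types
    ]
    scored = [(overlap_score(query, e["content"]), e) for e in candidates]
    scored = [(s, e) for s, e in scored if s > 0]
    # The sort is stable, so ties keep their corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e["id"] for _, e in scored[:k]]


# Scenario 1: retrieve across all knowledge types.
all_types = retrieve("who directed Inception", CORPUS)

# Scenario 2: the user instruction restricts retrieval to tables and infoboxes.
structured_only = retrieve(
    "who directed Inception", CORPUS, allowed_types={"table", "infobox"}
)
```

In this sketch the type restriction is passed as an explicit argument. An instruction-aware retriever would instead have to infer it from the natural-language instruction, which the sketch does not attempt.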

Motivation: CompMix-IR addresses the limitations of existing benchmarks by combining diverse knowledge sources with varied user intents, giving a more comprehensive dataset that better reflects real-world retrieval needs.

Potential Use Cases: The dataset is suited to developing and evaluating heterogeneous IR models, instruction-aware retrieval systems, and open-domain QA systems. It can also be used for benchmarking, for cross-domain IR research, and for improving the adaptability and robustness of retrieval models.