CompMix-IR
CompMix-IR Dataset Overview:
Characteristics: CompMix-IR is a heterogeneous knowledge retrieval benchmark dataset, featuring four knowledge types (text, knowledge graphs, tables, and infoboxes), 9,400+ QA pairs, and a corpus of 10 million entries. It supports two retrieval scenarios: retrieving across all knowledge types or retrieving specific types based on user instructions.
Motivation: CompMix-IR addresses limitations of existing benchmarks by combining diverse knowledge sources and user intents in a single dataset, so that evaluation reflects real-world retrieval needs more closely.
Potential Use Cases: CompMix-IR is suited to developing and evaluating heterogeneous IR models, instruction-aware retrieval systems, and open-domain QA systems. It can also serve as a benchmark, support cross-domain IR research, and help improve the adaptability and robustness of retrieval models.
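
Example: The two retrieval scenarios listed under Characteristics (retrieving across all knowledge types, or only the types a user asks for) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Entry` record layout, the type labels, the toy corpus, and the word-overlap scoring are hypothetical and are not the dataset's actual schema or any official baseline.

```python
from dataclasses import dataclass

# Hypothetical record layout; the real CompMix-IR schema may differ.
@dataclass
class Entry:
    entry_id: str
    source_type: str  # assumed labels: "text", "kg", "table", "infobox"
    content: str

def overlap_score(query, content):
    """Toy relevance score: number of lowercase tokens shared by query and content."""
    return len(set(query.lower().split()) & set(content.lower().split()))

def retrieve(query, corpus, allowed_types=None, k=3):
    """Return the top-k entries by overlap score.

    allowed_types=None  -> scenario 1: search across all knowledge types.
    allowed_types={...} -> scenario 2: search only the requested types.
    """
    candidates = [
        e for e in corpus
        if allowed_types is None or e.source_type in allowed_types
    ]
    ranked = sorted(candidates, key=lambda e: overlap_score(query, e.content), reverse=True)
    return ranked[:k]

# Made-up toy corpus with one entry per knowledge type.
corpus = [
    Entry("t1", "text", "Inception is a 2010 film directed by Christopher Nolan."),
    Entry("k1", "kg", "Inception director Christopher Nolan"),
    Entry("b1", "table", "film | year | director ; Inception | 2010 | Christopher Nolan"),
    Entry("i1", "infobox", "Inception: directed by Christopher Nolan; release 2010"),
]

query = "who directed Inception"
all_hits = retrieve(query, corpus, k=4)                    # scenario 1
table_hits = retrieve(query, corpus, allowed_types={"table"})  # scenario 2
```

In a real system the type restriction would more likely be inferred from the user's instruction by the retriever itself; the explicit `allowed_types` filter above only makes the difference between the two scenarios visible.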