19,997 machine learning datasets
Nowadays, individuals tend to engage in dialogues with Large Language Models, seeking answers to their questions. In times when such answers are readily accessible to anyone, stimulating and preserving humans' cognitive abilities and ensuring that humans retain good reasoning skills become crucial. This study addresses these needs by proposing hints (instead of final answers, or before giving answers) as a viable solution. We introduce a framework for automatic hint generation for factoid questions and employ it to construct TriviaHG, a novel large-scale dataset featuring 160,230 hints corresponding to 16,645 questions from the TriviaQA dataset. Additionally, we present an automatic evaluation method that measures the Convergence and Familiarity quality attributes of hints. To evaluate the TriviaHG dataset and the proposed evaluation method, we enlisted 10 individuals to annotate 2,791 hints and tasked 6 humans with answering questions using the provided hints.
This is a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. We introduce a new benchmark targeting visual odometry and SLAM, to advance research in autonomous robotics and computer vision. This work complements existing datasets by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures to accurately calibrate the intrinsics and extrinsics of the sensors while addressing temporal synchronization. During recording, we cover multi-floor buildings, gardens, urban and highway scenarios. Combining handheld and car-based data collections, our setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes an accurate 6-DoF ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through Bundle Adjustment. All sequences divi
Abstract: First released in 2006, DrugBank (https://go.drugbank.com) has grown to become the 'gold standard' knowledge resource for drug, drug-target and related pharmaceutical information. DrugBank is widely used across many diverse biomedical research and clinical applications, and averages more than 30 million views/year. Since its last update in 2018, we have been actively enhancing the quantity and quality of the drug data in this knowledgebase. In this latest release (DrugBank 6.0), the number of FDA-approved drugs has grown from 2646 to 4563 (a 72% increase), the number of investigational drugs has grown from 3394 to 6231 (an 84% increase), the number of drug-drug interactions increased from 365 984 to 1 413 413 (a 286% increase), and the number of drug-food interactions expanded from 1195 to 2475 (a 107% increase). In addition to this notable expansion in database size, we have added thousands of new, colorful, richly annotated pathways depicting drug mechanisms and drug metabolism.
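For reference, the growth percentages quoted above follow directly from the reported counts; a minimal Python check (the labels are just shorthand for the abstract's categories):

```python
# Recompute the percentage increases from the counts quoted in the abstract.
growth = {
    "FDA-approved drugs": (2646, 4563),
    "investigational drugs": (3394, 6231),
    "drug-drug interactions": (365_984, 1_413_413),
    "drug-food interactions": (1195, 2475),
}

for name, (old, new) in growth.items():
    pct = (new - old) / old * 100
    print(f"{name}: {old} -> {new} (+{pct:.0f}%)")
```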
Large multimodal models extend the impressive capabilities of large language models by integrating multimodal understanding abilities. However, it is not clear how they can emulate the general intelligence and reasoning ability of humans. As recognizing patterns and abstracting concepts are key to general intelligence, we introduce PuzzleVQA, a collection of puzzles based on abstract patterns. With this dataset, we evaluate large multimodal models with abstract patterns based on fundamental concepts, including colors, numbers, sizes, and shapes. Through our experiments on state-of-the-art large multimodal models, we find that they are not able to generalize well to simple abstract patterns. Notably, even GPT-4V cannot solve more than half of the puzzles. To diagnose the reasoning challenges in large multimodal models, we progressively guide the models with our ground truth reasoning explanations for visual perception, inductive reasoning, and deductive reasoning. Our systematic analysis finds that the main bottlenecks of GPT-4V are weaker visual perception and inductive reasoning abilities.
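As a rough sketch of how per-concept accuracy on such puzzles might be tallied (the record layout and field names below are hypothetical, not the authors' released evaluation code):

```python
from collections import defaultdict

# Hypothetical records: each puzzle is tagged with its underlying concept,
# the ground-truth option, and a model's predicted option.
results = [
    {"concept": "colors",  "answer": "B", "prediction": "B"},
    {"concept": "numbers", "answer": "C", "prediction": "A"},
    {"concept": "shapes",  "answer": "A", "prediction": "A"},
]

correct, total = defaultdict(int), defaultdict(int)
for r in results:
    total[r["concept"]] += 1
    correct[r["concept"]] += r["prediction"] == r["answer"]

for concept in total:
    print(f"{concept}: {correct[concept] / total[concept]:.0%} accuracy")
```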
The dermatology differential diagnoses (ddx) dataset for skin condition classification includes expert annotations and model predictions for 1,947 cases. Note that no images or metadata are provided. The expert annotations come in the form of differential diagnoses, i.e., partial rankings of conditions, and there is a high level of disagreement among experts, making this a well-suited benchmark for studying annotator disagreement. The data was introduced in [1] and [2].
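As an illustration of working with such partial rankings, here is a minimal sketch of one possible disagreement measure (pairwise top-1 agreement); the per-case layout and condition names are assumptions, not the dataset's actual schema:

```python
from itertools import combinations

# Hypothetical layout: for each case, every expert's differential diagnosis is
# an ordered list of condition names, most likely first.
case_annotations = {
    "case_001": [
        ["eczema", "psoriasis"],           # expert A
        ["psoriasis", "eczema", "tinea"],  # expert B
        ["contact dermatitis"],            # expert C
    ],
}

def top1_agreement(rankings):
    """Fraction of expert pairs whose top-ranked condition matches."""
    pairs = list(combinations(rankings, 2))
    if not pairs:
        return 1.0
    return sum(a[0] == b[0] for a, b in pairs) / len(pairs)

for case_id, rankings in case_annotations.items():
    print(case_id, f"top-1 agreement: {top1_agreement(rankings):.2f}")
```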
Please refer to: https://github.com/google/imageinwords/blob/main/datasets/IIW-400/README.md
Scene graph labels for the 3D-FRONT dataset.
Got "pubchem_smiles_canonical.zip" from https://ibm.ent.box.com/v/MoLFormer-data
https://zenodo.org/records/11003436
CPsyCounD is a high-quality multi-turn dialogue dataset with a total of 3,134 multi-turn consultation dialogues. It covers nine representative topics and seven classic schools of psychological counseling.
https://github.com/anhaidgroup/deepmatcher/blob/master/Datasets.md
For Mono3DRefer, we sample 2,025 image frames from the original KITTI dataset, containing 41,140 expressions in total and a vocabulary of 5,271 words.
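For illustration only, a vocabulary count of this kind is typically obtained by tokenizing the referring expressions and counting unique words; a toy sketch (the expressions below are made up, not taken from Mono3DRefer):

```python
import re

# Toy referring expressions; the real dataset has 41,140 of them.
expressions = [
    "the silver car parked on the left side of the road",
    "a pedestrian crossing in front of the white van",
]

vocab = set()
for text in expressions:
    vocab.update(re.findall(r"[a-z']+", text.lower()))

print(f"{len(expressions)} expressions, vocabulary of {len(vocab)} words")
```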
Precise segmentation of architectural structures provides detailed information about various building components, enhancing our understanding of and interaction with the built environment. Nevertheless, existing outdoor 3D point cloud datasets offer few detailed annotations of architectural exteriors due to privacy concerns and the high cost of data acquisition and annotation. To overcome this shortfall, this paper introduces a semantically enriched, photo-realistic 3D architectural model dataset and benchmark for semantic segmentation. It features real-world buildings of 4 different purposes as well as an open architectural landscape in Hong Kong. Each point is labeled with one of 14 semantic classes.
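For context, point-wise semantic segmentation benchmarks of this kind are commonly scored with per-class IoU and its mean over the 14 classes; a minimal numpy sketch (the label arrays here are random placeholders, not the dataset's tooling):

```python
import numpy as np

NUM_CLASSES = 14  # one semantic label per point, as described above

def per_class_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int = NUM_CLASSES):
    """Intersection-over-union for each semantic class, given point-wise labels."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / union if union else np.nan)
    return np.array(ious)

# Toy example: random labels for a 10,000-point cloud.
rng = np.random.default_rng(0)
gt = rng.integers(0, NUM_CLASSES, 10_000)
pred = rng.integers(0, NUM_CLASSES, 10_000)
print("mIoU:", np.nanmean(per_class_iou(pred, gt)))
```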
MULTI-Benchmark is a benchmark for evaluating Multimodal Large Language Models (MLLMs). It is designed to test the understanding of complex tables and images, and reasoning with long context.
MUTE is the first open-source Bengali hateful meme dataset, consisting of around 4,200 memes annotated with two labels: hate and not hate.
AnoVox is a large-scale benchmark for ANOmaly detection in autonomous driving. AnoVox incorporates multimodal sensor data and spatial VOXel ground truth, allowing methods to be compared independently of the sensors they use. AnoVox contains both content anomalies and temporal anomalies.
Ego4D-HCap is a hierarchical video captioning dataset comprising a three-tier hierarchy of captions: short clip-level captions, medium-length video-segment descriptions, and long-range video-level summaries. To construct Ego4D-HCap, we leverage Ego4D, the largest publicly available egocentric video dataset. While Ego4D comes with time-stamped atomic captions and video-segment descriptions spanning up to 5 minutes, it lacks video-level summaries for longer durations. To address this, we annotate a subset of 8,267 Ego4D videos with long-range video summaries, each spanning up to two hours. This enhancement provides a three-level hierarchy of captions.
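A hedged sketch of how the three-tier caption hierarchy could be represented in code; the class and field names are illustrative, not the released annotation schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ClipCaption:
    start_sec: float
    end_sec: float
    text: str  # short, atomic clip-level caption

@dataclass
class SegmentDescription:
    start_sec: float
    end_sec: float  # segments span up to ~5 minutes
    text: str
    clips: List[ClipCaption] = field(default_factory=list)

@dataclass
class VideoSummary:
    video_id: str
    text: str  # long-range summary covering up to ~2 hours of video
    segments: List[SegmentDescription] = field(default_factory=list)
```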
A screenshot-caption dataset containing 135k pairs of screenshots and captions extracted from Google Play.
A dataset dedicated to multi-object, multi-actor activity parsing.
OpenRooms FF (Forward-Facing) is a dataset that extends OpenRooms into a multi-view setup. Each image set consists of 9 images looking in the same direction. For a detailed description of dataset creation, please refer to the paper's supplementary material.