19,997 machine learning datasets
Nowadays, individuals tend to engage in dialogues with Large Language Models, seeking answers to their questions. In times when such answers are readily accessible to anyone, stimulating and preserving humans' cognitive abilities and ensuring that humans retain good reasoning skills become crucial. This study addresses these needs by proposing hints (instead of final answers, or before giving answers) as a viable solution. We introduce a framework for automatic hint generation for factoid questions and employ it to construct TriviaHG, a novel large-scale dataset featuring 160,230 hints corresponding to 16,645 questions from the TriviaQA dataset. Additionally, we present an automatic evaluation method that measures the Convergence and Familiarity quality attributes of hints. To evaluate the TriviaHG dataset and the proposed evaluation method, we enlisted 10 individuals to annotate 2,791 hints and tasked 6 humans with answering questions using the provided hints.
This is a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. We introduce a new benchmark targeting visual odometry and SLAM, to advance research in autonomous robotics and computer vision. This work complements existing datasets by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures to accurately calibrate the intrinsics and extrinsics of the sensors while addressing temporal synchronization. During recording, we cover multi-floor buildings, gardens, urban and highway scenarios. Combining handheld and car-based data collections, our setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes an accurate 6-DoF ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through Bundle Adjustment. All sequences divi
Abstract: First released in 2006, DrugBank (https://go.drugbank.com) has grown to become the 'gold standard' knowledge resource for drug, drug-target and related pharmaceutical information. DrugBank is widely used across many diverse biomedical research and clinical applications, and averages more than 30 million views/year. Since its last update in 2018, we have been actively enhancing the quantity and quality of the drug data in this knowledgebase. In this latest release (DrugBank 6.0), the number of FDA-approved drugs has grown from 2646 to 4563 (a 72% increase), the number of investigational drugs has grown from 3394 to 6231 (an 84% increase), the number of drug-drug interactions increased from 365 984 to 1 413 413 (a 286% increase), and the number of drug-food interactions expanded from 1195 to 2475 (a 107% increase). In addition to this notable expansion in database size, we have added thousands of new, colorful, richly annotated pathways depicting drug mechanisms and drug metabolism.
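For reference, the growth percentages quoted above follow directly from the reported counts; a minimal Python check (the labels are just shorthand for the abstract's categories):

```python
# Recompute the percentage increases from the counts quoted in the abstract.
growth = {
    "FDA-approved drugs": (2646, 4563),
    "investigational drugs": (3394, 6231),
    "drug-drug interactions": (365_984, 1_413_413),
    "drug-food interactions": (1195, 2475),
}

for name, (old, new) in growth.items():
    pct = (new - old) / old * 100
    print(f"{name}: {old} -> {new} (+{pct:.0f}%)")
```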
Large multimodal models extend the impressive capabilities of large language models by integrating multimodal understanding abilities. However, it is not clear how they can emulate the general intelligence and reasoning ability of humans. As recognizing patterns and abstracting concepts are key to general intelligence, we introduce PuzzleVQA, a collection of puzzles based on abstract patterns. With this dataset, we evaluate large multimodal models with abstract patterns based on fundamental concepts, including colors, numbers, sizes, and shapes. Through our experiments on state-of-the-art large multimodal models, we find that they are not able to generalize well to simple abstract patterns. Notably, even GPT-4V cannot solve more than half of the puzzles. To diagnose the reasoning challenges in large multimodal models, we progressively guide the models with our ground truth reasoning explanations for visual perception, inductive reasoning, and deductive reasoning. Our systematic analysis finds that the main bottlenecks of GPT-4V are weaker visual perception and inductive reasoning abilities.
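As a rough sketch of how per-concept accuracy on such puzzles might be tallied (the record layout and field names below are hypothetical, not the authors' released evaluation code):

```python
from collections import defaultdict

# Hypothetical records: each puzzle is tagged with its underlying concept,
# the ground-truth option, and a model's predicted option.
results = [
    {"concept": "colors",  "answer": "B", "prediction": "B"},
    {"concept": "numbers", "answer": "C", "prediction": "A"},
    {"concept": "shapes",  "answer": "A", "prediction": "A"},
]

correct, total = defaultdict(int), defaultdict(int)
for r in results:
    total[r["concept"]] += 1
    correct[r["concept"]] += r["prediction"] == r["answer"]

for concept in total:
    print(f"{concept}: {correct[concept] / total[concept]:.0%} accuracy")
```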
The dermatology differential diagnoses (ddx) dataset for skin condition classification includes expert annotations and model predictions for 1,947 cases. Note that no images or metadata are provided. The expert annotations come in the form of differential diagnoses, i.e., partial rankings of conditions, and there is a high level of disagreement among experts, making this a well-suited benchmark for studying annotator disagreement. The data was introduced in [1] and [2].
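As an illustration of working with such partial rankings, here is a minimal sketch of one possible disagreement measure (pairwise top-1 agreement); the per-case layout and condition names are assumptions, not the dataset's actual schema:

```python
from itertools import combinations

# Hypothetical layout: for each case, every expert's differential diagnosis is
# an ordered list of condition names, most likely first.
case_annotations = {
    "case_001": [
        ["eczema", "psoriasis"],           # expert A
        ["psoriasis", "eczema", "tinea"],  # expert B
        ["contact dermatitis"],            # expert C
    ],
}

def top1_agreement(rankings):
    """Fraction of expert pairs whose top-ranked condition matches."""
    pairs = list(combinations(rankings, 2))
    if not pairs:
        return 1.0
    return sum(a[0] == b[0] for a, b in pairs) / len(pairs)

for case_id, rankings in case_annotations.items():
    print(case_id, f"top-1 agreement: {top1_agreement(rankings):.2f}")
```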
Please refer to: https://github.com/google/imageinwords/blob/main/datasets/IIW-400/README.md
Scene graph labels for the 3D-FRONT dataset.
Got "pubchem_smiles_canonical.zip" from https://ibm.ent.box.com/v/MoLFormer-data
https://zenodo.org/records/11003436
CPsyCounD is a high-quality multi-turn dialogue dataset with a total of 3,134 multi-turn consultation dialogues. It covers nine representative topics and seven classic schools of psychological counseling.
https://github.com/anhaidgroup/deepmatcher/blob/master/Datasets.md
For Mono3DRefer, we sample 2,025 image frames from the original KITTI dataset, containing 41,140 expressions in total and a vocabulary of 5,271 words.
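For illustration only, a vocabulary count of this kind is typically obtained by tokenizing the referring expressions and counting unique words; a toy sketch (the expressions below are made up, not taken from Mono3DRefer):

```python
import re

# Toy referring expressions; the real dataset has 41,140 of them.
expressions = [
    "the silver car parked on the left side of the road",
    "a pedestrian crossing in front of the white van",
]

vocab = set()
for text in expressions:
    vocab.update(re.findall(r"[a-z']+", text.lower()))

print(f"{len(expressions)} expressions, vocabulary of {len(vocab)} words")
```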
Precise segmentation of architectural structures provides detailed information about various building components, enhancing our understanding of and interaction with the built environment. Nevertheless, existing outdoor 3D point cloud datasets offer few detailed annotations of architectural exteriors due to privacy concerns and the high cost of data acquisition and annotation. To overcome this shortfall, this paper introduces a semantically enriched, photo-realistic 3D architectural model dataset and benchmark for semantic segmentation. It features real-world buildings of 4 different purposes as well as an open architectural landscape in Hong Kong. Each point is labeled with one of 14 semantic classes.
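For context, point-wise semantic segmentation benchmarks of this kind are commonly scored with per-class IoU and its mean over the 14 classes; a minimal numpy sketch (the label arrays here are random placeholders, not the dataset's tooling):

```python
import numpy as np

NUM_CLASSES = 14  # one semantic label per point, as described above

def per_class_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int = NUM_CLASSES):
    """Intersection-over-union for each semantic class, given point-wise labels."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / union if union else np.nan)
    return np.array(ious)

# Toy example: random labels for a 10,000-point cloud.
rng = np.random.default_rng(0)
gt = rng.integers(0, NUM_CLASSES, 10_000)
pred = rng.integers(0, NUM_CLASSES, 10_000)
print("mIoU:", np.nanmean(per_class_iou(pred, gt)))
```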
MULTI-Benchmark is a benchmark for evaluating Multimodal Large Language Models (MLLMs). It is designed to test the understanding of complex tables and images, and reasoning with long context.
MUTE is the first open-source Bengali hateful meme dataset, consisting of around 4,200 memes annotated with two labels: hate and not hate.
AnoVox is a large-scale benchmark for ANOmaly detection in autonomous driving. AnoVox incorporates multimodal sensor data and spatial VOXel ground truth, allowing methods to be compared independently of the sensors they use. AnoVox contains both content anomalies and temporal anomalies.
Ego4D-HCap is a hierarchical video captioning dataset comprising a three-tier hierarchy of captions: short clip-level captions, medium-length video-segment descriptions, and long-range video-level summaries. To construct Ego4D-HCap, we leverage Ego4D, the largest publicly available egocentric video dataset. While Ego4D comes with time-stamped atomic captions and video-segment descriptions spanning up to 5 minutes, it lacks video-level summaries for longer durations. To address this, we annotate a subset of 8,267 Ego4D videos with long-range video summaries, each spanning up to two hours. This enhancement provides a three-level hierarchy of captions.
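A hedged sketch of how the three-tier caption hierarchy could be represented in code; the class and field names are illustrative, not the released annotation schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ClipCaption:
    start_sec: float
    end_sec: float
    text: str  # short, atomic clip-level caption

@dataclass
class SegmentDescription:
    start_sec: float
    end_sec: float  # segments span up to ~5 minutes
    text: str
    clips: List[ClipCaption] = field(default_factory=list)

@dataclass
class VideoSummary:
    video_id: str
    text: str  # long-range summary covering up to ~2 hours of video
    segments: List[SegmentDescription] = field(default_factory=list)
```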
A screenshot-caption dataset containing 135k pairs of screenshots and captions extracted from Google Play.
A dataset dedicated to multi-object, multi-actor activity parsing.
OpenRooms FF (Forward-Facing) is a dataset that extends OpenRooms into a multi-view setup. Each image set consists of 9 images looking in the same direction. For a detailed description of dataset creation, please refer to the paper's supplementary material.