Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

STAR Benchmark (Situated Reasoning)

Capturing knowledge from surrounding situations and reasoning over it accordingly is crucial and challenging for machine intelligence. STAR is a benchmark for situated reasoning that provides 60K challenging situated questions across four task types, 140K situated hypergraphs, symbolic situation programs, and logic-grounded diagnosis for real-world video situations. (Data Download, STAR Leaderboard)

17 papers · 3 benchmarks · Texts, Videos

WOST

The Weakly Occluded Scene Text (WOST) dataset is a public dataset for scene text segmentation, used to generate pixel-level annotations in scene text images. The dataset is designed to contain weakly annotated images, meaning the images are not fully annotated with pixel-level labels.

17 papers · 3 benchmarks

QUILT-1M

Recent accelerations in multi-modal applications have been made possible by the plethora of image and text data available online. However, the scarcity of similar data in the medical field, specifically in histopathology, has halted similar progress. To enable similar representation learning for histopathology, we turn to YouTube, an untapped resource offering 1,087 hours of valuable educational histopathology videos from expert clinicians. From YouTube, we curate Quilt: a large-scale vision-language dataset consisting of 768,826 image and text pairs. Quilt was automatically curated using a mixture of models, including large language models, handcrafted algorithms, human knowledge databases, and automatic speech recognition. In comparison, the most comprehensive datasets curated for histopathology amass only around 200K samples. We combine Quilt with datasets from other sources, including Twitter, research papers, and the internet in general, to create an even larger dataset.

17 papers · 0 benchmarks · Images, Medical, Texts

MoCA-Mask (Moving Camouflaged Animals (MoCA)-Mask)

The original Moving Camouflaged Animals (MoCA) dataset includes 37K frames from 141 YouTube video sequences, with a resolution of 720 × 1280 and a sampling rate of 24 fps in the majority of cases. The dataset covers 67 types of animals moving in natural scenes, but some are not camouflaged animals. Also, the ground truth of the original dataset consists of bounding boxes rather than dense segmentation masks, which makes it hard to evaluate VCOD segmentation performance. To this end, we reorganize the dataset as MoCA-Mask and build a benchmark with more comprehensive evaluation criteria.

17 papers · 35 benchmarks · Videos

Soccer (ISSIA-CNR Soccer)

This dataset was originally introduced by [1] for soccer ball and player tracking from six synchronized videos. Since the ball annotations provided by [1] are collapsed, new annotations of ball 2D coordinates are provided by [2]. For sports ball detection and tracking evaluation, the first four video clips are used for training and the remaining two clips for testing.

17 papers · 3 benchmarks

NINCO (No ImageNet Class Objects)

The NINCO (No ImageNet Class Objects) dataset is introduced in the ICML 2023 paper "In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation". The images in this dataset are free from objects that belong to any of the 1000 classes of ImageNet-1K (ILSVRC2012), which makes NINCO suitable for evaluating out-of-distribution detection on ImageNet-1K.

17 papers · 0 benchmarks · Images

BABE (Bias Annotations By Experts)

BABE is an expertly annotated dataset aimed at facilitating media bias research. It comprises 3,700 sentences evenly distributed across various topics and outlets, each annotated for media bias at both the word and sentence levels. The development of BABE involved a process of data collection and annotation focused on sentences extracted from news articles that span a range of predefined controversial topics and were published across different U.S. media platforms between January 2017 and June 2020.

17 papers · 0 benchmarks

RepoBench

RepoBench is a benchmark designed for evaluating repository-level code auto-completion systems, focusing on more complex, real-world programming scenarios involving multiple files. It comprises three tasks: RepoBench-R (Retrieval), measuring the system's ability to retrieve relevant code snippets; RepoBench-C (Code Completion), assessing the prediction of the next line of code with both in-file and cross-file context; and RepoBench-P (Pipeline), evaluating complex tasks requiring both retrieval and prediction. RepoBench aims to provide a comprehensive performance comparison to foster continuous improvement in auto-completion systems.

17 papers · 0 benchmarks

CC152K (Conceptual Captions 152K)

CC152K is a subset of Conceptual Captions. It contains 150,000 randomly selected samples from the training split for training, 1,000 samples from the validation split for validation, and 1,000 samples from the validation split for testing.

17 papers · 21 benchmarks
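The subset construction described above can be sketched as simple seeded random subsampling; the function and variable names below are illustrative, not part of any official CC152K tooling:

```python
import random

def make_cc152k(train_pairs, val_pairs, seed=0):
    """Build a CC152K-style subset from Conceptual Captions:
    150K samples drawn from the training split for training, plus
    1K validation and 1K test samples drawn (disjointly) from the
    validation split."""
    rng = random.Random(seed)
    train = rng.sample(train_pairs, 150_000)
    # Draw both held-out sets at once so they cannot overlap.
    held_out = rng.sample(val_pairs, 2_000)
    return train, held_out[:1_000], held_out[1_000:]

# Toy usage with placeholder (image, caption) pairs:
train_pool = [(f"img_{i}.jpg", f"caption {i}") for i in range(200_000)]
val_pool = [(f"val_{i}.jpg", f"caption {i}") for i in range(5_000)]
train, val, test = make_cc152k(train_pool, val_pool)
```

Because `random.sample` draws without replacement, the validation and test halves of the held-out draw are guaranteed to be disjoint.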

COCO-Noisy (Microsoft Common Objects in Context with 20% of Noisy Correspondence and 1K test data)

This dataset is based on MS COCO, with 20% of the data randomly shuffled to simulate noisy correspondence.

17 papers · 21 benchmarks · Images, Texts
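The corruption amounts to permuting the captions of a randomly chosen 20% of the image-caption pairs among themselves, so those captions no longer match their images. A minimal sketch of that idea (names are illustrative, not the paper's actual preprocessing code):

```python
import random

def inject_noisy_correspondence(captions, noise_ratio=0.2, seed=0):
    """Return a copy of `captions` in which a random `noise_ratio`
    fraction of entries has been shuffled among themselves, breaking
    their alignment with the corresponding images."""
    rng = random.Random(seed)
    n = len(captions)
    # Pick the indices to corrupt, then permute captions within them.
    noisy_idx = rng.sample(range(n), int(n * noise_ratio))
    permuted = noisy_idx[:]
    rng.shuffle(permuted)
    out = list(captions)
    for src, dst in zip(noisy_idx, permuted):
        out[dst] = captions[src]
    return out

caps = [f"caption {i}" for i in range(1000)]
noisy = inject_noisy_correspondence(caps)
```

Shuffling within the selected subset (rather than replacing captions) keeps the overall caption distribution unchanged, so only the image-text alignment is corrupted.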

MM-Vet v2

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities

17 papers · 4 benchmarks · Images, Texts

Vinoground

A temporal counterfactual dataset comprising 1,000 short and natural video-caption pairs.

17 papers · 6 benchmarks · Texts, Videos

AVeriTeC (AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web)

AVeriTeC (Automated Verification of Textual Claims) is a dataset of 4,568 real-world claims covering fact-checks by 50 different organizations. Each claim is annotated with question-answer pairs supported by evidence available online, as well as textual justifications explaining how the evidence combines to produce a verdict. The claims in AVeriTeC are classified into four labels: "Supported", "Refuted", "Not Enough Evidence", and "Conflicting Evidence/Cherry-picking". The dataset also contains several metadata fields, such as the speaker of the claim, the publisher of the claim, the date the claim was published, and the location most relevant to the claim. These can be used to support questions, answers, and justifications.

17 papers · 3 benchmarks · Texts
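An annotated claim as described above can be modeled with a small record type that enforces the four-label verdict set; the field names below are illustrative and do not reflect the dataset's actual JSON schema:

```python
from dataclasses import dataclass

# The four verdict labels named in the dataset description.
LABELS = {
    "Supported",
    "Refuted",
    "Not Enough Evidence",
    "Conflicting Evidence/Cherry-picking",
}

@dataclass
class AveritecClaim:
    """One annotated claim: QA evidence pairs, a textual justification,
    a verdict label, and the metadata fields mentioned above."""
    claim: str
    qa_pairs: list          # [(question, answer), ...] backed by web evidence
    justification: str
    label: str
    speaker: str = ""
    publisher: str = ""
    date: str = ""
    location: str = ""

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown verdict label: {self.label!r}")

# Toy usage:
c = AveritecClaim(
    claim="Example claim text.",
    qa_pairs=[("Who said this?", "An example speaker.")],
    justification="The evidence supports the claim.",
    label="Supported",
)
```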

GSM-Plus

By perturbing the widely used GSM8K dataset, an adversarial dataset for grade-school math called GSM-Plus is created. Motivated by the capability taxonomy for solving math problems described in Polya's principles, the paper identifies five perspectives to guide the development of GSM-Plus.

17 papers · 4 benchmarks · Texts

SMDD (Synthetic Face Morphing Attack Detection Development Dataset)

The Synthetic Morphing Attack Detection Development (SMDD) dataset is a synthetic-based MAD dataset consisting of 25K bona fide images generated using the StyleGAN2-ADA framework and 15K morphing attacks created from the bona fide samples using the OpenCV morphing technique.

17 papers · 0 benchmarks

CMU-MOSI (Multimodal Corpus of Sentiment Intensity)

The Multimodal Corpus of Sentiment Intensity (CMU-MOSI) dataset is a collection of 2,199 opinion video clips. Each opinion video is annotated with sentiment in the range [-3, 3]. The dataset is rigorously annotated with labels for subjectivity and sentiment intensity, per-frame and per-opinion annotated visual features, and per-millisecond annotated audio features.

16 papers · 8 benchmarks · Audio, Texts, Videos

ChemProt

ChemProt consists of 1,820 PubMed abstracts with chemical-protein interactions annotated by domain experts and was used in the BioCreative VI text mining chemical-protein interactions shared task.

16 papers · 2 benchmarks · Biomedical, Texts

HVU (Holistic Video Understanding)

HVU is organized hierarchically in a semantic taxonomy that treats multi-label and multi-task video understanding as a comprehensive problem encompassing the recognition of multiple semantic aspects in a dynamic scene. HVU contains approximately 572K videos in total, with 9 million annotations across the training, validation, and test sets, spanning 3,142 labels. HVU covers semantic aspects defined on categories of scenes, objects, actions, events, attributes, and concepts, which naturally capture real-world scenarios.

16 papers · 0 benchmarks

ViGGO

The ViGGO corpus is a set of 6,900 meaning-representation-to-utterance pairs in the video game domain. The meaning representations cover 9 different dialogue acts.

16 papers · 2 benchmarks · Texts

NYU Hand

The NYU Hand pose dataset contains 8,252 test-set and 72,757 training-set frames of captured RGBD data with ground-truth hand pose information. For each frame, RGBD data from three Kinects is provided: a frontal view and two side views. The training set contains samples from a single user only (Jonathan Tompson), while the test set contains samples from two users (Murphy Stein and Jonathan Tompson). A synthetic re-creation (rendering) of the hand pose is also provided for each view.

16 papers · 0 benchmarks · Images, Interactive
Page 116 of 1000