Datasets

19,997 machine learning datasets

HoVer

HoVer is a dataset for many-hop evidence extraction and fact verification. It challenges models to extract facts relevant to a claim from several Wikipedia articles and to classify whether the claim is Supported or Not-Supported by those facts. In HoVer, claims require evidence from as many as four English Wikipedia articles and embody reasoning graphs of diverse shapes.

35 papers | 0 benchmarks

MotionSense

This dataset includes time-series data generated by accelerometer and gyroscope sensors (attitude, gravity, userAcceleration, and rotationRate). It was collected with an iPhone 6s kept in the participant's front pocket, using SensingKit, which collects information from the Core Motion framework on iOS devices. All data was collected at a 50 Hz sampling rate. A total of 24 participants covering a range of gender, age, weight, and height performed 6 activities in 15 trials in the same environment and conditions: downstairs, upstairs, walking, jogging, sitting, and standing.

35 papers | 0 benchmarks | Time series
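For MotionSense, the stated 50 Hz sampling rate fixes how sample counts map to durations. The following is a minimal sketch of fixed-length windowing under that rate; the function names, the 2-second window, the 1-second stride, and the synthetic readings are illustrative assumptions and are not part of the dataset or its tooling.

```python
SAMPLE_RATE_HZ = 50  # sampling rate stated in the dataset description


def samples_to_seconds(n_samples, rate_hz=SAMPLE_RATE_HZ):
    """Convert a number of samples to a duration in seconds."""
    return n_samples / rate_hz


def sliding_windows(series, window_s, stride_s, rate_hz=SAMPLE_RATE_HZ):
    """Split a 1-D sequence into fixed-length, possibly overlapping windows.

    window_s and stride_s are in seconds (hypothetical values chosen by the
    caller, not prescribed by the dataset).
    """
    win = int(window_s * rate_hz)
    step = int(stride_s * rate_hz)
    return [series[i:i + win] for i in range(0, len(series) - win + 1, step)]


# Synthetic stand-in for one sensor channel: 500 samples, i.e. 10 seconds.
readings = list(range(500))
windows = sliding_windows(readings, window_s=2.0, stride_s=1.0)
print(len(windows), len(windows[0]), samples_to_seconds(len(readings)))
```

With a 2-second window, each segment holds 100 samples; a 1-second stride makes consecutive windows overlap by half.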

MuCo-3DHP

MuCo-3DHP is a large-scale training dataset of real images showing sophisticated multi-person interactions and occlusions.

35 papers | 0 benchmarks | Images

PHYRE (PHYsical REasoning)

PHYRE is a benchmark for physical reasoning that contains a set of simple classical mechanics puzzles in a 2D physical environment. The benchmark is designed to encourage the development of learning algorithms that are sample-efficient and generalize well across puzzles.

35 papers | 0 benchmarks | Environment

PST900

PST900 is a dataset of 894 synchronized and calibrated RGB and thermal image pairs with per-pixel human annotations across four distinct classes from the DARPA Subterranean Challenge.

35 papers | 4 benchmarks | Images

SARC

SARC was designed for contextual investigations of sarcasm, and related work makes considerable use of that context. The dataset was constructed by scraping Reddit comments, with sarcastic entries self-annotated by their authors through the /s token, which indicates sarcastic intent on the site. Posts on Reddit are often responses to another comment; SARC incorporates this information by including the parent comment and further child comments surrounding a post.

35 papers | 0 benchmarks

Spot-the-diff

Spot-the-diff is a dataset of 13,192 image pairs, each with human-provided text annotations describing the differences between the two images.

35 papers | 0 benchmarks | Images, Texts

SVT (Street View Text Dataset)

The Street View Text (SVT) dataset was harvested from Google Street View. Image text in this data exhibits high variability and often has low resolution. In dealing with outdoor street-level imagery, we note two characteristics: (1) image text often comes from business signage, and (2) business names are easily available through geographic business searches. These factors make the SVT set uniquely suited for word spotting in the wild: given a street view image, the goal is to identify words from nearby businesses.

35 papers | 3 benchmarks | Images

DeepCAD

DeepCAD is a CAD dataset consisting of 179,133 models and their CAD construction sequences. It can be used to train generative models of 3D shapes.

35 papers | 9 benchmarks | 3D

EgoBody

EgoBody is a large-scale dataset for egocentric 3D human pose, shape, and motion under interactions in complex 3D scenes.

35 papers | 20 benchmarks | 3D

Amazon Review

Amazon Review is a dataset for classifying whether the sentiment of a product review is positive or negative. It includes reviews from four merchandise categories: Books (B) (2834 samples), DVDs (D) (1199 samples), Electronics (E) (1883 samples), and Kitchen and housewares (K) (1755 samples).

35 papers | 0 benchmarks

CaRB (Crowdsourced automatic open Relation extraction Benchmark)

CaRB [Bhardwaj et al., 2019] was developed by re-annotating the dev and test splits of OIE2016 via crowdsourcing. Besides improving annotation quality, CaRB also provides a new matching scorer. The CaRB scorer uses token-level matching, comparing relations with relations and arguments with arguments.

35 papers | 1 benchmark
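The slot-wise, token-level matching described for the CaRB scorer can be illustrated with a simplified sketch. This is not the official CaRB scorer: the tuple layout (`rel` plus an ordered `args` list), the whitespace tokenization, the lowercasing, and the positional pairing of arguments are all assumptions made for illustration.

```python
from collections import Counter


def token_overlap(a, b):
    """Count tokens shared by two strings (multiset intersection)."""
    return sum((Counter(a.lower().split()) & Counter(b.lower().split())).values())


def slot_pairs(pred, gold):
    """Pair relation with relation and the i-th argument with the i-th argument."""
    yield pred["rel"], gold["rel"]
    for p, g in zip(pred["args"], gold["args"]):
        yield p, g


def count_tokens(extraction):
    """Total number of whitespace tokens in the relation and all arguments."""
    return len(extraction["rel"].split()) + sum(len(a.split()) for a in extraction["args"])


def token_precision_recall(pred, gold):
    """Token-level precision and recall of one predicted tuple against one gold tuple."""
    matched = sum(token_overlap(p, g) for p, g in slot_pairs(pred, gold))
    n_pred, n_gold = count_tokens(pred), count_tokens(gold)
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    return precision, recall


# Hypothetical extractions, used only to exercise the functions.
pred = {"rel": "was born in", "args": ["Marie Curie", "Warsaw"]}
gold = {"rel": "born in", "args": ["Marie Curie", "Warsaw Poland"]}
precision, recall = token_precision_recall(pred, gold)
print(precision, recall)
```

Because tokens are compared only within the same slot, a token that appears in the predicted relation but in a gold argument does not count as a match.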

V2V4Real

V2V4Real is a large-scale, real-world multimodal dataset for vehicle-to-vehicle (V2V) perception. The data was collected by two vehicles equipped with multimodal sensors driving together through diverse scenarios. It covers a driving area of 410 km comprising 20K LiDAR frames, 40K RGB frames, 240K annotated 3D bounding boxes for 5 classes, and HDMaps that cover all the driving routes.

35 papers | 0 benchmarks | LiDAR, RGB Video, Videos

CIRCO (Composed Image Retrieval on Common Objects in context)

CIRCO (Composed Image Retrieval on Common Objects in context) is an open-domain benchmarking dataset for Composed Image Retrieval (CIR) based on real-world images from COCO 2017 unlabeled set. It is the first CIR dataset with multiple ground truths and aims to address the problem of false negatives in existing datasets. CIRCO comprises a total of 1020 queries, randomly divided into 220 and 800 for the validation and test set, respectively, with an average of 4.53 ground truths per query.

35 papers | 8 benchmarks | Images, Texts
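Because each CIRCO query has several ground truths, a ranking metric needs to credit every correct image retrieved, not just the first. The sketch below shows one common form of average precision at k, normalized by min(k, number of ground truths); treating this exact formula as the benchmark's official metric is an assumption, and the function name and example IDs are hypothetical.

```python
def average_precision_at_k(ranked_ids, ground_truths, k):
    """Average precision over the top-k results for a query with multiple ground truths.

    Assumes ranked_ids contains no duplicates. The sum of precision values at
    each hit is divided by min(k, number of ground truths).
    """
    gts = set(ground_truths)
    if not gts or k <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids[:k], start=1):
        if item in gts:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(k, len(gts))


# Hypothetical ranking: correct images at ranks 1, 3, and 5.
ranked = ["a", "x", "b", "y", "c"]
ap = average_precision_at_k(ranked, {"a", "b", "c"}, k=5)
print(ap)

# Split sizes stated in the description: 220 validation + 800 test queries.
assert 220 + 800 == 1020
```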

VSPW (Video Scene Parsing in the Wild)

VSPW is a large-scale dataset for video scene parsing in the wild.

35 papers | 4 benchmarks

ARO

The Attribution, Relation, and Order (ARO) benchmark systematically evaluates the ability of vision-language models (VLMs) to understand different types of relationships, attributes, and order information. ARO consists of Visual Genome Attribution, to test the understanding of objects' properties; Visual Genome Relation, to test relational understanding; and COCO-Order and Flickr30k-Order, to test order sensitivity in VLMs. ARO is orders of magnitude larger than previous compositionality benchmarks, with more than 50,000 test cases.

35 papers | 0 benchmarks | Images, Texts

ImageNet-64

ImageNet-64 is a large dataset of small images, a down-sampled version of ImageNet. It comprises 1,281,167 training images and 50,000 test images with 1,000 labels.

35 papers | 1 benchmark

CMNLI (Chinese Multi-Genre NLI)

The CMNLI dataset is part of the Chinese Language Understanding Evaluation (CLUE) benchmark. It consists of two parts: XNLI and MNLI. The data comes from various sources such as fiction, telephone, travel, government, slate, etc. The original MNLI data and XNLI data were translated into Chinese and English. The original training set was retained, and the dev and test sets were created by merging and shuffling the dev set from XNLI and the matched set from MNLI, and the test set from XNLI and the mismatched set from MNLI, respectively.

35 papers | 0 benchmarks

MARC (Multilingual Amazon Reviews Corpus)

The Multilingual Amazon Reviews Corpus (MARC) is a large-scale collection of Amazon reviews for multilingual text classification. The corpus contains reviews in English, Japanese, German, French, Spanish, and Chinese, collected between 2015 and 2019. Each record contains the review text, the review title, the star rating, an anonymized reviewer ID, an anonymized product ID, and the coarse-grained product category (e.g., 'books', 'appliances'). The corpus is balanced across the 5 possible star ratings, so each rating constitutes 20% of the reviews in each language. For each language, there are 200,000, 5,000, and 5,000 reviews in the training, development, and test sets, respectively.

35 papers | 0 benchmarks
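The split sizes and the 20% per-rating balance stated for MARC imply some simple totals, worked out in the sketch below. The language codes are hypothetical labels, and applying the 20% balance within each individual split (rather than only per language overall) is an assumption.

```python
# Language codes are illustrative labels, not identifiers taken from the corpus.
LANGUAGES = ["en", "ja", "de", "fr", "es", "zh"]
SPLIT_SIZES = {"train": 200_000, "dev": 5_000, "test": 5_000}  # per language
NUM_RATINGS = 5  # star ratings 1 to 5, each 20% of the reviews


def reviews_per_rating(split):
    """Reviews per star rating in one split of one language, assuming a balanced split."""
    return SPLIT_SIZES[split] // NUM_RATINGS


per_language = sum(SPLIT_SIZES.values())
total = per_language * len(LANGUAGES)

print(reviews_per_rating("train"))  # reviews per rating in one training split
print(per_language, total)          # reviews per language, reviews overall
```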

FinTabNet

This dataset contains complex tables from the annual reports of S&P 500 companies, with detailed table structure annotations to support table structure recognition and table data extraction. The dataset, from IBM Research, consists of 89,646 pages comprising 112,887 tables with annotated cell structure.

35 papers | 0 benchmarks | Financial, Images, Tabular
