Datasets

19,997 machine learning datasets

MACHIAVELLI

The MACHIAVELLI Benchmark is a tool designed to measure the behavior of artificial agents, particularly their ethical behavior in pursuit of their objectives.

14 papers | 0 benchmarks

MixEval

MixEval is a ground-truth-based dynamic benchmark derived from mixtures of off-the-shelf benchmarks. It ranks LLMs with high fidelity (0.96 correlation with Chatbot Arena) while running locally and quickly (6% of the time and cost of running MMLU), and its queries can be updated stably and effortlessly every month to avoid contamination.

14 papers | 0 benchmarks

MAVE (MAVE: A Product Dataset for Multi-source Attribute Value Extraction)

The dataset contains 3 million attribute-value annotations across 1257 unique categories created from 2.2 million cleaned Amazon product profiles. It is a large, multi-sourced, diverse dataset for product attribute extraction study.

14 papers | 3 benchmarks | Texts

CROHME 2016

Source: ICFHR2016 CROHME: Competition on Recognition of Online Handwritten Mathematical Expressions

14 papers | 1 benchmark

CROHME 2014

Benchmark for HMER and OHMER. Source: CROHME 2014

14 papers | 1 benchmark

Inter-X

Inter-X is a large-scale dataset containing ~11K interaction sequences, more than 8.1M frames and 34K fine-grained human textual descriptions.

14 papers | 16 benchmarks | Texts

SIBR (SIBR Dataset for VIE in the Wild)

SIBR is a dataset for visual information extraction in natural scenes.

14 papers | 1 benchmark | Images, Texts

GTOS (Ground Terrain in Outdoor Scenes)

The database consists of over 30,000 images covering 40 classes of outdoor ground terrain under varying weather and lighting conditions.

14 papers | 0 benchmarks

OVBench

OVBench is a benchmark tailored for real-time video understanding.

14 papers | 1 benchmark | Texts, Videos

Visual Madlibs

Visual Madlibs is a dataset consisting of 360,001 focused natural language descriptions for 10,738 images. This dataset is collected using automatically produced fill-in-the-blank templates designed to gather targeted descriptions about: people and objects, their appearances, activities, and interactions, as well as inferences about the general scene or its broader context.

13 papers | 0 benchmarks | Images, Texts

Darmstadt Noise Dataset (zaid allal)

The dataset contains data about hydrogen storage in metal hydrides.

13 papers | 10 benchmarks

VQA-CP

The VQA-CP dataset was constructed by reorganizing VQA v2 so that the correlation between question type and correct answer differs between the training and test splits. For example, the most common answer to questions starting with "What sport…" is "tennis" in the training set but "skiing" in the test set. A model that guesses an answer primarily from the question will therefore perform poorly.

13 papers | 1 benchmark | Images, Texts
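The effect of the VQA-CP split described above can be illustrated with a minimal sketch of a question-only baseline. The toy question/answer pairs and the helper names (`question_type`, `fit_prior`, `accuracy`) are invented for illustration and are not taken from the dataset or its tooling. The baseline memorizes the most common training answer per question prefix, so it scores worse once the answer distribution shifts in the test split.

```python
from collections import Counter, defaultdict


def question_type(question: str, n_words: int = 2) -> str:
    """Use the first n_words of the question as a crude question type (an assumption)."""
    return " ".join(question.lower().split()[:n_words])


def fit_prior(train):
    """Map each question type to its most common training answer."""
    counts = defaultdict(Counter)
    for question, answer in train:
        counts[question_type(question)][answer] += 1
    return {qtype: c.most_common(1)[0][0] for qtype, c in counts.items()}


def accuracy(prior, examples):
    """Fraction of examples where the memorized prior answer is correct."""
    correct = sum(prior.get(question_type(q)) == a for q, a in examples)
    return correct / len(examples)


# Toy data invented for illustration; not real VQA-CP examples.
train = [
    ("What sport is being played?", "tennis"),
    ("What sport is shown here?", "tennis"),
    ("What sport is this?", "skiing"),
]
test = [
    ("What sport is the man doing?", "skiing"),
    ("What sport is on the screen?", "skiing"),
    ("What sport is this?", "tennis"),
]

prior = fit_prior(train)
train_acc = accuracy(prior, train)
test_acc = accuracy(prior, test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Because the baseline never looks at the image, its accuracy depends entirely on how well the training answer prior matches the test answer prior, which is the dependence the VQA-CP reorganization is meant to expose.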

Mutagenicity

Mutagenicity is a chemical compound dataset of drugs, which can be categorized into two classes: mutagen and non-mutagen.

13 papers | 2 benchmarks | Graphs

eQASC

This dataset contains 98k 2-hop explanations for questions in the QASC dataset, with annotations indicating if they are valid (~25k) or invalid (~73k) explanations.

13 papers | 0 benchmarks | Texts

SEN12MS-CR

SEN12MS-CR is a multi-modal and mono-temporal data set for cloud removal. It contains observations covering 175 globally distributed Regions of Interest recorded in one of four seasons throughout 2018. For each region, paired and co-registered synthetic aperture radar (SAR) Sentinel-1 measurements as well as cloudy and cloud-free optical multi-spectral Sentinel-2 observations from the European Space Agency's Copernicus mission are provided. The Sentinel satellites provide publicly accessible data and are among the most prominent satellites in Earth observation.

13 papers | 8 benchmarks | Hyperspectral images, Images

Multilingual Reuters (Multilingual Reuters Collection)

The Multilingual Reuters Collection dataset comprises over 11,000 articles from six classes in five languages, i.e., English (E), French (F), German (G), Italian (I), and Spanish (S).

13 papers | 0 benchmarks | Texts

Dayton

The Dayton dataset is a dataset for ground-to-aerial (or aerial-to-ground) image translation, or cross-view image synthesis. It contains images of road views and aerial views of roads. There are 76,048 images in total and the train/test split is 55,000/21,048. The images in the original dataset have 354×354 resolution.

13 papers | 0 benchmarks | Images

Watch-n-Patch

The Watch-n-Patch dataset was created with a focus on modeling human activities that comprise multiple actions, in a completely unsupervised setting. It was collected with a Microsoft Kinect One sensor and has a total length of about 230 minutes, divided into 458 videos. Seven subjects perform daily activities in 8 offices and 5 kitchens with complex backgrounds. Skeleton data are also provided as ground-truth annotations.

13 papers | 0 benchmarks | Images, Interactive

EgoDexter

The EgoDexter dataset provides both 2D and 3D pose annotations for 4 testing video sequences with 3190 frames. The videos are recorded with a body-mounted camera from egocentric viewpoints and contain cluttered backgrounds, fast camera motion, and complex interactions with various objects. Fingertip positions were manually annotated for 1485 of the 3190 frames.

13 papers | 0 benchmarks | Images, RGB-D, Videos

Who-did-What (Who did What)

Who-did-What collects its corpus from news and provides options for questions similar to CBT. Each question is formed from two independent articles: an article is treated as context to be read and a separate article on the same event is used to form the query.

13 papers | 0 benchmarks | Texts
