19,997 machine learning datasets
xCodeEval is one of the largest executable multilingual multitask benchmarks, consisting of 17 programming languages with execution-level parallelism. It features seven tasks spanning code understanding, generation, translation, and retrieval, and it employs execution-based evaluation instead of traditional lexical approaches. It also provides ExecEval, a test-case-based multilingual code execution engine that supports all the programming languages in xCodeEval.
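The test-case-based, execution-level evaluation described above can be illustrated with a minimal sketch: run a candidate program once per test case and compare its stdout against the expected output. The helper name and interface here are illustrative assumptions, not the actual ExecEval API.

```python
# Minimal sketch of test-case-based execution evaluation, in the spirit
# of ExecEval. The function name and signature are hypothetical.
import subprocess
import sys

def passes_test_cases(source_path, test_cases, timeout=5):
    """Run a candidate Python program on each (stdin, expected stdout) pair."""
    for stdin_text, expected_stdout in test_cases:
        result = subprocess.run(
            [sys.executable, source_path],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        # A candidate fails if any test case's output does not match.
        if result.stdout.strip() != expected_stdout.strip():
            return False
    return True
```

A real multilingual engine would additionally dispatch on language (compiling where needed) and sandbox execution; this sketch only shows the output-comparison core.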
JEEBench is a considerably more challenging benchmark dataset for evaluating the problem-solving abilities of LLMs. It curates 515 challenging pre-engineering mathematics, physics, and chemistry problems from the IIT JEE-Advanced exam. Long-horizon reasoning on top of deep in-domain knowledge is essential for solving the problems in this benchmark.
Extended test cases for HumanEval, as well as generated code.
Extended test cases for MBPP, as well as generated code.
Over 4 million frames of motion capture data for 100 different styles of locomotion. Can be used for animation, human motion and sequence modelling research.
Recent advances in language-image pre-training have given rise to transferable systems that can effortlessly adapt to a wide range of computer vision and multimodal tasks in the wild. This also poses a challenge for evaluating the transferability of these models, due to the lack of easy-to-use evaluation toolkits and public benchmarks. The "Segmentation in the Wild" (SegInW) Challenge, part of X-Decoder, proposes a new benchmark to evaluate the transferability of pre-trained vision models. The benchmark presents a diverse set of downstream segmentation datasets, measuring pre-trained models on both segmentation accuracy and transfer efficiency in a new task, in terms of training examples and trainable parameters. The SegInW Challenge consists of 25 free public segmentation datasets, crowd-sourced on roboflow.com. For details about the challenge submission format, please refer to X-Decoder for SegInW.
An open-source Optical Coherence Tomography Image Database containing different retinal OCT images with various pathological conditions. This comprehensive open-access database contains over 500 high-resolution images categorized into different pathological conditions. The image classes include Normal (NO), Macular Hole (MH), Age-related Macular Degeneration (AMD), Central Serous Retinopathy (CSR), and Diabetic Retinopathy (DR).
The GoodsAD dataset contains 6,124 images across 6 categories of common supermarket goods, with multiple goods per category. All images are acquired at a high resolution of 3000 × 3000. The object locations in the images are not aligned; most objects are in the center of the image, and each image contains only a single object. Most anomalies occupy only a small fraction of the image pixels. Both image-level and pixel-level annotations are provided.
PointOdyssey is a large-scale synthetic dataset, and data generation framework, for the training and evaluation of long-term fine-grained tracking algorithms. The dataset currently includes 104 videos, averaging 2,000 frames long, with orders of magnitude more correspondence annotations than prior work.
ACSPublicCoverage: predict whether an individual is covered by public health insurance, after filtering the ACS PUMS data sample to include only individuals under the age of 65 with an income of less than $30,000. This filtering focuses the prediction problem on low-income individuals who are not eligible for Medicare.
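The filtering rule above can be sketched as a simple predicate over person records. The field names `AGEP` (age) and `PINCP` (total person income) follow ACS PUMS conventions, but the exact preprocessing in the benchmark's own code may differ; this is an illustrative assumption, not the official implementation.

```python
# Hedged sketch of the ACSPublicCoverage sample filter: keep only
# individuals under 65 with income below $30,000. Field names assume
# ACS PUMS conventions (AGEP = age, PINCP = total person income).
def keep_record(person):
    return person["AGEP"] < 65 and person["PINCP"] < 30000

# Example usage on a small list of person records:
people = [
    {"AGEP": 40, "PINCP": 20000},   # kept: under 65, low income
    {"AGEP": 70, "PINCP": 10000},   # dropped: 65 or older
    {"AGEP": 30, "PINCP": 50000},   # dropped: income too high
]
filtered = [p for p in people if keep_record(p)]
```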
A large-scale video dataset, featuring clips from movies with detailed captions.
BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) is a pioneering cross-domain dataset that examines the impact of extensive database contents on text-to-SQL parsing. BIRD contains 12,751 unique question-SQL pairs and 95 big databases with a total size of 33.4 GB. It covers more than 37 professional domains, such as blockchain, hockey, healthcare, and education.
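Text-to-SQL benchmarks grounded in real databases are typically scored by execution: a predicted query counts as correct when it returns the same result set as the gold query on the actual database. The sketch below shows that idea with the standard-library `sqlite3` module; it is a simplification, not BIRD's official evaluator.

```python
# Illustrative sketch of execution-based matching for text-to-SQL:
# a prediction is correct if it yields the same rows as the gold SQL.
import sqlite3

def execution_match(db_path, predicted_sql, gold_sql):
    conn = sqlite3.connect(db_path)
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
        gold_rows = conn.execute(gold_sql).fetchall()
    finally:
        conn.close()
    # Compare as sorted multisets so row order does not matter.
    return sorted(pred_rows) == sorted(gold_rows)
```

Real evaluators also handle queries with meaningful ordering, execution errors, and timeouts; the multiset comparison above is the common core.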
This data set contains electricity consumption of 370 points/clients.
UBody is a large-scale Upper-Body dataset with multiple types of annotations.
The 100PoisonMpts dataset is a significant initiative in the realm of large language model governance. Developed collaboratively by Alibaba Tmall Genie and the Tongyi Large Model Team, this open-source Chinese dataset aims to address safety concerns associated with large language models, especially after the release of ChatGPT. The project's purpose is to ensure that information disseminated by these models aligns with safety, reliability, and human values.
CustomHumans is recorded by a multi-view photogrammetry system equipped with 53 RGB (12-megapixel) and 53 IR (4-megapixel) cameras. Each resulting high-quality scan is composed of a 40K-face mesh alongside a 4K-resolution texture map. In addition to the high-quality scans, CustomHumans provides accurately registered SMPL-X parameters obtained with a customized mesh registration pipeline. 80 participants were invited for data capture. Each was instructed to perform several movements, such as "T-pose", "Hands up", "Squat", "Turning head", and "Hand gestures", in a 10-second sequence (300 frames). The 4-5 best-quality meshes in each sequence are selected as data samples. In total, the dataset contains more than 600 high-quality scans with 120 different garments.
Source: CROHME 2023
ISTD+ consists of shadow images, shadow-free images, and shadow masks, with 1,330 training images and 540 testing images from 135 unique background scenes. The original ISTD suffers from color and luminosity inconsistencies between shadow and shadow-free images; ISTD+ corrects these with a color compensation mechanism to ensure uniform pixel colors across the ground-truth images.
Clothes-Changing Video person re-ID (CCVID) is a dataset constructed from the raw data of a gait recognition dataset, FVG. The reconstructed CCVID dataset contains 347,833 bounding boxes. The length of each sequence ranges from 27 to 410 frames, with an average of 122. It also provides fine-grained clothes labels covering tops, bottoms, shoes, carrying status, and accessories. For convenient evaluation, CCVID re-divides the training and test sets to suit clothes-changing re-id: 75 identities are reserved for training, and the remaining 151 identities are used for testing. In the test set, 834 sequences serve as the query set, and the other 1,074 sequences form the gallery set.
The MHP dataset contains multiple persons captured in real-world scenes with pixel-level fine-grained semantic annotations in an instance-aware setting.