Datasets

19,997 machine learning datasets


DIP-IMU

DIP-IMU is a dataset of IMU measurements and corresponding SMPL poses. Participants wore 17 IMU sensors, and the reference SMPL poses were obtained by running the SIP optimization with all 17 sensors.

15 papers | 0 benchmarks

3D AffordanceNet

3D AffordanceNet is a dataset of 23k shapes for visual affordance. It consists of 56,307 well-defined affordance information annotations for 22,949 shapes covering 18 affordance classes and 23 semantic object categories.

15 papers | 2 benchmarks | 3D, 3d meshes

KolektorSDD2 (Kolektor Surface-Defect Dataset 2)

KolektorSDD2 is a surface-defect detection dataset with over 3000 images containing several types of defects, obtained while addressing a real-world industrial problem.

15 papers | 9 benchmarks | Images

ANIMAL (ANIMAL-10N)

ANIMAL-10N contains 10 classes with 50,000 training and 5,000 test images. Its noisy labels arose naturally from human labeling mistakes, and the noise rate is estimated at 8%.

15 papers | 6 benchmarks

BugSwarm

BugSwarm is a dataset of reproducible faults and fixes to perform experimental evaluation of approaches to software quality. The BugSwarm toolkit has already gathered 3,091 fail-pass pairs, in Java and Python, all packaged within fully reproducible containers.

15 papers | 0 benchmarks

RoadAnomaly21

RoadAnomaly21 is a dataset for anomaly segmentation, the task of identifying image regions that contain objects never seen during training. It consists of an evaluation set of 100 images with pixel-level annotations. Each image contains at least one anomalous object, e.g. an animal or an unknown vehicle. The anomalies can appear anywhere in the image and differ widely in size, covering from 0.5% to 40% of the image.

15 papers | 0 benchmarks | Images
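
The 0.5% to 40% coverage range quoted for RoadAnomaly21 can be read as the fraction of an image's pixels that are labeled anomalous. The following is a minimal sketch of that calculation, not the dataset's own tooling: the function name `anomaly_fraction`, the nested-list mask layout, and the 1/0 label encoding are all assumptions made for illustration.

```python
def anomaly_fraction(mask):
    """Return the fraction of pixels marked anomalous in a 2D binary mask.

    `mask` is a hypothetical list of rows, where a truthy value marks an
    anomalous pixel. The real annotation format may differ.
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask is empty")
    anomalous = sum(1 for row in mask for value in row if value)
    return anomalous / total


# Hypothetical 4x5 mask: 1 = anomalous pixel, 0 = any other pixel.
example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

fraction = anomaly_fraction(example_mask)
print(f"{fraction:.1%} of the image is anomalous")
```

In this toy mask 4 of 20 pixels are anomalous, which falls inside the range the entry describes.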

MIAP (More Inclusive Annotations for People)

MIAP is a dataset created by collecting a new set of annotations on a subset of the Open Images dataset. It contains bounding boxes and attributes for all of the people visible in those images. The original Open Images annotations are not exhaustive: they provide bounding boxes and attribute labels for only a subset of the classes in each image.

15 papers | 0 benchmarks | Images

UFPR-ALPR

This dataset includes 4,500 fully annotated images (over 30,000 license plate characters) from 150 vehicles in real-world scenarios where both the vehicle and the camera (inside another vehicle) are moving.

15 papers | 1 benchmark | Images

Project CodeNet

Project CodeNet is a large-scale dataset with approximately 14 million code samples, each of which is an intended solution to one of 4000 coding problems. The code samples are written in over 50 programming languages (although the dominant languages are C++, C, Python, and Java), and each is annotated with a rich set of information, such as code size, memory footprint, CPU run time, and status, which indicates acceptance or an error type. The dataset is accompanied by a repository that provides tools to aggregate code samples based on user criteria and to transform code samples into token sequences, simplified parse trees, and other code graphs. A detailed discussion of Project CodeNet is available in the accompanying paper.

15 papers | 0 benchmarks | Texts

Everybody Dance Now

Everybody Dance Now is a dataset of videos that can be used for training and motion transfer. It contains long single-dancer videos that can be used to train and evaluate the model. All subjects have consented to allowing the data to be used for research purposes.

15 papers | 0 benchmarks | Videos

FaVIQ (Fact Verification from Information-seeking Questions)

FaVIQ (Fact Verification from Information-seeking Questions) is a challenging and realistic fact verification dataset that reflects confusions raised by real users. We use the ambiguity in information-seeking questions and their disambiguation, and automatically convert them to true and false claims. These claims are natural, and require a complete understanding of the evidence for verification. FaVIQ serves as a challenging benchmark for natural language understanding, and improves performance in professional fact checking.

15 papers | 0 benchmarks | Texts

CWRU Bearing Dataset

Data was collected for normal bearings, single-point drive end and fan end defects. Data was collected at 12,000 samples/second and at 48,000 samples/second for drive end bearing experiments. All fan end bearing data was collected at 12,000 samples/second.

15 papers | 1 benchmark

Exposure-Errors

A dataset of over 24,000 images exhibiting the broadest range of exposure values to date with a corresponding properly exposed image.

15 papers | 2 benchmarks

PubTables-1M (PubMed Tables One Million)

The goal of PubTables-1M is to create a large, detailed, high-quality dataset for training and evaluating a wide variety of models for the tasks of table detection, table structure recognition, and functional analysis. It contains:

15 papers | 0 benchmarks | Images, Texts

Continual World

Continual World is a benchmark consisting of realistic and meaningfully diverse robotic tasks built on top of Meta-World as a testbed.

15 papers | 0 benchmarks

LIVE-FB LSVQ (LIVE-FB Large-Scale Social Video Quality (LSVQ) Database)

No-reference (NR) perceptual video quality assessment (VQA) is a complex, unsolved, and important problem for social and streaming media applications. Efficient and accurate video quality predictors are needed to monitor and guide the processing of billions of shared, often imperfect, user-generated content (UGC) videos. Unfortunately, current NR models are limited in their prediction capabilities on real-world, "in-the-wild" UGC video data. To advance progress on this problem, we created the largest (by far) subjective video quality dataset, containing 39,000 real-world distorted videos, 117,000 space-time localized video patches ("v-patches"), and 5.5M human perceptual quality annotations. Using this, we created two unique NR-VQA models: (a) a local-to-global region-based NR VQA architecture (called PVQ) that learns to predict global video quality and achieves state-of-the-art performance on 3 UGC datasets, and (b) a first-of-a-kind space-time video quality mapping engine (called PVQ Ma

15 papers | 3 benchmarks | Videos

Montreal Archive of Sleep Studies

The Montreal Archive of Sleep Studies (MASS) is an open-access and collaborative database of laboratory-based polysomnography (PSG) recordings (O’Reilly, C., et al. (2014) J Sleep Res, 23(6):628-635). Its goal is to provide a standard and easily accessible source of data for benchmarking the various systems developed to help automate sleep analysis. It also provides a readily available source of data for fast validation of experimental results and for exploratory analyses. Finally, it is a shared resource that can be used to foster large-scale collaborations in sleep studies.

15 papers | 9 benchmarks | EEG, PSG

MedVidQA (Medical Video Question Answering)

The MedVidQA dataset contains a collection of 3,010 manually created health-related questions, with timestamps that mark the visual answers to those questions in videos from trusted sources, such as accredited medical schools with an established reputation, health institutes, health education providers, and medical practitioners.

15 papers | 0 benchmarks | Medical, Texts, Videos

NYT11-HRL

Preprocessed version of NYT11.

15 papers | 1 benchmark

20,000 utterances

15 papers | 1 benchmark
