19,997 machine learning datasets
19,997 dataset results
Dataset consisting of IMU measurements and corresponding SMPL poses. Participants were wearing 17 IMU sensors and reference SMPL poses were obtained by running the SIP optimization with all 17 sensors.
3D AffordanceNet is a dataset of 23k shapes for visual affordance. It consists of 56,307 well-defined affordance information annotations for 22,949 shapes covering 18 affordance classes and 23 semantic object categories.
KolektorSDD2 is a surface-defect detection dataset with over 3000 images containing several types of defects, obtained while addressing a real-world industrial problem.
10 classes with 50, 000 training and 5, 000 testing images. Please note that, in ANIMAL10N, noisy labels were injected naturally by human mistakes, where its noise rate was estimated at 8%.
BugSwarm is a dataset of reproducible faults and fixes to perform experimental evaluation of approaches to software quality. The BugSwarm toolkit has already gathered 3,091 fail-pass pairs, in Java and Python, all packaged within fully reproducible containers.
RoadAnomaly21 is a dataset for anomaly segmentation, the task of identify the image regions containing objects that have never been seen during training. It consists of an evaluation dataset of 100 images with pixel-level annotations. Each image contains at least one anomalous object, e.g. animals or unknown vehicles. The anomalies can appear anywhere in the image and widely differ in size, covering from 0.5% to 40% of the image
MIAP is a dataset created by obtaining a new set of annotations on a subset of the Open Images dataset, containing bounding boxes and attributes for all of the people visible in those images, as the original Open Images dataset annotations are not exhaustive, with bounding boxes and attribute labels for only a subset of the classes in each image.
This dataset includes 4,500 fully annotated images (over 30,000 license plate characters) from 150 vehicles in real-world scenarios where both the vehicle and the camera (inside another vehicle) are moving.
Project CodeNet is a large-scale dataset with approximately 14 million code samples, each of which is an intended solution to one of 4000 coding problems. The code samples are written in over 50 programming languages (although the dominant languages are C++, C, Python, and Java) and they are annotated with a rich set of information, such as its code size, memory footprint, cpu run time, and status, which indicates acceptance or error types. The dataset is accompanied by a repository, where we provide a set of tools to aggregate codes samples based on user criteria and to transform code samples into token sequences, simplified parse trees and other code graphs. A detailed discussion of Project CodeNet is available in this paper.
Everybody Dance Now is a dataset of videos that can be used for training and motion transfer. It contains long single-dancer videos that can be used to train and evaluate the model. All subjects have consented to allowing the data to be used for research purposes.
FaVIQ (Fact Verification from Information-seeking Questions) is a challenging and realistic fact verification dataset that reflects confusions raised by real users. We use the ambiguity in information-seeking questions and their disambiguation, and automatically convert them to true and false claims. These claims are natural, and require a complete understanding of the evidence for verification. FaVIQ serves as a challenging benchmark for natural language understanding, and improves performance in professional fact checking.
Data was collected for normal bearings, single-point drive end and fan end defects. Data was collected at 12,000 samples/second and at 48,000 samples/second for drive end bearing experiments. All fan end bearing data was collected at 12,000 samples/second.
A dataset of over 24,000 images exhibiting the broadest range of exposure values to date with a corresponding properly exposed image.
The goal of PubTables-1M is to create a large, detailed, high-quality dataset for training and evaluating a wide variety of models for the tasks of table detection, table structure recognition, and functional analysis. It contains:
Continual World is a benchmark consisting of realistic and meaningfully diverse robotic tasks built on top of Meta-World as a testbed.
No-reference (NR) perceptual video quality assessment (VQA) is a complex, unsolved, and important problem to social and streaming media applications. Efficient and accurate video quality predictors are needed to monitor and guide the processing of billions of shared, often imperfect, user-generated content (UGC). Unfortunately, current NR models are limited in their prediction capabilities on real-world, "in-the-wild" UGC video data. To advance progress on this problem, we created the largest (by far) subjective video quality dataset, containing 39, 000 real-world distorted videos and 117, 000 space-time localized video patches ("v-patches"), and 5.5M human perceptual quality annotations. Using this, we created two unique NR-VQA models: (a) a local-to-global region-based NR VQA architecture (called PVQ) that learns to predict global video quality and achieves state-of-the-art performance on 3 UGC datasets, and (b) a first-of-a-kind space-time video quality mapping engine (called PVQ Ma
The Montreal Archive of Sleep Studies (MASS) is an open-access and collaborative database of laboratory-based polysomnography (PSG) recordings O’Reilly, C., et al. (2014) J Seep Res, 23(6):628-635. Its goal is to provide a standard and easily accessible source of data for benchmarking the various systems developed to help the automation of sleep analysis. It also provides a readily available source of data for fast validation of experimental results and for exploratory analyses. Finally, it is a shared resource that can be used to foster large-scale collaborations in sleep studies.
The MedVidQA dataset contains the collection of 3, 010 manually created health-related questions and timestamps as visual answers to those questions from trusted video sources, such as accredited medical schools with an established reputation, health institutes, health education, and medical practitioners.
Preprocessed version of NYT11.
20000 utterances