The PIRM dataset consists of 200 images, which are divided into two equal sets for validation and testing. These images cover diverse content, including people, objects, environments, flora, and natural scenery. Images vary in size and are typically ~300K pixels in resolution.
PUBHEALTH is a comprehensive dataset for explainable automated fact-checking of public health claims. Each instance in the PUBHEALTH dataset has an associated veracity label (true, false, unproven, mixture). Furthermore, each instance in the dataset has an explanation text field. The explanation is a justification for why the claim has been assigned a particular veracity label.
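As a rough illustration of the instance structure described above, the sketch below models one record in Python; the field names are assumptions for illustration, not the dataset's exact schema.

```python
# A minimal sketch of one PUBHEALTH-style instance as a Python record.
# Field names here are illustrative assumptions, not the release schema.
from dataclasses import dataclass

VERACITY_LABELS = {"true", "false", "unproven", "mixture"}

@dataclass
class PubHealthInstance:
    claim: str        # the public health claim being checked
    label: str        # one of the four veracity labels
    explanation: str  # justification for the assigned label

    def __post_init__(self) -> None:
        if self.label not in VERACITY_LABELS:
            raise ValueError(f"unknown veracity label: {self.label!r}")

example = PubHealthInstance(
    claim="Vitamin C cures the common cold.",
    label="false",
    explanation="Clinical trials show no curative effect; at best a minor "
                "reduction in symptom duration.",
)
```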
UT Zappos50K is a large shoe dataset consisting of 50,025 catalog images collected from Zappos.com. The images are divided into 4 major categories — shoes, sandals, slippers, and boots — followed by functional types and individual brands. The shoes are centered on a white background and pictured in the same orientation for convenient analysis.
Worldtree is a corpus of explanation graphs, explanatory role ratings, and an associated tablestore. It contains explanation graphs for 1,680 questions and provides 4,950 tablestore rows across 62 semi-structured tables. This data is intended to be paired with the AI2 Mercury Licensed questions.
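To make the graph/tablestore relationship concrete, here is a minimal sketch of how an explanation graph references tablestore rows; the row IDs, facts, and question are invented placeholders, not actual corpus entries.

```python
# A minimal sketch of an explanation graph assembled from tablestore rows.
# All IDs and text below are placeholders, not real Worldtree content.
tablestore = {
    "KINDOF-0042": "a tree is a kind of plant",
    "PROP-0117": "plants require sunlight to grow",
}

explanation_graph = {
    "question": "Why does a tree need sunlight?",
    "rows": ["KINDOF-0042", "PROP-0117"],  # tablestore row IDs used as facts
}

for row_id in explanation_graph["rows"]:
    print(tablestore[row_id])
```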
The dataset consists of (i) five relevant responses for each context and (ii) five adversarially crafted irrelevant responses for each context.
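A minimal sketch of how one such context entry could be laid out, assuming hypothetical key names (the release format is not specified here):

```python
# One context with its 5 relevant and 5 adversarial responses.
# Key names and strings are placeholders, not the dataset's file format.
entry = {
    "context": "A: Any plans for the weekend? B: Thinking about a hike.",
    "relevant": [f"relevant response {i}" for i in range(1, 6)],
    "adversarial_irrelevant": [f"distractor {i}" for i in range(1, 6)],
}
assert len(entry["relevant"]) == len(entry["adversarial_irrelevant"]) == 5
```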
TinyFace is a large-scale face recognition benchmark to facilitate the investigation of native LRFR (Low-Resolution Face Recognition) at large scales (large gallery population sizes) in deep learning. The TinyFace dataset consists of 5,139 labelled facial identities given by 169,403 native LR face images (average 20×16 pixels) designed for a 1:N recognition test. All the LR faces in TinyFace are collected from public web data across a large variety of imaging scenarios, captured under uncontrolled viewing conditions in pose, illumination, occlusion, and background.
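The 1:N protocol mentioned above can be summarized in a few lines: match each probe embedding against a labelled gallery and report rank-1 accuracy. The sketch below uses synthetic embeddings and cosine similarity as illustrative stand-ins.

```python
# A minimal sketch of closed-set 1:N identification with synthetic data.
import numpy as np

def rank1_accuracy(probe_emb, probe_ids, gallery_emb, gallery_ids):
    # Cosine similarity between every probe and every gallery entry.
    p = probe_emb / np.linalg.norm(probe_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = p @ g.T                            # (num_probes, num_gallery)
    nearest = gallery_ids[np.argmax(sims, axis=1)]
    return float(np.mean(nearest == probe_ids))

rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 128))                     # 100 identities
probes = gallery[:10] + 0.1 * rng.normal(size=(10, 128))  # noisy copies
ids = np.arange(100)
print(rank1_accuracy(probes, ids[:10], gallery, ids))
```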
CholecT50 is a dataset of endoscopic videos of laparoscopic cholecystectomy surgery introduced to enable research on fine-grained action recognition in laparoscopic surgery. It is annotated with triplet information in the form of <instrument, verb, target>. The dataset is a collection of 50 videos consisting of 45 videos from the Cholec80 dataset and 5 videos from an in-house dataset of the same surgical procedure.
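A minimal sketch of the ⟨instrument, verb, target⟩ triplet as a typed record; the vocabulary entries and frame index shown are illustrative examples, not actual annotations.

```python
# One frame may carry several action triplets; values are illustrative.
from typing import NamedTuple

class ActionTriplet(NamedTuple):
    instrument: str
    verb: str
    target: str

frame_annotations = {
    # frame index -> triplets visible in that frame
    1042: [ActionTriplet("grasper", "retract", "gallbladder"),
           ActionTriplet("hook", "dissect", "cystic_duct")],
}

for frame, triplets in frame_annotations.items():
    for t in triplets:
        print(f"frame {frame}: {t.instrument} {t.verb} {t.target}")
```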
Ego4D is a massive-scale egocentric video dataset and benchmark suite. It offers 3,025 hours of daily life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 855 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards, with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, a host of new benchmark challenges are presented, centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities).
ModelNet40-C is a comprehensive dataset to benchmark the corruption robustness of 3D point cloud recognition.
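As an illustration of the kind of corruption such a robustness benchmark covers, the sketch below applies Gaussian jitter to a synthetic point cloud; the corruption type and parameters are generic examples, not the benchmark's exact definitions.

```python
# A minimal sketch of one corruption type (Gaussian jitter) on a point cloud.
import numpy as np

def jitter(points: np.ndarray, sigma: float = 0.01) -> np.ndarray:
    """Add zero-mean Gaussian noise to every (x, y, z) coordinate."""
    return points + np.random.normal(0.0, sigma, size=points.shape)

cloud = np.random.rand(1024, 3)   # stand-in for a ModelNet40 shape
corrupted = jitter(cloud, sigma=0.02)
```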
V2X-Sim, short for vehicle-to-everything simulation, is a synthetic collaborative perception dataset in autonomous driving developed by the AI4CE Lab at NYU and the MediaBrain Group at SJTU to facilitate collaborative perception between multiple vehicles and roadside infrastructure. Data is collected from both the roadside and vehicles when they are present near the same intersection. With information from both the roadside infrastructure and vehicles, the dataset aims to encourage research on collaborative perception tasks.
This data is for the task of named entity recognition and linking/disambiguation over tweets. It comprises the addition of an entity URI layer on top of an NER-annotated tweet dataset. The task is to detect entities and then provide a correct link to them in DBpedia, thus disambiguating otherwise ambiguous entity surface forms; for example, this means linking "Paris" to the correct instance of a city with that name (e.g. Paris, France vs. Paris, Texas).
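A minimal sketch of one annotated span under this task definition; the field names are assumptions rather than the dataset's file format.

```python
# One detected mention with its disambiguating DBpedia link.
from dataclasses import dataclass

@dataclass
class LinkedEntity:
    surface_form: str   # text span detected in the tweet
    start: int          # character offset of the span
    end: int
    dbpedia_uri: str    # disambiguated DBpedia resource

tweet = "Just landed in Paris, the food is amazing"
annotation = LinkedEntity(
    surface_form="Paris",
    start=15,
    end=20,
    dbpedia_uri="http://dbpedia.org/resource/Paris",  # vs. .../Paris,_Texas
)
assert tweet[annotation.start:annotation.end] == annotation.surface_form
```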
The goal of the NICO Challenge is to facilitate OOD (Out-of-Distribution) generalization in visual recognition by promoting research on intrinsic learning mechanisms with native invariance and generalization ability. The training data is a mixture of several observed contexts, while the test data is composed of unseen contexts. Participants are tasked with developing algorithms that are reliable across different contexts (domains) to improve the generalization ability of models.
Multi-object tracking (MOT) is a fundamental task in computer vision, aiming to estimate the bounding boxes and identities of objects (e.g., pedestrians and vehicles) in video sequences.
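To make the task output concrete, here is a minimal sketch of per-frame MOT records (track identity plus bounding box) with synthetic numbers:

```python
# frame index -> list of (track id, (x, y, w, h)) detections; values are synthetic.
tracks = {
    0: [(1, (100, 120, 40, 80)), (2, (300, 110, 42, 85))],
    1: [(1, (104, 121, 40, 80)), (2, (295, 112, 42, 85))],
}
for frame, detections in tracks.items():
    for track_id, (x, y, w, h) in detections:
        print(f"frame {frame}: id={track_id} box=({x},{y},{w},{h})")
```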
The MS-CXR dataset provides 1,162 image-sentence pairs of bounding boxes and corresponding phrases, collected across eight different cardiopulmonary radiological findings, with an approximately equal number of pairs for each finding. This dataset complements the existing MIMIC-CXR v.2 dataset and comprises: 1. reviewed and edited bounding boxes and phrases (1,026 bounding box/sentence pairs); and 2. manual bounding box labels created from scratch (136 bounding box/sentence pairs).
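A minimal sketch of one grounded image-sentence pair as a Python record; the field names and coordinates are illustrative assumptions, not the release schema.

```python
# One phrase-grounding pair linking a finding description to an image region.
from dataclasses import dataclass

@dataclass
class GroundedPhrase:
    dicom_id: str          # link back to the MIMIC-CXR v.2 image
    finding: str           # one of the eight cardiopulmonary findings
    phrase: str            # radiology sentence describing the finding
    bbox_xywh: tuple       # (x, y, width, height) in pixels

pair = GroundedPhrase(
    dicom_id="example-dicom-id",
    finding="pneumonia",
    phrase="patchy consolidation in the right lower lobe",
    bbox_xywh=(412, 530, 180, 160),
)
```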
We introduce our new dataset, Spaces, to provide a more challenging shared dataset for future view synthesis research. Spaces consists of 100 indoor and outdoor scenes, captured using a 16-camera rig. For each scene, we captured image sets at 5-10 slightly different rig positions (within ~10 cm of each other). This jittering of the rig position provides a flexible dataset for view synthesis, as we can mix views from different rig positions for the same scene during training. We calibrated the intrinsics and the relative pose of the rig cameras using a standard structure-from-motion approach, using the nominal rig layout as a prior. We corrected exposure differences. For our main experiments we undistort the images and downsample them to a resolution of 800 × 480. We use 90 scenes from the dataset for training and hold out 10 for evaluation.
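The undistort-and-downsample step described above can be sketched with OpenCV as follows; the intrinsics and distortion coefficients are placeholders standing in for the calibrated rig values.

```python
# A minimal sketch: undistort with calibrated intrinsics, then resize to 800x480.
import cv2
import numpy as np

K = np.array([[900.0, 0.0, 400.0],
              [0.0, 900.0, 240.0],
              [0.0, 0.0, 1.0]])    # placeholder camera intrinsics
dist = np.zeros(5)                 # placeholder distortion coefficients

image = np.zeros((960, 1600, 3), dtype=np.uint8)  # stand-in for a rig image
undistorted = cv2.undistort(image, K, dist)
downsampled = cv2.resize(undistorted, (800, 480), interpolation=cv2.INTER_AREA)
print(downsampled.shape)  # (480, 800, 3)
```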
Parts and Attributes of Common Objects (PACO) is a detection dataset that goes beyond traditional object boxes and masks and provides richer annotations such as part masks and attributes. It spans 75 object categories, 456 object-part categories and 55 attributes across image (LVIS) and video (Ego4D) datasets. The dataset contains 641K part masks annotated across 260K object boxes, with half of them exhaustively annotated with attributes as well.
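A minimal sketch of how a PACO-style annotation could nest part masks and attributes under an object box; the keys loosely follow the COCO convention and are assumptions, not the exact PACO JSON schema.

```python
# One object annotation with part-level masks and attributes (illustrative).
annotation = {
    "image_id": 1234,
    "category": "mug",
    "bbox": [48, 60, 220, 260],           # object box, (x, y, w, h)
    "attributes": ["white", "ceramic"],   # object-level attributes
    "parts": [
        {"category": "mug:handle", "segmentation": [[...]],  # part polygon elided
         "attributes": ["white"]},
        {"category": "mug:body", "segmentation": [[...]]},
    ],
}

for part in annotation["parts"]:
    print(part["category"], part.get("attributes", []))
```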
The MVTec Logical Constraints Anomaly Detection (MVTec LOCO AD) dataset is intended for the evaluation of unsupervised anomaly localization algorithms. The dataset includes both structural and logical anomalies. It contains 3,644 images from five different categories inspired by real-world industrial inspection scenarios. Structural anomalies appear as scratches, dents, or contaminations in the manufactured products. Logical anomalies violate underlying constraints, e.g., a permissible object being present in an invalid location or a required object not being present at all. The dataset also includes pixel-precise ground truth data for each anomalous region.
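As a rough illustration of scoring against pixel-precise ground truth, the sketch below computes pixel-level AUROC on synthetic data; this is a generic stand-in metric, not the benchmark's official evaluation protocol.

```python
# A minimal sketch: compare a predicted anomaly map to a ground-truth mask.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
gt_mask = np.zeros((256, 256), dtype=np.uint8)
gt_mask[100:140, 80:120] = 1                     # stand-in anomalous region
anomaly_map = rng.random((256, 256)) + gt_mask   # higher scores on the anomaly

auroc = roc_auc_score(gt_mask.ravel(), anomaly_map.ravel())
print(f"pixel-level AUROC: {auroc:.3f}")
```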
Dress Code is a new dataset for image-based virtual try-on composed of image pairs coming from different catalogs of YOOX NET-A-PORTER. The dataset contains more than 50k high-resolution image pairs of models and clothing items, divided into three different categories (i.e. dresses, upper-body clothes, lower-body clothes).
A large-scale VIdeo Panoptic Segmentation dataset
Current benchmarks for facial expression recognition (FER) mainly focus on static images, while there are limited datasets for FER in videos. It therefore remains unclear whether the performance of existing methods is satisfactory in real-world, application-oriented scenes. For example, the “Happy” expression with high intensity in Talk-Show is more discriminating than the same expression with low intensity in Official-Event. To fill this gap, we build a large-scale multi-scene dataset, coined FERV39k. We analyze the important ingredients of constructing such a novel dataset in three aspects: (1) multi-scene hierarchy and expression classes, (2) generation of candidate video clips, and (3) a trusted manual labelling process. Based on these guidelines, we select 4 scenarios subdivided into 22 scenes, annotate 86k samples automatically obtained from 4k videos based on the well-designed workflow, and finally build 38,935 video clips labeled with 7 classic expressions. Experiment benchmarks on four kinds of baseline frameworks are also provided.
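A minimal sketch of a label map for the seven classic expressions; the class names and index order are assumptions, not FERV39k's official ordering.

```python
# Map expression names to class indices for a 7-way classifier (illustrative order).
EXPRESSIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(EXPRESSIONS)}

clip_label = "happy"
print(LABEL_TO_INDEX[clip_label])  # 3
```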