19,997 machine learning datasets
Instructions: https://krr-oxford.github.io/DeepOnto/ontolama/.
YTD-18M is a large-scale corpus of 18M video-based dialogues constructed from web videos. Crucial to the data collection pipeline is a pretrained language model that converts error-prone automatic transcripts into a cleaner dialogue format while preserving meaning.
The GAS (Grasp Area Segmentation) dataset consists of 10,089 RGB images of cluttered scenes grouped into 1,121 grasp-area segmentation tasks. For each RGB image we provide a binary segmentation map marking the graspable and non-graspable regions of every object in the scene. The dataset can be used for meta-training part-based grasp area estimation networks.
From Grounded Human-Object Interaction Hotspots from Video (ICCV'19): We collect annotations for interaction keypoints on EPIC Kitchens in order to quantitatively evaluate our method in parallel to the OPRA dataset (where annotations are available). We note that these annotations are collected purely for evaluation, and are not used for training our model. We select the 20 most frequent verbs, and select 31 nouns that afford these interactions.
A pool of real stocks from the S&P 500 spanning 21 years, from 01/02/2000 to 12/31/2020. We filter out stocks with missing data anywhere in the period, resulting in 150 stocks over 5,284 trading days.
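The filtering step described above can be sketched with pandas: drop any ticker whose price series has a gap anywhere in the period. The tickers and prices below are synthetic placeholders, not the actual S&P 500 data.

```python
import numpy as np
import pandas as pd

# Hypothetical illustration: prices indexed by trading day, one column per ticker.
dates = pd.bdate_range("2000-01-02", "2020-12-31")
rng = np.random.default_rng(0)
prices = pd.DataFrame(
    rng.lognormal(size=(len(dates), 4)),
    index=dates,
    columns=["AAA", "BBB", "CCC", "DDD"],
)
# Simulate a stock that is missing data for part of the period.
prices.loc[prices.index[:100], "DDD"] = np.nan

# Keep only stocks with a complete price history over the whole period.
complete = prices.dropna(axis=1, how="any")
```

Here `dropna(axis=1, how="any")` removes every column containing at least one missing value, which mirrors filtering out stocks with missing data throughout the 21-year window.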
The archive contains original images from U2OS cells stained with Hoechst 33342 as PNG files. It also contains images (as Photoshop and GIMP files) showing hand-segmentation of the Hoechst images into regions containing single nuclei.
This is a set of files representing part of the workload of Microsoft's Azure Functions offering, collected in July of 2019. This dataset is a subset of the data described and analyzed in the USENIX ATC 2020 paper 'Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider'.
The Student Classroom Behavior dataset (SCB-dataset) reflects real-life classroom scenarios. It includes 4,003 images with 11,248 labels, with a focus on hand-raising behavior.
RoboPianist is a benchmarking suite for high-dimensional control, targeted at testing high spatial and temporal precision, coordination, and planning, all with an underactuated system frequently making-and-breaking contacts. The proposed challenge is mastering the piano through bi-manual dexterity, using a pair of simulated anthropomorphic robot hands. The initial version covers a broad set of 150 variable-difficulty songs.
RoboBEV is a robustness evaluation benchmark tailored for camera-based bird's eye view (BEV) perception under natural data corruptions and domain shift. It covers eight distinct corruptions: Bright, Dark, Fog, Snow, Motion Blur, Color Quant, Camera Crash, and Frame Lost.
Amateur Drawings is a dataset collected via the public demo of Animated Drawings, containing over 178,000 amateur drawings and corresponding user-accepted character bounding boxes, segmentation masks, and joint location annotations.
The Five-Billion-Pixels dataset contains more than 5 billion labeled pixels across 150 high-resolution Gaofen-2 (4 m) satellite images, annotated with a 24-category system covering artificially constructed, agricultural, and natural classes. Its rich categories, large coverage, wide distribution, and high spatial resolution reflect the distribution of real-world ground objects well and can benefit a variety of land-cover studies.
LayoutBench is a diagnostic benchmark that examines 4 spatial control skills (number, position, size, shape), where each skill consists of 2 OOD layout splits, i.e., in total 8 tasks = 4 skills x 2 splits. To disentangle spatial control from other aspects of image generation, such as generating diverse objects, LayoutBench keeps the object configurations of CLEVR, and changes the spatial layouts.
A multi-view image dataset of seven objects under indoor lighting, for the purpose of multi-view 3D reconstruction and inverse rendering. Around half of the images are taken under indoor environment lighting only; the other half are additionally lit by a flashlight co-located with the camera centre. The co-located flashlight images are intended for material/BSDF reconstruction.
MIMIC-IV ICD-10 contains 122,279 discharge summaries—free-text medical documents—annotated with ICD-10 diagnosis and procedure codes. It covers patients admitted to the Beth Israel Deaconess Medical Center emergency department or ICU between 2008 and 2019. All codes with fewer than ten examples have been removed, and the train-val-test split was created using multi-label stratified sampling. The dataset is described further in Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study, and the code to use the dataset is found here.
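The rare-label filter described above can be sketched as follows. The records and ICD codes here are hypothetical placeholders; the actual threshold of ten examples is taken from the description.

```python
from collections import Counter

# Hypothetical records: each discharge summary carries a list of ICD codes.
records = []
for i in range(20):
    codes = ["I10"]          # appears 20 times -> kept
    if i < 12:
        codes.append("E119") # appears 12 times -> kept
    if i < 5:
        codes.append("R07")  # appears 5 times -> removed (< 10 examples)
    records.append({"id": i, "codes": codes})

# Count occurrences of each code across the corpus.
counts = Counter(code for r in records for code in r["codes"])

# Keep only codes with at least ten examples, then filter each record.
kept = {code for code, n in counts.items() if n >= 10}
for r in records:
    r["codes"] = [c for c in r["codes"] if c in kept]
```

The stratified train-val-test split mentioned above would then be built on the filtered label sets, e.g. with an iterative multi-label stratification routine, which is outside this sketch.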
In this work, we propose a novel remote sensing dataset, FireRisk, consisting of 7 fire risk classes with a total of 91,872 labelled images for fire risk assessment. The dataset is labelled with the fire risk classes supplied by the Wildfire Hazard Potential (WHP) raster dataset, and the remote sensing images are collected from the National Agriculture Imagery Program (NAIP), a high-resolution remote sensing imagery program. On FireRisk, we present benchmark performance for supervised and self-supervised representations, with Masked Autoencoders (MAE) pre-trained on ImageNet1k achieving the highest classification accuracy, 65.29%.
A large-scale cross-lingual dataset for entity alignment
We introduce EasyPortrait, a large-scale image dataset for portrait segmentation and face parsing. The proposed dataset can be used in several tasks, such as background removal in conferencing applications, teeth whitening, face-skin enhancement, red-eye removal or eye colorization, and so on.
The MIMIC-IV-ICD10-full dataset, including all occurring labels.
The MIMIC-IV-ICD9 dataset, including all occurring labels.