19,997 machine learning datasets
19,997 dataset results
The LC25000 dataset contains 25,000 color images with 5 classes of 5,000 images each. All images are 768 x 768 pixels in size and are in jpeg file format. The 5 classes are: colon adenocarcinomas, benign colonic tissues, lung adenocarcinomas, lung squamous cell carcinomas and bening lung tissues.
OpenEDS (Open Eye Dataset) is a large scale data set of eye-images captured using a virtual-reality (VR) head mounted display mounted with two synchronized eyefacing cameras at a frame rate of 200 Hz under controlled illumination. This dataset is compiled from video capture of the eye-region collected from 152 individual participants and is divided into four subsets: (i) 12,759 images with pixel-level annotations for key eye-regions: iris, pupil and sclera (ii) 252,690 unlabelled eye-images, (iii) 91,200 frames from randomly selected video sequence of 1.5 seconds in duration and (iv) 143 pairs of left and right point cloud data compiled from corneal topography of eye regions collected from a subset, 143 out of 152, participants in the study.
PhraseCut is a dataset consisting of 77,262 images and 345,486 phrase-region pairs. The dataset is collected on top of the Visual Genome dataset and uses the existing annotations to generate a challenging set of referring phrases for which the corresponding regions are manually annotated.
A large scale dataset with daily-living activities performed in a natural manner.
The Image Paragraph Captioning dataset allows researchers to benchmark their progress in generating paragraphs that tell a story about an image. The dataset contains 19,561 images from the Visual Genome dataset. Each image contains one paragraph. The training/val/test sets contains 14,575/2,487/2,489 images.
ArtEmis is a large-scale dataset aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language. In contrast to most existing annotation datasets in computer vision, this dataset focuses on the affective experience triggered by visual artworks an the annotators were asked to indicate the dominant emotion they feel for a given image and, crucially, to also provide a grounded verbal explanation for their emotion choice. This leads to a rich set of signals for both the objective content and the affective impact of an image, creating associations with abstract concepts (e.g., “freedom” or “love”), or references that go beyond what is directly visible, including visual similes and metaphors, or subjective references to personal experiences.
Under Institutional Review Board (IRB) supervision, 50 abdomen CT scans of were randomly selected from a combination of an ongoing colorectal cancer chemotherapy trial, and a retrospective ventral hernia study. The 50 scans were captured during portal venous contrast phase with variable volume sizes (512 x 512 x 85 - 512 x 512 x 198) and field of views (approx. 280 x 280 x 280 mm3 - 500 x 500 x 650 mm3). The in-plane resolution varies from 0.54 x 0.54 mm2 to 0.98 x 0.98 mm2, while the slice thickness ranges from 2.5 mm to 5.0 mm. The standard registration data was generated by NiftyReg.
The Actor-Action Dataset (A2D) by Xu et al. [29] serves as the largest video dataset for the general actor and action segmentation task. It contains 3,782 videos from YouTube with pixel-level labeled actors and their actions. The dataset includes eight different actions, while a total of seven actor classes are considered to perform those actions. We follow [29], who split the dataset into 3,036 training videos and 746 testing videos.
The Evolved Grasping Analysis Dataset (EGAD) comprises over 2000 generated objects aimed at training and evaluating robotic visual grasp detection algorithms. The objects in EGAD are geometrically diverse, filling a space ranging from simple to complex shapes and from easy to difficult to grasp, compared to other datasets for robotic grasping, which may be limited in size or contain only a small number of object classes.
The SemEval-2013 Task 2 dataset contains data for two subtasks: A, an expression-level subtask, and B, a message-level subtask. Crowdsourcing was used to label a large Twitter training dataset along with additional test sets of Twitter and SMS messages for both subtasks.
The DQN Replay Dataset was collected as follows: We first train a DQN agent, on all 60 Atari 2600 games with sticky actions enabled for 200 million frames (standard protocol) and save all of the experience tuples of (observation, action, reward, next observation) (approximately 50 million) encountered during training.
The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Twenty-three recordings were chosen at random from a set of 4000 24-hour ambulatory ECG recordings collected from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Boston's Beth Israel Hospital; the remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias that would not be well-represented in a small random sample.
It contains manually verified 183K question-answer pairs about more than 18K persons and 24K images. The questions in this dataset require multi-entity, multi-relation and multi-hop reasoning over KG to arrive at an answer. To enable visual named entity linking, it also provides a support set containing reference images of 69K persons harvested from Wikidata as part of the dataset.
The Image-Grounded Language Understanding Evaluation (IGLUE) benchmark brings together—by both aggregating pre-existing datasets and creating new ones—visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages. The benchmark enables the evaluation of multilingual multimodal models for transfer learning, not only in a zero-shot setting, but also in newly defined few-shot learning setups.
ContractNLI is a dataset for document-level natural language inference (NLI) on contracts whose goal is to automate/support a time-consuming procedure of contract review. In this task, a system is given a set of hypotheses (such as “Some obligations of Agreement may survive termination.”) and a contract, and it is asked to classify whether each hypothesis is entailed by, contradicting to or not mentioned by (neutral to) the contract as well as identifying evidence for the decision as spans in the contract.
We introduce ACDC, the Adverse Conditions Dataset with Correspondences for training and testing semantic segmentation methods on adverse visual conditions. It comprises a large set of 4006 images which are evenly distributed between fog, nighttime, rain, and snow. Each adverse-condition image comes with a high-quality fine pixel-level semantic annotation, a corresponding image of the same scene taken under normal conditions and a binary mask that distinguishes between intra-image regions of clear and uncertain semantic content.
The MSLR-WEB30K dataset consists of 30,000 search queries over the documents from search results. The data also contains the values of 136 features and a corresponding user-labeled relevance factor on a scale of one to five with respect to each query-document pair.
MAFW is a large-scale, multi-modal, compound affective database for dynamic facial expression recognition in the wild. It contains 10,045 video-audio clips, annotated with a compound emotional category and a couple of sentences that describe the subjects' affective behaviors in the clip. For the compound emotion annotation, each clip is categorized into one or more of the 11 widely-used emotions, i.e., anger, disgust, fear, happiness, neutral, sadness, surprise, contempt, anxiety, helplessness, and disappointment.
OpinionQA is a dataset for evaluating the alignment of LM opinions with those of 60 US demographic groups over topics ranging from abortion to automation.
PMC-OA is a large-scale dataset that contains 1.65M image-text pairs. The figures and captions from PubMed Central, 2,478,267 available papers are covered and 12,211,907 figure-caption pairs are extracted.