19,997 machine learning datasets
19,997 dataset results
The CLOUD dataset is a set of Optical Coherence Tomography of the Anterior Segment images (AS-OCT) used to the automatic identification and representation of the cornea-contact lens relationship. The dataset includes 112 AS-OCT images that were captured from 16 different patients. In particular, the images were obtained by an OCT Cirrus 500 scanner model of Carl Zeiss Meditec with an anterior segment module for users of scleral contact lens (SCL).
UDA-CH contains 16 objects that cover a variety of artworks which can be found in a museum like sculptures, paintings and books. Specifically, the dataset has been collected inside the cultural site “Galleria Regionale di Palazzo Bellomo” located in Siracusa, Italy.
EGO-CH is a dataset of egocentric videos for visitors’ behavior understanding. The dataset has been collected in two different cultural sites and includes more than 27 hours of video acquired by 70 subjects, including volunteers and 60 real visitors. The overall dataset includes labels for 26 environments and over 200 Points of Interest (POIs). Specifically, each video of EGO-CH has been annotated with 1) temporal labels specifying the current location of the visitor and the observed POI, 2) bounding box annotations around POIs. A large subset of the dataset, consisting of 60 videos, is also associated with surveys filled out by the visitors at the end of each visit.
The combinatorial 3D shape dataset is composed of 406 instances of 14 classes. Specifically, each object in the dataset is considered equivalent to a sequence of primitive placement.
Advanced pixel shift technology is employed to perform a full color sampling of the image. Pixel shift technology takes four samples of the same image at nearly the same time, and physically controls the camera sensor to move one pixel horizontally or vertically at each sampling to capture all color information at each pixel. The pixel shift technology ensures that the sampled images follow the distribution of natural images sampled by the camera, and the full information of the color (R, Gr, Gb, B channel) is completely obtained without any need of interpolation. In this way, the collected RGB images are artifacts-free, which leads to better training results for demosaicing related tasks.
The satire dataset is a new multi-modal dataset of satirical and regular news articles. The satirical news is collected from four websites that explicitly declare themselves to be satire, and the regular news is collected from six mainstream news websites. Specifically, the satirical news websites the articles were collected from are The Babylon Bee, Clickhole, Waterford Whisper News, and The DailyER. The regular news websites are Reuters, The Hill, Politico, New York Post, Huffington Post, and Vice News. The headlines and the thumbnail images of the latest 1000 articles for each of the publications are collected. The dataset contains a total of 4000 satirical and 6000 regular news articles.
Facial electromyography recordings during both silent and vocalized speech.
The SMOT dataset, Single sequence-Multi Objects Training, is collected to represent a practical scenario of collecting training images of new objects in the real world, i.e. a mobile robot with an RGB-D camera collects a sequence of frames while driving around a table to learning multiple objects and tries to recognize objects in different locations.
NText is an eight million words dataset extracted and preprocessed from nuclear research papers and thesis.
The dfd_indoor dataset contains 110 images for training and 29 images for testing. The dfd_outdoor dataset contains 34 images for tests; no ground truth was given for this dataset, as the depth sensor only works on indoor scenes.
MlGesture is a dataset for hand gesture recognition tasks, recorded in a car with 5 different sensor types at two different viewpoints. The dataset contains over 1300 hand gesture videos from 24 participants and features 9 different hand gesture symbols. One sensor cluster with five different cameras is mounted in front of the driver in the center of the dashboard. A second sensor cluster is mounted on the ceiling looking straight down.
Includes two datasets for this task, one for English-French (En-Fr) and another for English-German (En-De). For each dataset, the action sequences for full documents are provided, along with an editor identifier. The dataset contains document-level post-editing action sequences, including edit operations from keystrokes, mouse actions, and waiting times.
METU-VIREF is a video referring expression dataset comprising of videos from VIRAT Ground and ILSVRC2015 VID datasets. VIRAT is a surveillance dataset and contains mainly people and vehicles. To line up with this and restrict the domain, only videos that contain vehicles from the ILSVRC dataset are used. The METU-VIREF dataset does not contain whole videos from these datasets (the videos need to be downloaded from the respective sources) but just referring expressions for video sequences containing an object pair. For this, object pairs are chosen which had a relation that a meaningful referring expression could be written for.
Food.com Recipes and Interactions consists of 270K recipes and 1.4M user-recipe interactions (reviews) scraped from Food.com, covering a period of 18 years (January 2000 to December 2018).
SYNTHIA-PANO is the panoramic version of SYNTHIA dataset. Five sequences are included: Seqs02-summer, Seqs02-fall, Seqs04-summer, Seqs04-fall and Seqs05-summer. Panomaramic images with fine annotation for semantic segmentation.
CSAbstruct is a new dataset of annotated computer science abstracts with sentence labels according to their rhetorical roles. The key difference between this dataset and PUBMED-RCT is that PubMed abstracts are written according to a predefined structure, whereas computer science papers are free-form. Therefore, there is more variety in writing styles in CSABSTRUCT. CSABSTRUCT is collected from the Semantic Scholar corpus (Ammar et al., 2018). Each sentence is annotated by 5 workers on the Figure-eight platform,6 with one of 5 categories {BACKGROUND, OBJECTIVE, METHOD, RESULT, OTHER}.
Pesteh-Set is made of two parts. The first part includes 423 images with ground truth. The pistachios are sorted into two classes: Open-mouth and closed-mouth. The ground truth of the images is a CSV file that consists of the bounding boxes of the two classes of pistachios in the images. There are between 1 to 27 pistachios in each image, and 3927 pistachios in total. The second part includes 6 videos with a total length of 167 seconds and 561 moving pistachios.
CelebAGaze consists of 25283 high-resolution celebrity images that are collected from CelebA and the Internet. It consists of 21832 face images with eyes staring at the camera and 3451 face images with eyes staring somewhere else. All images (256 × 256) are cropped and the eye mask region by dlib is computed. Specifically, dlib is used to extract 68 facial landmarks and calculate the mean of 6 points near the eye region, which will be the center point of the mask. The size of the mask is fixed to 30×50. As described above, 300 samples from domain Y are randomly selected, 100 samples from domain X as the test set, the remaining as the training set. Note that this dataset is unpaired and it is not labeled with the specific eye angle or the head pose information.
The SOLO Corpus comprises over 4 million English tweets, each of which contains at least one of the following tokens: solitude, lonely, and loneliness. The corpus has been collected to analyze the language and emotions associated with the state of being alone in English tweets.
The EXPO-HD Dataset is a dataset of Expo whiteboard markers for the purpose of instance segmentation. The dataset contains two subsets (both include instances segmentation labels):