WildLife Documentary is an animal object detection dataset. It contains 15 documentary films downloaded from YouTube. The videos range in length from 9 to 50 minutes, with resolutions from 360p to 1080p. A unique property of this dataset is that all videos are accompanied by subtitles automatically generated from speech by YouTube; the subtitles were revised manually to correct obvious spelling mistakes. All animals in the videos are annotated, resulting in more than 4,098 object tracklets covering 60 different visual concepts, e.g., ‘tiger’, ‘koala’, ‘langur’, and ‘ostrich’.
CUHK Image Cropping is a dataset for image cropping. The photos are of varying aesthetic quality and span a variety of image categories, including animal, architecture, human, landscape, night, plant and man-made objects. Each image is manually cropped by three expert photographers (graduate students in art whose primary medium is photography) to form three training sets. There are 1,000 photos in the dataset.
Social Relation Dataset is a dataset for social relation trait prediction from face images. Traits are based on the interpersonal circle proposed by Kiesler, in which human relations are divided into 16 segments. Each segment has its opposite on the other side of the circle, such as 'friendly' and 'hostile'. The dataset contains 8,306 images chosen from the internet and movies. Each image is labelled with face bounding boxes and their pairwise relations.
Pavia Centre is a hyperspectral dataset acquired by the ROSIS sensor during a flight campaign over Pavia, northern Italy. The image has 102 spectral bands, measures 1096×1096 pixels, and has a geometric resolution of 1.3 meters. The ground truth differentiates 9 classes. The Pavia scenes were provided by Prof. Paolo Gamba from the Telecommunications and Remote Sensing Laboratory, University of Pavia (Italy).
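As a hedged illustration of how a cube with these dimensions is typically handled, the (height, width, bands) image can be reshaped into a matrix of per-pixel spectra before classification; the array below is a zero-filled stand-in with the stated dimensions, not the actual scene data.

```python
import numpy as np

# Zero-filled stand-in with the stated Pavia Centre dimensions
# (1096 x 1096 pixels, 102 spectral bands) -- illustrative only.
H, W, BANDS = 1096, 1096, 102
cube = np.zeros((H, W, BANDS), dtype=np.float32)

# Reshape to one 102-band spectrum per pixel, a common input layout
# for per-pixel classifiers on hyperspectral scenes.
spectra = cube.reshape(-1, BANDS)
print(spectra.shape)  # (1201216, 102)
```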
The Multimodal Document Intent Dataset (MDID) is a dataset for computing author intent from multimodal Instagram data. It contains 1,299 Instagram posts covering a variety of topics, annotated with labels from three taxonomies. The samples are labelled with 7 intent labels: Provocative, Informative, Advocative, Entertainment, Expositive, Expressive, and Promotive.
ADE-Affordance is a dataset that builds upon ADE20k, adding annotations that enable rich visual reasoning about affordances.
Large Age-Gap (LAG) is a dataset for face verification. The dataset contains 3,828 images of 1,010 celebrities. For each identity, at least one child/young image and one adult/old image are present.
WHU-Specular is a large dataset of annotated specular highlight regions created from real-world images. It can be used for the specular highlight detection task. It contains 4,310 image pairs (specular images and corresponding highlight masks): 3,017 randomly selected images form the training set and the remaining 1,293 the testing set.
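A minimal sketch of a reproducible split with those proportions (the identifiers and seed below are hypothetical; the dataset's own split files should be used when available):

```python
import random

# Hypothetical identifiers standing in for the 4,310 specular/mask pairs.
pairs = [f"pair_{i:04d}" for i in range(4310)]

rng = random.Random(0)  # fixed seed so the split is reproducible
rng.shuffle(pairs)

# 3,017 training pairs, 1,293 testing pairs (3017 + 1293 = 4310).
train, test = pairs[:3017], pairs[3017:]
print(len(train), len(test))  # 3017 1293
```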
The RailEye3D dataset, a collection of train-platform scenarios for applications targeting passenger safety and automation of train dispatching, consists of 10 image sequences captured at 6 railway stations in Austria. Annotations for multi-object tracking are provided both in a unified format and in the ground-truth format used in the MOTChallenge.
The “Medico automatic polyp segmentation challenge” aims to develop computer-aided diagnosis systems for automatic polyp segmentation that detect all types of polyps (for example, irregular, small, or flat polyps) with high efficiency and accuracy. The main goal of the challenge is to benchmark semantic segmentation algorithms on a publicly available dataset, emphasizing robustness, speed, and generalization.
UASOL is an RGB-D stereo dataset that contains 160,902 frames filmed at 33 different scenes, each with between 2k and 10k frames. The frames show different paths from the perspective of a pedestrian, including sidewalks, trails, and roads. The images were extracted from video files recorded at 15 fps and HD2K resolution (2280 × 1282 pixels). The dataset also provides a GPS geolocation tag for each second of the sequences and reflects different climatological conditions. Up to 4 different people filmed the dataset at different times of day.
Digital Peter is a dataset of Peter the Great's manuscripts annotated for segmentation and text recognition. The dataset may be useful to researchers for training handwriting text recognition models and as a benchmark for comparing different models. It consists of 9,694 images and text files corresponding to lines in historical documents. The dataset includes Peter’s handwritten materials covering the period from 1709 to 1713.
OnFocus Detection In the Wild (OFDIW) is an onfocus detection dataset. Onfocus detection aims at identifying whether the focus of the individual captured by a camera is on the camera or not. The dataset consists of 20,623 images in unconstrained capture conditions (hence "in the wild") and contains individuals with diverse emotions, ages, and facial characteristics, as well as rich interactions with surrounding objects and background scenes. The images are collected from the LFW dataset and the Oxford-IIIT Pet dataset.
Mirrored-Human is a dataset for 3D pose estimation from a single view. It covers a large variety of human subjects, poses, and backgrounds. The images are collected from the internet and consist of people in front of mirrors, where both the person and the reflected image are visible. Actions cover dancing, fitness, mirror installation, and swing practice.
FixMyPose is a dataset for automated pose correction. It consists of descriptions to correct a "current" pose to look like a "target" pose, in English and Hindi. The collected descriptions have interesting linguistic properties such as egocentric relations to environment objects, analogous references, etc., requiring an understanding of spatial relations and commonsense knowledge about postures.
ElBa is composed of procedurally generated realistic renderings in which element shapes, colors, and their distribution are varied continuously, yielding 30K texture images with different local symmetry, stationarity, and density of (3M) localized texels, whose attributes are thus known by construction.
A set of patterns used in psychophysical research to evaluate the ability of saliency algorithms to find targets distinct from distractors in orientation, color and size. Each image is a 7x7 grid and contains a single target. All images are 1024x1024px and have corresponding ground truth masks for the target and distractors.
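A rough sketch of how one such search array could be synthesized, assuming vertical-bar distractors and a horizontal-bar target differing in orientation (the cell size and bar geometry below are illustrative assumptions, not the dataset's actual parameters):

```python
import numpy as np

GRID = 7            # 7x7 grid of items, as described above
SIZE = 1024         # 1024x1024 px image
CELL = SIZE // GRID # ~146 px per grid cell (assumed layout)

def make_array(target_rc=(3, 3)):
    """Render a search array plus ground-truth masks for the single
    target and the distractors. Bars are axis-aligned rectangles."""
    img = np.zeros((SIZE, SIZE), dtype=np.uint8)
    target_mask = np.zeros_like(img)
    distractor_mask = np.zeros_like(img)
    for r in range(GRID):
        for c in range(GRID):
            cy, cx = r * CELL + CELL // 2, c * CELL + CELL // 2
            if (r, c) == target_rc:
                # target: horizontal bar
                img[cy - 5:cy + 5, cx - 30:cx + 30] = 255
                target_mask[cy - 5:cy + 5, cx - 30:cx + 30] = 255
            else:
                # distractor: vertical bar
                img[cy - 30:cy + 30, cx - 5:cx + 5] = 255
                distractor_mask[cy - 30:cy + 30, cx - 5:cx + 5] = 255
    return img, target_mask, distractor_mask

img, target_mask, distractor_mask = make_array()
```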
This is a video capsule endoscopy dataset for polyp segmentation.
The Person In Context (PIC) dataset is a dataset for human-centric relation segmentation (HRS), which contains 17,122 high-resolution images and densely annotated entity segmentation and relations, including 141 object categories, 23 relation categories and 25 semantic human parts.
CASIA-Face-Africa is a face image database which contains 38,546 images of 1,183 African subjects. Multi-spectral cameras are utilized to capture the face images under various illumination settings. Demographic attributes and facial expressions of the subjects are also carefully recorded. For landmark detection, each face image in the database is manually labeled with 68 facial keypoints. A set of evaluation protocols is constructed according to different applications, tasks, partitions, and scenarios. The database, along with its face landmark annotations, evaluation protocols, and preliminary results, forms a good benchmark for studying essential aspects of face biometrics for African subjects, especially face image preprocessing, face feature analysis and matching, facial expression recognition, sex/age estimation, ethnic classification, and face image generation.