WildLife Documentary is an animal object detection dataset. It contains 15 documentary films downloaded from YouTube. The videos range in length from 9 to 50 minutes, with resolutions from 360p to 1080p. A unique property of this dataset is that all videos are accompanied by subtitles automatically generated from speech by YouTube; the subtitles were revised manually to correct obvious spelling mistakes. All animals in the videos are annotated, resulting in more than 4,098 object tracklets covering 60 different visual concepts, e.g., ‘tiger’, ‘koala’, ‘langur’, and ‘ostrich’.
CUHK Image Cropping is a dataset for image cropping. The photos are of varying aesthetic quality and span a variety of image categories, including animal, architecture, human, landscape, night, plant and man-made objects. Each image is manually cropped by three expert photographers (graduate students in art whose primary medium is photography) to form three training sets. There are 1,000 photos in the dataset.
Social Relation Dataset is a dataset for social relation trait prediction from face images. Traits are based on the interpersonal circle proposed by Kiesler, in which human relations are divided into 16 segments. Each segment has its opposite on the other side of the circle, such as 'friendly' and 'hostile'. The dataset contains 8,306 images chosen from the internet and movies. Each image is labelled with face bounding boxes and their pairwise relations.
Pavia Centre is a hyperspectral dataset acquired by the ROSIS sensor during a flight campaign over Pavia, northern Italy. The image has 102 spectral bands, measures 1096×1096 pixels, and has a geometric resolution of 1.3 meters. The ground truth differentiates 9 classes. The Pavia scenes were provided by Prof. Paolo Gamba from the Telecommunications and Remote Sensing Laboratory, University of Pavia (Italy).
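As a hedged illustration of how a cube with these dimensions is typically handled, the (height, width, bands) image can be reshaped into a matrix of per-pixel spectra before classification; the array below is a zero-filled stand-in with the stated dimensions, not the actual scene data.

```python
import numpy as np

# Zero-filled stand-in with the stated Pavia Centre dimensions
# (1096 x 1096 pixels, 102 spectral bands) -- illustrative only.
H, W, BANDS = 1096, 1096, 102
cube = np.zeros((H, W, BANDS), dtype=np.float32)

# Reshape to one 102-band spectrum per pixel, a common input layout
# for per-pixel classifiers on hyperspectral scenes.
spectra = cube.reshape(-1, BANDS)
print(spectra.shape)  # (1201216, 102)
```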
The Multimodal Document Intent Dataset (MDID) is a dataset for computing author intent from multimodal Instagram data. It contains 1,299 Instagram posts covering a variety of topics, annotated with labels from three taxonomies. The samples are labelled with 7 intent labels: Provocative, Informative, Advocative, Entertainment, Expositive, Expressive, and Promotive.
ADE-Affordance is a dataset that builds upon ADE20k, adding annotations that enable rich visual reasoning about affordances.
Large Age-Gap (LAG) is a dataset for face verification. The dataset contains 3,828 images of 1,010 celebrities. For each identity, at least one child/young image and one adult/old image are present.
WHU-Specular is a large dataset of annotated specular highlight regions created from real-world images. It can be used for the specular highlight detection task. It contains 4,310 image pairs (specular images and corresponding highlight masks): 3,017 randomly selected images form the training set and the remaining 1,293 the testing set.
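A minimal sketch of a reproducible split with those proportions (the identifiers and seed below are hypothetical; the dataset's own split files should be used when available):

```python
import random

# Hypothetical identifiers standing in for the 4,310 specular/mask pairs.
pairs = [f"pair_{i:04d}" for i in range(4310)]

rng = random.Random(0)  # fixed seed so the split is reproducible
rng.shuffle(pairs)

# 3,017 training pairs, 1,293 testing pairs (3017 + 1293 = 4310).
train, test = pairs[:3017], pairs[3017:]
print(len(train), len(test))  # 3017 1293
```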
The RailEye3D dataset, a collection of train-platform scenarios for applications targeting passenger safety and automation of train dispatching, consists of 10 image sequences captured at 6 railway stations in Austria. Annotations for multi-object tracking are provided both in a unified format and in the ground-truth format used in the MOTChallenge.
The “Medico automatic polyp segmentation challenge” aims to develop computer-aided diagnosis systems for automatic polyp segmentation that detect all types of polyps (for example, irregular, small, or flat polyps) with high efficiency and accuracy. The main goal of the challenge is to benchmark semantic segmentation algorithms on a publicly available dataset, emphasizing robustness, speed, and generalization.
UASOL is an RGB-D stereo dataset that contains 160,902 frames filmed at 33 different scenes, each with between 2k and 10k frames. The frames show different paths from the perspective of a pedestrian, including sidewalks, trails, and roads. The images were extracted from video files recorded at 15 fps and HD2K resolution (2280 × 1282 pixels). The dataset also provides a GPS geolocation tag for each second of the sequences and reflects different climatological conditions. Up to 4 different people filmed the dataset at different times of day.
Digital Peter is a dataset of Peter the Great's manuscripts annotated for segmentation and text recognition. The dataset may be useful to researchers for training handwriting text recognition models and as a benchmark for comparing different models. It consists of 9,694 images and text files corresponding to lines in historical documents. The dataset includes Peter’s handwritten materials covering the period from 1709 to 1713.
OnFocus Detection In the Wild (OFDIW) is an onfocus detection dataset. Onfocus detection aims at identifying whether the focus of the individual captured by a camera is on the camera or not. The dataset consists of 20,623 images in unconstrained capture conditions (hence "in the wild") and contains individuals with diverse emotions, ages, and facial characteristics, as well as rich interactions with surrounding objects and background scenes. The images are collected from the LFW dataset and the Oxford-IIIT Pet dataset.
Mirrored-Human is a dataset for 3D pose estimation from a single view. It covers a large variety of human subjects, poses, and backgrounds. The images are collected from the internet and consist of people in front of mirrors, where both the person and the reflected image are visible. Actions cover dancing, fitness, mirror installation, and swing practice.
FixMyPose is a dataset for automated pose correction. It consists of descriptions to correct a "current" pose to look like a "target" pose, in English and Hindi. The collected descriptions have interesting linguistic properties such as egocentric relations to environment objects, analogous references, etc., requiring an understanding of spatial relations and commonsense knowledge about postures.
ElBa is composed of procedurally generated realistic renderings in which element shapes, colors, and their distribution are varied continuously, yielding 30K texture images with different local symmetry, stationarity, and density of (3M) localized texels, whose attributes are thus known by construction.
A set of patterns used in psychophysical research to evaluate the ability of saliency algorithms to find targets distinct from distractors in orientation, color and size. Each image is a 7x7 grid and contains a single target. All images are 1024x1024px and have corresponding ground truth masks for the target and distractors.
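A rough sketch of how one such search array could be synthesized, assuming vertical-bar distractors and a horizontal-bar target differing in orientation (the cell size and bar geometry below are illustrative assumptions, not the dataset's actual parameters):

```python
import numpy as np

GRID = 7            # 7x7 grid of items, as described above
SIZE = 1024         # 1024x1024 px image
CELL = SIZE // GRID # ~146 px per grid cell (assumed layout)

def make_array(target_rc=(3, 3)):
    """Render a search array plus ground-truth masks for the single
    target and the distractors. Bars are axis-aligned rectangles."""
    img = np.zeros((SIZE, SIZE), dtype=np.uint8)
    target_mask = np.zeros_like(img)
    distractor_mask = np.zeros_like(img)
    for r in range(GRID):
        for c in range(GRID):
            cy, cx = r * CELL + CELL // 2, c * CELL + CELL // 2
            if (r, c) == target_rc:
                # target: horizontal bar
                img[cy - 5:cy + 5, cx - 30:cx + 30] = 255
                target_mask[cy - 5:cy + 5, cx - 30:cx + 30] = 255
            else:
                # distractor: vertical bar
                img[cy - 30:cy + 30, cx - 5:cx + 5] = 255
                distractor_mask[cy - 30:cy + 30, cx - 5:cx + 5] = 255
    return img, target_mask, distractor_mask

img, target_mask, distractor_mask = make_array()
```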
This is a video capsule endoscopy dataset for polyp segmentation.
The Person In Context (PIC) dataset is a dataset for human-centric relation segmentation (HRS), which contains 17,122 high-resolution images and densely annotated entity segmentation and relations, including 141 object categories, 23 relation categories and 25 semantic human parts.
CASIA-Face-Africa is a face image database which contains 38,546 images of 1,183 African subjects. Multi-spectral cameras are utilized to capture the face images under various illumination settings. Demographic attributes and facial expressions of the subjects are also carefully recorded. For landmark detection, each face image in the database is manually labeled with 68 facial keypoints. A set of evaluation protocols is constructed according to different applications, tasks, partitions, and scenarios. The database, along with its face landmark annotations, evaluation protocols, and preliminary results, forms a good benchmark for studying essential aspects of face biometrics for African subjects, especially face image preprocessing, face feature analysis and matching, facial expression recognition, sex/age estimation, ethnic classification, and face image generation.