3,275 machine learning datasets
3,275 dataset results
The Situated Corpus Of Understanding Transactions (SCOUT) is a multi-modal collection of human-robot dialogue in the task domain of collaborative exploration. The corpus was constructed from multi-phased Wizard-of-Oz experiments where human participants gave verbal instructions to a remotely-located robot to move and gather information about its surroundings. Each dialogue involved a human Commander, a Dialogue Manager (DM), and a Robot Navigator (RN), and took place in physical or simulated environments.
TexBiG (from the German Text-Bild-Gefüge, meaning Text-Image-Structure) is a document layout analysis dataset for historical documents in the late 19th and early 20th century. The dataset provides instance segmentation (bounding boxes and polygons/masks) annotations for 19 different classes with more then 52.000 instances.
Images with paired ground-truth caption hierarchies
The nordland used in SALAD and BoQ (2760 queries, 27592 reference images, threshold: 1 frames).
💡 Description A new benchmark, Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation (M$^3$-VOS), to verify the ability of models to understand object phases, which consists of 479 high-resolution videos spanning over 10 distinct everyday scenarios. We collected 205,181 masks, with an average track duration of 14.27s. M$^3$-VOS covers 120+ categories of objects across 6 phases within 14 scenarios, encompassing 23 specific phase transitions.
ImgEdit is a large-scale, high-quality image-editing dataset comprising 1.2 million carefully curated edit pairs, which contain both novel and complex single-turn edits, as well as challenging multi-turn tasks.
CUHK Face Sketch database (CUFS) is for research on face sketch synthesis and face sketch recognition. It includes 188 faces from the Chinese University of Hong Kong (CUHK) student database, 123 faces from the AR database 1, and 295 faces from the XM2VTS database 2. There are 606 faces in total. For each face, there is a sketch drawn by an artist based on a photo taken in a frontal pose, under normal lighting condition, and with a neutral expression.
Although promising results have been achieved in the areas of traffic-sign detection and classification, few works have provided simultaneous solutions to these two tasks for realistic real world images. We make two contributions to this problem. Firstly, we have created a large traffic-sign benchmark from 100000 Tencent Street View panoramas, going beyond previous benchmarks. We call this benchmark Tsinghua-Tencent 100K. It provides 100000 images containing 30000 traffic-sign instances. These images cover large variations in illuminance and weather conditions. Each traffic-sign in the benchmark is annotated with a class label, its bounding box and pixel mask. Secondly, we demonstrate how a robust end-to-end convolutional neural network (CNN) can simultaneously detect and classify traffic-signs. Most previous CNN image processing solutions target objects that occupy a large proportion of an image, and such networks do not work well for target objects occupying only a small fraction of
The UAVA,<i>UAV-Assistant</i>, dataset is specifically designed for fostering applications which consider UAVs and humans as cooperative agents. We employ a real-world 3D scanned dataset (<a href="https://niessner.github.io/Matterport/">Matterport3D</a>), physically-based rendering, a gamified simulator for realistic drone navigation trajectory collection, to generate realistic multimodal data both from the user’s exocentric view of the drone, as well as the drone’s egocentric view.
The AND Dataset contains 13700 handwritten samples and 15 corresponding expert examined features for each sample. The dataset is released for public use and the methods can be extended to provide explanations on other verification tasks like face verification and bio-medical comparison. This dataset can serve as the basis and benchmark for future research in explanation based handwriting verification.
The BIWI Walking Pedestrians dataset consists of walking pedestrians in busy scenarios from a birds eye view.
The Sheffield (previously UMIST) Face Database consists of 564 images of 20 individuals (mixed race/gender/appearance). Each individual is shown in a range of poses from profile to frontal views – each in a separate directory labelled 1a, 1b, … 1t and images are numbered consecutively as they were taken. The files are all in PGM format, approximately 220 x 220 pixels with 256-bit grey-scale.
The ICVL dataset is a hand pose estimation dataset that consists of 330K training frames and 2 testing sequences with each 800 frames. The dataset is collected from 10 different subjects with 16 hand joint annotations for each frame.
RobotPush is a dataset for object singulation – the task of separating cluttered objects through physical interaction. The dataset contains 3456 training images with labels and 1024 validation images with labels. It consists of simulated and real-world data collected from a PR2 robot that equipped with a Kinect 2 camera. The dataset also contains ground truth instance segmentation masks for 110 images in the test set.
The Specs on Faces (SoF) dataset, a collection of 42,592 (2,662×16) images for 112 persons (66 males and 46 females) who wear glasses under different illumination conditions. The dataset is FREE for reasonable academic fair use. The dataset presents a new challenge regarding face detection and recognition. It is focused on two challenges: harsh illumination environments and face occlusions, which highly affect face detection, recognition, and classification. The glasses are the common natural occlusion in all images of the dataset. However, there are two more synthetic occlusions (nose and mouth) added to each image. Moreover, three image filters, that may evade face detectors and facial recognition systems, were applied to each image. All generated images are categorized into three levels of difficulty (easy, medium, and hard). That enlarges the number of images to be 42,592 images (26,112 male images and 16,480 female images). There is metadata for each image that contains many infor
HDM05 is a MoCap (motion capture) dataset. It contains more than three hours of systematically recorded and well-documented motion capture data in the C3D as well as in the ASF/AMC data format. HDM05 contains almost 2337 sequences with 130 motion classes performed by 5 different actors.
The OpeReid dataset is a person re-identification dataset that consists of 7,413 images of 200 persons.
Country211 is a dataset released by OpenAI, designed to assess the geolocation capability of visual representations. It filters the YFCC100m dataset (Thomee et al., 2016) to find 211 countries (defined as having an ISO-3166 country code) that have at least 300 photos with GPS coordinates. OpenAI built a balanced dataset with 211 categories, by sampling 200 photos for training and 100 photos for testing, for each country.
The ORVS dataset has been newly established as a collaboration between the computer science and visual-science departments at the University of Calgary.
The OCTAGON dataset is a set of Angiography by Octical Coherence Tomography images (OCT-A) used to the segmentation of the Foveal Avascular Zone (FAZ). The dataset includes 144 healthy OCT-A images and 69 diabetic OCT-A images, divided into four groups, each one with 36 and about 17 OCT-A images, respectively. These groups are: 3x3 superficial, 3x3 deep, 6x6 superficial and 6x6 deep, where 3x3 and 6x6 are the zoom of the image and superficial/deep are the depth level of the extracted image. The healthy dataset includes OCT-A images from people classified in 6 age ranges: 10-19 years, 20-29 years, 30-39 years, 40-49 years, 50-59 years and 60-69 years. Each age range includes 3 different patients with information of left and right eyes for each one. Finally, for each eye, there are four different images: one 3x3 superficial image, one 3x3 deep image, one 6x6 superficial image and one 6x6 deep image. Each image have two manual labelled of expert clinicians of the FAZ and their quantificat