3,275 machine learning datasets
This dataset is an extremely challenging set of more than 5,000 original images of electronic items, captured and crowdsourced from over 1,000 urban and rural areas. Each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.
This dataset is an extremely challenging set of more than 5,000 original Hindi text images, captured and crowdsourced from over 700 urban and rural areas. Each image is manually reviewed and verified by computer vision professionals at DataclusterLabs.
This dataset was acquired with the Airphen (Hyphen, Avignon, France) six-band multispectral camera, configured with the 450/570/675/710/730/850 nm bands at 10 nm FWHM. It was acquired at the INRAe site in Montoldre (Allier, France, 46°20'30.3"N 3°26'03.6"E) within the framework of the "RoSE challenge" funded by the French National Research Agency (ANR). Images contain bean crops together with various natural weeds (yarrow, amaranth, geranium, plantago, etc.) and sown ones (mustard, goosefoot, mayweed, and ryegrass), under very distinct illumination conditions (shadow, morning, evening, full sun, cloudy, rain, ...). The ground truth is defined for each image by polygons around leaf boundaries; in addition, each polygon is labeled as crop or weed. (2020-06-11)
Simulacra Aesthetic Captions is a dataset of over 238,000 synthetic images generated with AI models such as CompVis latent GLIDE and Stable Diffusion from over forty thousand user-submitted prompts. Users rate the images on their aesthetic value from 1 to 10, producing caption, image, and rating triplets. In addition, each user agreed to release all of their work with the bot (prompts, outputs, and ratings) into the public domain under the CC0 1.0 Universal Public Domain Dedication. The result is a high-quality, royalty-free dataset with over 176,000 ratings, suitable for a wide range of projects.
AiTLAS: Benchmark Arena is an open-source benchmark framework for evaluating state-of-the-art deep learning approaches for image classification in Earth Observation (EO).
K-pop Idol Dataset - Female (KID-F) is the first dataset of high-quality K-pop idol face images. It consists of about 6,000 high-quality face images at 512x512 resolution, with identity labels for each image.
1,593 handwritten digits from around 80 people were scanned, then stretched into a 16x16 rectangular box with 256 gray-scale values.
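Assuming each scanned digit is distributed as a flat vector of 256 gray values (the exact file format is not specified here), a minimal sketch of recovering the 16x16 image with NumPy:

```python
import numpy as np

# Stand-in for one scanned digit: a flat vector of 256 gray values (0-255).
# In the real dataset this vector would be read from the distributed files.
flat = np.arange(256, dtype=np.uint8)

# Reshape the 256 values into the 16x16 box described above.
digit = flat.reshape(16, 16)
print(digit.shape)  # (16, 16)
```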
Infinity AI's Spills Basic Dataset is a synthetic, open-source dataset for safety applications. It features 150 videos of photorealistic liquid spills across 15 common settings. Spills take on in-context reflections, caustics, and depth based on the surrounding environment, lighting, and floor. Each video contains a spill of unique properties (size, color, profile, and more) and is accompanied by pixel-perfect labels and annotations. This dataset can be used to develop computer vision algorithms to detect the location and type of spill from the perspective of a fixed camera.
The penguin dataset is a collection of images of penguin colonies in Antarctica from the larger Penguin Watch project, which was set up to monitor changes in penguin populations. The images are taken by fixed cameras at over 40 different locations, which have been capturing an image per hour for several years. Tracking colony sizes requires counting the penguins in each image in the dataset. So far, the penguin count has been done with the help of citizen scientists on the Penguin Watch site by Zooniverse, where interested users place dots on top of the penguins. Here we release part of this data to the vision community in order to learn from the crowd-sourced dot annotations and automatically annotate these images.
Face detection and subsequent localization of facial landmarks are the primary steps in many face applications. Numerous algorithms and benchmark datasets have been introduced to develop robust models for the visible domain. However, varying illumination conditions still pose a challenging problem. Thermal cameras can address this problem because they operate at longer wavelengths. Nevertheless, thermal face and facial landmark detection in the wild remains an open research problem, because most existing thermal datasets were collected in controlled environments, and many were not annotated with face bounding boxes and facial landmarks. In this work, we present a thermal face dataset with manually labeled bounding boxes and facial landmarks to address these problems. The dataset contains 9,982 images of 147 subjects collected under controlled and uncontrolled conditions. As a baseline, we trained the YOLOv5 object detection model.
Facial landmark detection is a cornerstone of many facial analysis tasks such as face recognition, drowsiness detection, and facial expression recognition. Numerous methodologies have been introduced to achieve accurate and efficient facial landmark localization in visual images. However, only a few works address facial landmark detection in thermal images; the main challenge is the limited number of annotated datasets. In this work, we present a thermal face dataset with annotated face bounding boxes and facial landmarks. The dataset contains 2,556 thermal images of 142 individuals, where each thermal image is paired with the corresponding visual image. To the best of our knowledge, our dataset is the largest in terms of the number of individuals. In addition, our dataset can be employed for tasks such as thermal-to-visual image translation, thermal-visual face recognition, and others. We trained two models for the facial landmark detection task to show the efficacy of our dataset.
RTI International (RTI) generated 2,611 labeled point locations representing 19 different land cover types, clustered in 5 distinct agroecological zones within Rwanda. These land cover types were reduced to three crop types (Banana, Maize, and Legume), two additional non-crop land cover types (Forest and Structure), and a catch-all Other land cover type to provide training/evaluation data for a crop classification model. Each point is attributed with its latitude and longitude, the land cover type, and the degree of confidence the labeler had when classifying the point location. For each location there are also three corresponding image chips (4.5 m x 4.5 m in size) with the point id as part of the image name. Each image contains a P1, P2, or P3 designation in the name, indicating the time period: P1 corresponds to December 2018, P2 to January 2019, and P3 to February 2019. These data were used in the development of research documented in greater detail in an accompanying publication.
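The P1/P2/P3 designation described above can be mapped to its acquisition period directly from a chip's filename. A minimal sketch (the example filename pattern is an assumption, not the dataset's documented naming scheme):

```python
# Time periods as described: P1 = December 2018, P2 = January 2019, P3 = February 2019.
PERIODS = {"P1": "December 2018", "P2": "January 2019", "P3": "February 2019"}

def chip_period(filename: str) -> str:
    """Return the acquisition period for an image chip filename."""
    for tag, month in PERIODS.items():
        if tag in filename:
            return month
    raise ValueError(f"no period designation found in {filename!r}")

# Hypothetical filename containing a point id and the P2 designation.
print(chip_period("point_0042_P2.tif"))  # January 2019
```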
CVGL Camera Calibration Dataset consists of 49 camera configurations: 25 for Town 1 and 24 for Town 2. The parameters modified to generate the configurations are fov, x, y, z, pitch, yaw, and roll, where fov is the field of view, (x, y, z) is the translation, and (pitch, yaw, roll) is the rotation between the cameras. The total number of image pairs is 79,320, of which 18,083 belong to Town 1 and 61,237 to Town 2; the difference in the number of images is due to the length of the tracks.
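One configuration therefore amounts to seven parameters. A minimal sketch of how such a record might be represented (field names follow the description; the units and example values are assumptions):

```python
from dataclasses import dataclass

@dataclass
class CameraConfig:
    """One CVGL camera configuration, as described above."""
    fov: float    # field of view (assumed degrees)
    x: float      # translation between the cameras
    y: float
    z: float
    pitch: float  # rotation between the cameras (assumed degrees)
    yaw: float
    roll: float

# Hypothetical example configuration.
cfg = CameraConfig(fov=90.0, x=0.5, y=0.0, z=1.4, pitch=0.0, yaw=15.0, roll=0.0)
print(cfg.yaw)  # 15.0
```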
JigsawPlan contains room layouts and floorplans for 98,780 single-story houses/apartments from a production pipeline, designed for the Extreme Structure from Motion (E-SfM) problem.
MapAI: Precision in Building Segmentation Dataset. The dataset comprises 7,500 training images and 1,500 validation images from Denmark. The test dataset is split into two tasks: the first task (1,368 images) is to segment buildings using only aerial images, while the second task (978 images) also allows the use of lidar data. All data samples have a resolution of 500x500. The aerial images are RGB, while the lidar data are rasterized. The ground truth masks have two classes: building and background.
We present the Gracenote Multi-Crop (GNMC) dataset, to further research in algorithms for aesthetic image cropping. The dataset consists of a diverse collection of 10K images, each cropped in five different aspect ratios by experienced editors. GNMC is larger than existing datasets commonly used to benchmark image cropping approaches such as FCDB (1743 images) and FLMS (500 images). This dataset can enable aesthetic cropping algorithms as described in "An Experience-Based Direct Generation Approach to Automatic Image Cropping" by Christensen and Vartakavi.
A fundamental component of human vision is our ability to parse complex visual scenes and judge the relations between their constituent objects. AI benchmarks for visual reasoning have driven rapid progress in recent years, with state-of-the-art systems now reaching human accuracy on some of these benchmarks. Yet, there remains a major gap between humans and AI systems in terms of the sample efficiency with which they learn new visual reasoning tasks. Humans' remarkable efficiency at learning has been at least partially attributed to their ability to harness compositionality -- allowing them to efficiently take advantage of previously gained knowledge when learning new tasks. Here, we introduce a novel visual reasoning benchmark, Compositional Visual Relations (CVR), to drive progress towards the development of more data-efficient learning algorithms. We take inspiration from fluid intelligence and non-verbal reasoning tests and describe a novel method for creating compositions of abstract shapes.
We introduce the KAIST multi-spectral dataset, which covers a greater range of drivable regions, from urban to residential, for autonomous systems. Our dataset provides different perspectives of the world captured in coarse time slots (day and night) in addition to fine time slots (sunrise, morning, afternoon, sunset, night, and dawn). For all-day perception of autonomous systems, we propose the use of a different spectral sensor, i.e., a thermal imaging camera. Toward this goal, we develop a multi-sensor platform, which supports a co-aligned RGB/Thermal camera, RGB stereo, 3D LiDAR, and inertial sensors (GPS/IMU), together with a related calibration technique. We design a wide range of visual perception tasks including object detection, drivable region detection, localization, image enhancement, depth estimation, and colorization using a single/multi-spectral approach. In this paper, we provide a description of our benchmark, including the recording platform, data format, and development toolkit.
This is an image dataset for object detection of wildlife in mixed coniferous and broad-leaved forest.
SESYD (Systems Evaluation SYnthetic Documents) is a database of synthetic documents with ground truth. It targets two main research problems in the document image analysis field: (i) symbol recognition and spotting in line-drawing images (floorplans and electrical diagrams), and (ii) character segmentation and recognition in geographical maps. The database is composed of eleven collections for performance evaluation, containing 284k images, 190k symbols, and 284k characters (k for thousand). Published in 2010, SESYD is today a key database in the document image analysis field, referenced by around one hundred citations in research papers.