3,275 machine learning datasets
3,275 dataset results
Twitter-MEL is a multimodal entity linking (MEL) dataset built from Twitter. The dataset consists of tweets that had both text and images, with a total of 2.6M timeline tweets and 20k entities.
LoLi-Phone is a large-scale low-light image and video dataset for Low-light image enhancement (LLIE). The images and videos are taken by different mobile phones' cameras under diverse illumination conditions.
The Kvasir-SEG dataset includes 196 polyps smaller than 10 mm classified as Paris class 1 sessile or Paris class IIa. We have selected it with the help of expert gastroenterologists. We have released this dataset separately as a subset of Kvasir-SEG. We call this subset Kvasir-Sessile.
LabPics Chemistry Dataset
CURE-OR is a large-scale, controlled, and multi-platform object recognition dataset denoted as Challenging Unreal and Real Environments for Object Recognition. In this dataset, there are 1,000,000 images of 100 objects with varying size, color, and texture that are positioned in five different orientations and captured using five devices including a webcam, a DSLR, and three smartphone cameras in real-world (real) and studio (unreal) environments. The controlled challenging conditions include underexposure, overexposure, blur, contrast, dirty lens, image noise, resizing, and loss of color information.
GOO (Gaze-on-Objects) is a dataset for gaze object prediction, where the goal is to predict a bounding box for a person's gazed-at object. GOO is composed of a large set of synthetic images (GOO Synth) supplemented by a smaller subset of real images (GOO-Real) of people looking at objects in a retail environment.
The NTIRE 2021 HDR was built for the first challenge on high-dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2021. The challenge aims at estimating a HDR image from one or multiple respective low-dynamic range (LDR) observations, which might suffer from under- or over-exposed regions and different sources of noise. The challenge is composed by two tracks: In Track 1 only a single LDR image is provided as input, whereas in Track 2 three differently-exposed LDR images with inter-frame motion are available. In both tracks, the ultimate goal is to achieve the best objective HDR reconstruction in terms of PSNR with respect to a ground-truth image, evaluated both directly and with a canonical tone mapping operation.
The complete blood count (CBC) dataset contains 360 blood smear images along with their annotation files splitting into Training, Testing, and Validation sets. The training folder contains 300 images with annotations. The testing and validation folder both contain 60 images with annotations. We have done some modifications over the original dataset to prepare this CBC dataset where some of the image annotation files contain very low red blood cells (RBCs) than actual and one annotation file does not include any RBC at all although the cell smear image contains RBCs. So, we clear up all the fallacious files and split the dataset into three parts. Among the 360 smear images, 300 blood cell images with annotations are used as the training set first, and then the rest of the 60 images with annotations are used as the testing set. Due to the shortage of data, a subset of the training set is used to prepare the validation set which contains 60 images with annotations.
ZeroWaste is a dataset for automatic waste detection and segmentation. This dataset contains over 1,800 fully segmented video frames collected from a real waste sorting plant along with waste material labels for training and evaluation of the segmentation methods, as well as over 6,000 unlabeled frames that can be further used for semi-supervised and self-supervised learning techniques. ZeroWaste also provides frames of the conveyor belt before and after the sorting process, comprising a novel setup that can be used for weakly-supervised segmentation.
Games dataset containing 100,000 Gameplay Images of 175 Video Games across 10 Sports Genres - AMERICAN FOOTBALL, BASKETBALL, BIKE RACING, CAR RACING, FIGHTING, HOCKEY, SOCCER, TABLE TENNIS, TENNIS.
VinDr-RibCXR is a benchmark dataset for automatic segmentation and labeling of individual ribs from chest X-ray (CXR) scans. The VinDr-RibCXR contains 245 CXRs with corresponding ground truth annotations provided by human experts.
mTVR is a large-scale multilingual video moment retrieval dataset, containing 218K English and Chinese queries from 21.8K TV show video clips. The dataset is collected by extending the popular TVR dataset (in English) with paired Chinese queries and subtitles. Compared to existing moment retrieval datasets, mTVR is multilingual, larger, and comes with diverse annotations.
MEDIC is a large social media image classification dataset for humanitarian response consisting of 71,198 images to address four different tasks in a multi-task learning setup. It consists data from several data sources such as CrisisMMD, data from AIDR and Damage Multimodal Dataset (DMD).
ROF is a dataset for occluded face recognition that contains faces with both upper face occlusion, due to sunglasses, and lower face occlusion, due to masks.
SpaceNet 1: Building Detection v1 is a dataset for building footprint detection. The data is comprised of 382,534 building footprints, covering an area of 2,544 sq. km of 3/8 band WorldView-2 imagery (0.5 m pixel res.) across the city of Rio de Janeiro, Brazil. The images are processed as 200m×200m tiles with associated building footprint vectors for training.
MSMT17-C is an evaluation set that consists of algorithmically generated corruptions applied to the MSMT17 test-set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
7,672 human written natural language navigation instructions for routes in OpenStreetMap with a focus on visual landmarks. Validated in Street View.
IDDA is a large scale, synthetic dataset for semantic segmentation with more than 100 different source visual domains. The dataset has been created to explicitly address the challenges of domain shift between training and test data in various weather and view point conditions, in seven different city types.
SyRIP is a hybrid synthetic and real infant pose (SyRIP) dataset with small yet diverse real infant images as well as generated synthetic infant poses and (2) a multi-stage invariant representation learning strategy that could transfer the knowledge from the adjacent domains of adult poses and synthetic infant images into our fine-tuned domain-adapted infant pose
This basketball dataset was acquired under the Walloon region project DeepSport, using the Keemotion system installed in multiple arenas. We would like to thanks both Keemotion for letting us use their system for raw image acquisition during live productions, and the LNB for the rights on their images.