19,997 machine learning datasets
The Udacity dataset is mainly composed of video frames taken from urban roads. It provides a total of 404,916 video frames for training and 5,614 video frames for testing. This dataset is challenging due to severe lighting changes, sharp road curves and busy traffic.
The DBRD (pronounced dee-bird) dataset contains over 110k book reviews along with associated binary sentiment polarity labels. It is greatly influenced by the Large Movie Review Dataset and intended as a benchmark for sentiment classification in Dutch.
You need to request access to download and use the dataset.
Data for depression
Delicious: This dataset contains tagged web pages retrieved from the website delicious.com.
The CID (Campus Image Dataset) is a dataset captured in low-light environments with the help of Android programming. Its basic unit is a group, which is named by capture time and contains 8 raw images with varying exposure times shot in a burst.
4D Light Field Dataset is a light field benchmark consisting of 24 carefully designed synthetic, densely sampled 4D light fields with highly accurate disparity ground truth.
FixaTons is a large collection of datasets of human scanpaths (temporally ordered sequences of fixations) and saliency maps.
ePillID is a benchmark for developing and evaluating computer vision models for pill identification. The ePillID benchmark is designed as a low-shot fine-grained benchmark, reflecting real-world challenges for developing image-based pill identification systems. The characteristics of the ePillID benchmark include:
* Reference and consumer images: The reference images are taken with controlled lighting and backgrounds, and with professional equipment. The consumer images are taken with real-world settings including different lighting, backgrounds, and equipment. For most of the pills, one image per side (two images per pill type) is available from the NIH Pillbox dataset.
* Low-shot and fine-grained setting: 13k images representing 9804 appearance classes (two sides for 4902 pill types). For most of the appearance classes, there exists only one reference image, making it a challenging low-shot recognition setting.
A dataset derived from the recently introduced Mimetics dataset.
The dataset contains two subsets of synthetic, semantically segmented road-scene images, which were created for developing and applying the methodology described in the paper "A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird’s Eye View" (IEEE Xplore, arXiv, YouTube).
COVID19 Data from the World Health Organization
CholecT40 is the first endoscopic dataset introduced to enable research on fine-grained action recognition in laparoscopic surgery.
DSD100 is a dataset of 100 full-length music tracks of different styles, along with their isolated drums, bass, vocals, and other stems.
Language Identification Dataset
In this competition you will be identifying regions in satellite images that contain certain cloud formations, with label names: Fish, Flower, Gravel, Sugar. For each image in the test set, you must segment the regions of each cloud formation label. Each image has at least one cloud formation, and can contain up to all four.
The INRIA-Horse dataset consists of 170 horse images and 170 images without horses. All horses in all images are annotated with a bounding box. The main challenges it offers are clutter, intra-class shape variability, and scale changes. The horses are mostly unoccluded, taken from approximately the side viewpoint, and face the same direction.