19,997 machine learning datasets
CurveLanes is a new benchmark lane detection dataset with 150K lane images covering difficult scenarios in traffic lane detection, such as curves and multiple lanes. It was collected in real urban and highway scenarios across multiple cities in China. It is the largest lane detection dataset to date and establishes a more challenging benchmark for the community.
We introduce a new dataset, called FoCus, that supports knowledge-grounded answers reflecting the user's persona. One situation in which people need different types of knowledge, depending on their preferences, is when they travel around the world.
Distinctions-646 is composed of 646 foreground images with manually annotated alpha mattes.
Argoverse-HD is a dataset built for streaming object detection, which encompasses real-time object detection, video object detection, tracking, and short-term forecasting. It contains the video data from Argoverse 1.1 with our own MS COCO-style bounding box annotations with track IDs. The annotations are backward-compatible with COCO as one can directly evaluate COCO pre-trained models on this dataset to estimate the efficiency or the cross-dataset generalization capability of the models. The dataset contains high-quality and temporally-dense annotations for high-resolution videos (1920 x 1200 @ 30 FPS). Overall, there are 70,000 image frames and 1.3 million bounding boxes.
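Because the annotations are backward-compatible with COCO, each bounding box follows the standard COCO record layout with an added track identifier. The sketch below illustrates this layout; the exact field name used for track IDs in the released files is an assumption here (shown as `track_id`), and the values are made up for illustration.

```python
# Minimal sketch of a COCO-style annotation record extended with a track ID,
# as used by streaming-detection datasets like Argoverse-HD.
# Standard COCO fields: id, image_id, category_id, bbox, area, iscrowd.
# "track_id" (the cross-frame link) is an assumed field name.
annotation = {
    "id": 1,
    "image_id": 42,
    "category_id": 3,                     # COCO category index
    "bbox": [100.0, 150.0, 80.0, 60.0],   # COCO convention: [x, y, width, height]
    "area": 80.0 * 60.0,
    "iscrowd": 0,
    "track_id": 7,                        # assumed: links this box across frames
}

# COCO boxes are [x, y, w, h]; convert to corner format (x1, y1, x2, y2)
# for IoU-style computations used by detectors and trackers.
x, y, w, h = annotation["bbox"]
corners = (x, y, x + w, y + h)
```

Because the layout matches COCO, a COCO pre-trained detector can be evaluated on such records without any conversion beyond ignoring the extra track field.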
The evaluation of human epidermal growth factor receptor 2 (HER2) expression is essential to formulate a precise treatment for breast cancer. The routine evaluation of HER2 is conducted with immunohistochemical techniques (IHC), which are very expensive. Therefore, we propose a breast cancer immunohistochemical (BCI) benchmark that attempts to synthesize IHC data directly from paired hematoxylin and eosin (HE) stained images. The dataset contains 4870 registered image pairs, covering a variety of HER2 expression levels (0, 1+, 2+, 3+).
ScribbleKITTI is a scribble-annotated dataset for LiDAR semantic segmentation.
The Sony-TAu Realistic Spatial Soundscapes 2022 (STARSS22) dataset consists of recordings of real scenes captured with a high-channel-count spherical microphone array (SMA). The recordings were conducted by two different teams at two different sites: Tampere University in Tampere, Finland, and Sony facilities in Tokyo, Japan. Recordings at both sites share the same capturing and annotation process and a similar organization. They are organized in sessions corresponding to distinct rooms, human participants, and sound-making props, with a few exceptions.
A dataset with fully annotated attention targets in video for attention target estimation.
QAMPARI is an ODQA benchmark, where question answers are lists of entities, spread across many paragraphs. It was created by (a) generating questions with multiple answers from Wikipedia's knowledge graph and tables, (b) automatically pairing answers with supporting evidence in Wikipedia paragraphs, and (c) manually paraphrasing questions and validating each answer.
Node classification on Film with 60%/20%/20% random splits for training/validation/test.
Node classification on Squirrel with 60%/20%/20% random splits for training/validation/test.
Node classification on Deezer Europe with 50%/25%/25% random splits for training/validation/test.
Node classification on Squirrel with the fixed 48%/32%/20% splits provided by Geom-GCN.
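The random-split entries above all follow the same protocol: node indices are shuffled and partitioned by fixed fractions into train/validation/test sets. A generic sketch of that protocol, under the assumption of a simple uniform shuffle (the actual benchmark splits come from the released split files, e.g. those provided by Geom-GCN):

```python
import numpy as np

def random_node_split(num_nodes, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly partition node indices into train/val/test index arrays.

    A generic sketch of the 60%/20%/20% random-split protocol; real
    benchmark splits should be taken from the officially released files.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)          # uniform shuffle of node ids
    n_train = int(fractions[0] * num_nodes)
    n_val = int(fractions[1] * num_nodes)
    train = perm[:n_train]
    val = perm[n_train:n_train + n_val]
    test = perm[n_train + n_val:]              # remainder goes to test
    return train, val, test

train, val, test = random_node_split(1000)
```

Reporting averages over several random seeds is common with this protocol, since a single split can favor particular models.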
LVOS is a dataset for long-term video object segmentation (VOS). It consists of 220 videos with a total duration of 421 minutes. The videos in LVOS last 1.59 minutes on average, 20 times longer than videos in existing VOS datasets. Each video includes various attributes, in particular challenges arising in the wild, such as long-term reappearing objects and cross-temporal similar objects.
DELIVER is an arbitrary-modal segmentation benchmark covering Depth, LiDAR, multiple Views, Events, and RGB. The dataset also spans four severe weather conditions as well as five sensor-failure cases, in order to exploit modal complementarity and to study partial sensor outages. It is designed for the task of arbitrary-modal semantic segmentation.
The multimodal material segmentation (MCubeS) dataset contains 500 sets of images from 42 street scenes. Each scene has images for four modalities: RGB, angle of linear polarization (AoLP), degree of linear polarization (DoLP), and near-infrared (NIR). The dataset provides annotated ground-truth labels for both material and semantic segmentation for every pixel. It is divided into a training set with 302 image sets, a validation set with 96 image sets, and a test set with 102 image sets. Each image has 1224 x 1024 pixels, and each pixel is annotated with one of 20 class labels.
Hi4D contains 4D textured scans of 20 subject pairs, 100 sequences, and more than 11K frames in total. Hi4D provides rich interaction-centric annotations in 2D and 3D alongside accurately registered parametric body models.
The Spiking Heidelberg Digits (SHD) dataset is an audio-based classification dataset of approximately 10k recordings of spoken digits, ranging from zero to nine in the English and German languages. The audio waveforms have been converted into spike trains using an artificial model of the inner ear and parts of the ascending auditory pathway. The SHD dataset has 8,156 training and 2,264 test samples. A full description of the dataset and how it was created can be found in the accompanying paper, which should be cited when the dataset is used.
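Datasets like SHD represent each sample as a set of spike times per input channel rather than as a waveform, so a common first preprocessing step is to bin the spike times into a fixed-length count vector. The sketch below shows this generic step for a single channel; it is not SHD's official loader, and the bin count and duration are illustrative assumptions.

```python
import numpy as np

def bin_spike_train(spike_times, num_bins=100, duration=1.0):
    """Convert a list of spike times (in seconds) into a binned count vector.

    Generic preprocessing sketch for event-based audio data such as SHD;
    the dataset itself ships spike times per input channel, so this would
    be applied channel by channel to build a (channels x bins) tensor.
    """
    edges = np.linspace(0.0, duration, num_bins + 1)   # uniform time bins
    counts, _ = np.histogram(spike_times, bins=edges)  # spikes per bin
    return counts

# Four spikes over a 1-second window, binned into ten 100 ms bins.
counts = bin_spike_train([0.05, 0.051, 0.5, 0.95], num_bins=10)
```

The resulting dense count tensor can be fed to a conventional recurrent network, while spiking neural networks typically consume the event times directly.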
The PKLot dataset contains 12,417 images of parking lots and 695,899 images of parking spaces segmented from them, all manually checked and labeled. All images were acquired at the parking lots of the Federal University of Parana (UFPR) and the Pontifical Catholic University of Parana (PUCPR), both located in Curitiba, Brazil.
Person re-identification (ReID) is an active research topic for AI-based video surveillance applications such as specific-person search, but the practical issue that the target person(s) may change clothes (the clothes-inconsistency problem) has long been overlooked. This paper is the first to study this problem systematically. We first overcome the lack of a suitable dataset by collecting a small yet representative real dataset for testing, while building a large, realistic synthetic dataset for training and deeper studies. Facilitated by our new datasets, we conduct various new experiments to study the influence of clothes inconsistency. We find that changing clothes makes ReID a much harder problem: it hinders the learning of effective representations and challenges the generalization ability of previous ReID models to identify persons in unseen (new) clothes. Representative existing ReID models are adopted in these studies.