Datasets

19,997 machine learning datasets


ARQMath

The goal of ARQMath is to advance techniques for mathematical information retrieval, in particular retrieving answers to mathematical questions (Task 1) and formula retrieval (Task 2). Queries are drawn from Math Stack Exchange question posts: participating systems are given a question, or a formula from a question, and asked to return a ranked list of either potential answers to the question or potentially useful formulae (for a formula query). Relevance is determined by the expected utility of each returned item. These tasks let participating teams explore combining math notation with text to improve retrieval quality.
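Ranked lists judged with graded relevance are commonly scored with nDCG. The Python sketch below computes nDCG@k for a single query. It assumes integer relevance grades (for example 0 to 3) and is a generic illustration, not ARQMath's official evaluation script. The `drop_unjudged` flag is a hypothetical parameter that removes unjudged items from the ranking before scoring, which is one common way of handling incomplete judgments.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    # Discounted cumulative gain: the gain at 1-based rank i is divided by log2(i + 1).
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_ids: List[str], judgments: Dict[str, int],
              k: int = 1000, drop_unjudged: bool = False) -> float:
    """nDCG@k for one query.

    ranked_ids: system output, best first.
    judgments: item id -> graded relevance; unjudged items score 0 unless dropped.
    """
    if drop_unjudged:
        # Keep only items that have a relevance judgment.
        ranked_ids = [doc_id for doc_id in ranked_ids if doc_id in judgments]
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    # Ideal ranking: all judged grades sorted from most to least relevant.
    ideal_dcg = dcg(sorted(judgments.values(), reverse=True)[:k])
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


qrels = {"a": 3, "b": 2, "c": 0}  # made-up judgments for illustration
print(round(ndcg_at_k(["c", "b", "a"], qrels), 3))               # 0.648
print(ndcg_at_k(["x", "a", "b"], qrels, drop_unjudged=True))     # 1.0
```

Dividing by the ideal DCG normalizes scores to [0, 1], so queries with different numbers of relevant items can be averaged.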

2 papers · 4 benchmarks

CUHK-SYSU-TBPS

CUHK-SYSU-TBPS is a dataset for the text-based person search task.

2 papers · 0 benchmarks

IACC.3 (Internet Archive videos under Creative Commons licenses)

The IACC.3 dataset consists of approximately 4,600 Internet Archive videos (144 GB, 600 h) released under Creative Commons licenses in MPEG-4/H.264 format, with durations ranging from 6.5 to 9.5 min and a mean duration of almost 7.8 min. Most videos have some donor-provided metadata available, e.g., title, keywords, and description.
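The quoted figures are mutually consistent, as a quick Python check shows: 4,600 videos at a 7.8 min mean come to about 598 h. The average bitrate printed at the end is derived here, not stated in the description, and assumes decimal gigabytes (10^9 bytes).

```python
# Figures quoted in the IACC.3 description.
num_videos = 4600           # "approximately 4600" videos
mean_minutes = 7.8          # "almost 7.8 min" mean duration
total_gb = 144              # assumption: decimal gigabytes (10**9 bytes)
total_hours_stated = 600

# Total duration implied by video count x mean duration.
implied_hours = num_videos * mean_minutes / 60
print(f"implied total duration: {implied_hours:.0f} h")    # 598 h

# Average bitrate implied by total size / stated total duration.
avg_kbit_per_s = total_gb * 1e9 * 8 / (total_hours_stated * 3600) / 1e3
print(f"average bitrate: {avg_kbit_per_s:.0f} kbit/s")     # 533 kbit/s
```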

2 papers · 0 benchmarks · Videos

ATLANTIS

ATLANTIS is a benchmark for semantic segmentation of waterbody images. The dataset covers a wide range of natural waterbodies, such as seas, lakes, and rivers, as well as man-made (artificial) water-related structures, such as dams, reservoirs, canals, and piers. ATLANTIS includes 5,195 pixel-wise annotated images, split into 3,364 training, 535 validation, and 1,296 testing images. In addition to 35 waterbody labels, the dataset covers 21 general labels such as person, car, road, and building.
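The split sizes add up to the stated total, giving roughly a 65% / 10% / 25% train/validation/test split. The Python snippet below checks the sum, prints each split's share, and totals the two label groups (35 + 21 = 56).

```python
# Split sizes and label counts quoted in the ATLANTIS description.
splits = {"train": 3364, "val": 535, "test": 1296}
total_images = 5195
waterbody_labels = 35
general_labels = 21

# 3,364 + 535 + 1,296 = 5,195
assert sum(splits.values()) == total_images

for name, n in splits.items():
    print(f"{name}: {n} images ({100 * n / total_images:.1f}%)")
# train: 64.8%, val: 10.3%, test: 24.9%

print("total semantic labels:", waterbody_labels + general_labels)  # 56
```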

2 papers · 8 benchmarks

AWS Documentation

We present the AWS documentation corpus, an open-book QA dataset that contains 25,175 documents along with 100 matched questions and answers. These questions are inspired by the authors' interactions with real AWS customers and the questions they asked about AWS services. The data was anonymized and aggregated. All questions in the dataset have a valid, factual, and unambiguous answer within the accompanying documents; we deliberately avoided questions that are ambiguous, incomprehensible, opinion-seeking, or not clearly a request for factual information. All questions, answers, and accompanying documents in the dataset were annotated by the authors. There are two types of answers: text and yes-no-none (YNN) answers. Text answers range from a few words to a full paragraph, sourced either from a continuous block of words in a document or from different locations within the same document. Every question in the dataset has a matched text answer. Yes-no-none (YNN) answers can be yes, no, or none, depending…
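A minimal Python sketch of how one question-answer record could be represented in memory. The class and field names (`AwsQaExample`, `doc_id`, `ynn_answer`) are hypothetical and do not describe the dataset's actual file format. The sketch also assumes every question carries a YNN label, which the truncated description does not confirm.

```python
from dataclasses import dataclass
from enum import Enum


class YNN(Enum):
    # The three yes-no-none answer values named in the description.
    YES = "yes"
    NO = "no"
    NONE = "none"


@dataclass
class AwsQaExample:
    # Hypothetical schema; the released dataset's format may differ.
    question: str
    doc_id: str        # hypothetical id of the document holding the answer
    text_answer: str   # every question has a text answer
    ynn_answer: YNN    # assumption: each question also has a YNN label

    def __post_init__(self) -> None:
        # Enforce the "every question has a matched text answer" property.
        if not self.text_answer.strip():
            raise ValueError("text_answer must be non-empty")


ex = AwsQaExample(
    question="hypothetical question text",
    doc_id="doc-0001",
    text_answer="hypothetical answer span",
    ynn_answer=YNN("yes"),  # look up the enum member by its string value
)
print(ex.ynn_answer)  # YNN.YES
```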

2 papers · 0 benchmarks · Texts

InstaOrder

InstaOrder can be used to understand the geometric relationships between instances in an image. The dataset consists of 2.9M annotations of geometric orderings for class-labeled instances in 101K natural scenes. The scenes were annotated by 3,659 crowd-workers for (1) occlusion order, which identifies the occluder and the occludee, and (2) depth order, which describes ordinal relations based on relative distance from the camera.
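Both annotation types are pairwise relations between instances, so a small directed graph per image is a natural in-memory form. The Python sketch below uses a hypothetical schema (string instance ids, and `InstanceOrders` as a made-up class name). It models only strict "nearer than" depth relations without ties, and it assumes those relations are transitive when answering `is_closer` queries.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set


class InstanceOrders:
    """Pairwise occlusion and depth orderings for one image (hypothetical schema)."""

    def __init__(self) -> None:
        self.occludes: Dict[Hashable, Set[Hashable]] = defaultdict(set)  # occluder -> occludees
        self.nearer: Dict[Hashable, Set[Hashable]] = defaultdict(set)    # instance -> farther instances

    def add_occlusion(self, occluder, occludee) -> None:
        self.occludes[occluder].add(occludee)

    def add_depth(self, nearer, farther) -> None:
        self.nearer[nearer].add(farther)

    def occluders_of(self, instance) -> Set:
        # All instances annotated as occluding `instance`.
        return {a for a, bs in self.occludes.items() if instance in bs}

    def is_closer(self, a, b) -> bool:
        # Depth-first search along "nearer than" edges (assumes transitivity).
        stack, seen = [a], {a}
        while stack:
            node = stack.pop()
            for nxt in self.nearer.get(node, ()):
                if nxt == b:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False


orders = InstanceOrders()
orders.add_occlusion("person_1", "car_2")
orders.add_depth("person_1", "car_2")
orders.add_depth("car_2", "building_3")
print(orders.occluders_of("car_2"))                 # {'person_1'}
print(orders.is_closer("person_1", "building_3"))   # True
```

Keeping occlusion and depth in separate graphs matters because the two orders need not agree: an instance can be nearer the camera without occluding anything.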

2 papers · 0 benchmarks

Light Snowfall (DENSE)

We introduce an object detection dataset in challenging adverse weather conditions, covering 12,000 samples from real-world driving scenes and 1,500 samples recorded under controlled weather conditions in a fog chamber. The dataset includes weather conditions such as fog, snow, and rain, and was acquired over more than 10,000 km of driving in northern Europe. The driven route with cities along the road is shown on the right. In total, 100k objects were labeled with accurate 2D and 3D bounding boxes. The main contributions of this dataset are:
- We provide a proving ground for a broad range of algorithms covering signal enhancement, domain adaptation, object detection, or multi-modal sensor fusion, focusing on learning robust redundancies between sensors, especially if they fail asymmetrically in different weather conditions.
- The dataset was created with the initial intention to showcase methods that learn robust redundancies between the sensors and enable raw-data sensor fusion in case…

2 papers · 6 benchmarks · LiDAR

DGTA-VisDrone (DeepGTAV-VisDrone)

Object detection dataset created with the DeepGTAV engine, which is based on the video game GTAV. It is one of the three datasets proposed in the accompanying paper. This dataset is modeled on the VisDrone dataset and has almost the same classes.

2 papers · 0 benchmarks · Images

DGTA-SeaDronesSee (DeepGTAV-SeaDronesSee)

2 papers · 0 benchmarks · Images

VFITex

To test interpolation performance on various texture types, we developed a new test set, VFITex, which contains twenty 100-frame UHD or HD videos at 24, 30 or 50 FPS, collected from the Xiph, Mitch Martinez Free 4K Stock Footage, UVG database and pexels.com. This dataset covers diverse textured scenes, including crowds, flags, foliage, animals, water, leaves, fire and smoke. HD patches were center-cropped from the UHD sequences, preserving the original UHD characteristics. All frames in each sequence were used for evaluation, totaling 940 quintuplets.

2 papers · 2 benchmarks

DoodleUINet

The Doodle to UI dataset contains 11,000 drawings from 16 categories.

2 papers · 0 benchmarks