Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3D meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • MIDI: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • CAD: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2


ARQMath

The goal of ARQMath is to advance techniques for mathematical information retrieval, in particular, retrieving answers to mathematical questions (Task 1), and formula retrieval (Task 2). Using the question posts from Math Stack Exchange, participating systems are given a question or a formula from a question and asked to return a ranked list of either potential answers to the question or potentially useful formulae (in the case of a formula query). Relevance is determined by the expected utility of each returned item. These tasks allow participating teams to explore leveraging math notation together with text to improve the quality of retrieval results.

2 papers · 4 benchmarks

CUHK-SYSU-TBPS

CUHK-SYSU-TBPS is a dataset for the text-based person search task.

2 papers · 0 benchmarks

IACC.3 (Internet Archive videos (IACC.3) under Creative Commons licenses.)

The IACC.3 dataset comprises approximately 4,600 Internet Archive videos (144 GB, 600 h) with Creative Commons licenses in MPEG-4/H.264 format, with durations ranging from 6.5 min to 9.5 min and a mean duration of almost 7.8 min. Most videos have some donor-provided metadata available, e.g., title, keywords, and description.

2 papers · 0 benchmarks · Videos

ATLANTIS

ATLANTIS is a benchmark for semantic segmentation of waterbody images. The dataset covers a wide range of natural waterbodies, such as sea, lake, and river, and man-made (artificial) water-related structures, such as dam, reservoir, canal, and pier. ATLANTIS includes 5,195 pixel-wise annotated images split into 3,364 training, 535 validation, and 1,296 testing images. In addition to 35 waterbody labels, the dataset covers 21 general labels such as person, car, road, and building.

2 papers · 8 benchmarks

AWS Documentation

We present the AWS documentation corpus, an open-book QA dataset that contains 25,175 documents along with 100 matched questions and answers. These questions are inspired by the author's interactions with real AWS customers and the questions they asked about AWS services. The data was anonymized and aggregated. All questions in the dataset have a valid, factual, and unambiguous answer within the accompanying documents; we deliberately avoided questions that are ambiguous, incomprehensible, opinion-seeking, or not clearly a request for factual information. All questions, answers, and accompanying documents in the dataset are annotated by the authors. There are two types of answers: text and yes-no-none (YNN) answers. Text answers range from a few words to a full paragraph sourced from a continuous block of words in a document or from different locations within the same document. Every question in the dataset has a matched text answer. Yes-no-none (YNN) answers can be yes, no, or none dependin

2 papers · 0 benchmarks · Texts

InstaOrder

InstaOrder can be used to understand the geometrical relationships of instances in an image. The dataset consists of 2.9M annotations of geometric orderings for class-labeled instances in 101K natural scenes. The scenes were annotated by 3,659 crowd-workers regarding (1) occlusion order that identifies occluder/occludee and (2) depth order that describes ordinal relations that consider relative distance from the camera.

2 papers · 0 benchmarks

Light Snowfall (DENSE)

We introduce an object detection dataset in challenging adverse weather conditions covering 12,000 samples in real-world driving scenes and 1,500 samples in controlled weather conditions within a fog chamber. The dataset includes different weather conditions like fog, snow, and rain and was acquired over more than 10,000 km of driving in northern Europe. In total, 100k objects were labeled with accurate 2D and 3D bounding boxes. The main contributions of this dataset are:

  • We provide a proving ground for a broad range of algorithms covering signal enhancement, domain adaptation, object detection, or multi-modal sensor fusion, focusing on the learning of robust redundancies between sensors, especially if they fail asymmetrically in different weather conditions.
  • The dataset was created with the initial intention to showcase methods, which learn of robust redundancies between the sensor and enable a raw data sensor fusion in cas

2 papers · 6 benchmarks · LiDAR

DGTA-VisDrone (DeepGTAV-VisDrone)

An object detection dataset created with the DeepGTAV engine, which is based on the video game GTAV. It is one of the three datasets proposed in the paper. The dataset is modeled on the VisDrone dataset and uses almost the same classes.

2 papers · 0 benchmarks · Images

DGTA-SeaDronesSee (DeepGTAV-SeaDronesSee)

An object detection dataset created with the DeepGTAV engine, which is based on the video game GTAV. It is one of the three datasets proposed in the paper. The dataset is modeled on the SeaDronesSee dataset and uses almost the same classes.

2 papers · 0 benchmarks · Images

VFITex

To test interpolation performance on various texture types, we developed a new test set, VFITex, which contains twenty 100-frame UHD or HD videos at 24, 30 or 50 FPS, collected from the Xiph, Mitch Martinez Free 4K Stock Footage, UVG database and pexels.com. This dataset covers diverse textured scenes, including crowds, flags, foliage, animals, water, leaves, fire and smoke. HD patches were center-cropped from the UHD sequences, preserving the original UHD characteristics. All frames in each sequence were used for evaluation, totaling 940 quintuplets.

2 papers · 2 benchmarks

DoodleUINet

The Doodle to UI dataset contains 11 thousand drawings from 16 categories.

2 papers · 0 benchmarks

Cellcycle Funcat

Hierarchical multi-label classification dataset for functional genomics.

2 papers · 1 benchmark

Derisi Funcat

Hierarchical multi-label classification dataset for functional genomics.

2 papers · 1 benchmark

Eisen Funcat

Hierarchical multi-label classification dataset for functional genomics.

2 papers · 1 benchmark

Expr Funcat

Hierarchical multi-label classification dataset for functional genomics.

2 papers · 1 benchmark

Gasch1 Funcat

Hierarchical multi-label classification dataset for functional genomics.

2 papers · 1 benchmark

Gasch2 Funcat

Hierarchical multi-label classification dataset for functional genomics.

2 papers · 1 benchmark

Seq Funcat

Hierarchical multi-label classification dataset for functional genomics.

2 papers · 1 benchmark

Spo Funcat

Hierarchical multi-label classification dataset for functional genomics.

2 papers · 1 benchmark

EVICAN

The use of deep learning for quantitative image analysis is increasing exponentially. However, training accurate, widely deployable deep learning algorithms requires a plethora of annotated (ground truth) data. Image collections must not only contain thousands of images to provide sufficient example objects (i.e. cells), but also an adequate degree of image heterogeneity. We present a new dataset, EVICAN (Expert visual cell annotation), comprising partially annotated grayscale images of 30 different cell lines from multiple microscopes, contrast mechanisms, and magnifications, which is readily usable as training data for computer vision applications. With 4,600 images and ∼26,000 segmented cells, our collection offers an unparalleled heterogeneous training dataset for cell biology deep learning application development. The dataset is freely available (https://edmond.mpdl.mpg.de/imeji/collection/l45s16atmi6Aa4sI?q=).

2 papers · 4 benchmarks