Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

SAF (Short Answer Feedback Dataset)

This dataset can be found on HuggingFace:

5 papers · 0 benchmarks

SYNS-Patches

SYNS-Patches is a subset of SYNS. The original SYNS is composed of aligned image and LiDAR panoramas from 92 different scenes belonging to a wide variety of environments, such as Agriculture, Natural (e.g. forests and fields), Residential, Industrial and Indoor. SYNS-Patches is the subset of patches from each scene extracted at eye level at 20-degree intervals over a full horizontal rotation. This results in 18 images per scene and a total dataset size of 1,656 images.

5 papers · 0 benchmarks · Images, LiDAR

RuCoLA

The Russian Corpus of Linguistic Acceptability (RuCoLA) is built from the ground up under the well-established binary linguistic acceptability (LA) approach. RuCoLA consists of 9.8k in-domain sentences from linguistic publications and 3.6k out-of-domain sentences produced by generative models.

5 papers · 2 benchmarks · Texts

ComFact

ComFact is a benchmark for commonsense fact linking, where models are given contexts and trained to identify situationally relevant commonsense knowledge from knowledge graphs (KGs). The benchmark contains ∼293k in-context relevance annotations for commonsense triplets across four stylistically diverse dialogue and storytelling datasets.

5 papers · 0 benchmarks · Texts

RF100 (Roboflow 100)

The evaluation of object detection models is usually performed by optimizing a single metric, e.g. mAP, on a fixed set of datasets, e.g. Microsoft COCO and Pascal VOC. Due to image retrieval and annotation costs, these datasets consist largely of images found on the web and do not represent many real-life domains that are being modelled in practice, e.g. satellite, microscopic and gaming, making it difficult to assert the degree of generalization learned by the model.

5 papers · 1 benchmark · Images, Videos

FFHQ-UV

FFHQ-UV is a large-scale facial UV-texture dataset that contains over 50,000 high-quality texture UV-maps with even illumination, neutral expressions, and cleaned facial regions, which are desirable characteristics for rendering realistic 3D face models under different lighting conditions. The dataset is derived from FFHQ and preserves most of the variation in FFHQ.

5 papers · 0 benchmarks · 3D

HOD (Hand-held Object Dataset)

HOD is a dataset for 3D object reconstruction that contains 35 objects, divided into two subsets named Sculptures and Daily Objects. The Sculptures subset has five human sculptures with complex geometries and pure white textures. The Daily Objects subset consists of 30 daily objects with various shapes and appearances. All of the Sculptures and nine of the Daily Objects are paired with high-fidelity scanned meshes as ground-truth geometries for evaluation.

5 papers · 6 benchmarks · 3D

KiloGram

KiloGram is a resource for studying abstract visual reasoning in humans and machines. It contains a richly annotated dataset with >1k distinct stimuli.

5 papers · 0 benchmarks · Images

SWINSEG (Singapore Whole sky Nighttime Image SEGmentation Database)

The SWINSEG dataset contains 115 nighttime images of sky/cloud patches along with their corresponding binary ground truth maps. The ground truth annotation was done in consultation with experts from Singapore Meteorological Services. All images were captured in Singapore using WAHRSIS, a calibrated ground-based whole sky imager, over a period of 12 months from January to December 2016. All image patches are 500x500 pixels in size, and were selected considering several factors such as time of the image capture, cloud coverage, and seasonal variations.

5 papers · 10 benchmarks

BRACE (The Breakdancing Competition Dataset for Dance Motion Synthesis)

BRACE is a dataset for audio-conditioned dance motion synthesis challenging common assumptions for this task:

5 papers · 30 benchmarks · Actions, Audio, Point cloud, Videos

ADVETA

ADVErsarial Table perturbAtion (ADVETA) is a robustness evaluation benchmark featuring natural and realistic adversarial table perturbations (ATPs). It is based on three mainstream Text-to-SQL datasets: Spider, WikiSQL and WTQ.

5 papers · 0 benchmarks · Texts

Distress Analysis Interview Corpus/Wizard-of-Oz set (DAIC-WOZ)

The Distress Analysis Interview Corpus/Wizard-of-Oz set (DAIC-WOZ) dataset [24, 25] comprises voice and text samples from 189 interviewed healthy and control persons, together with their PHQ-8 depression detection questionnaire. This dataset is commonly used in research on text-based detection, voice-based detection, and multimodal architectures.

5 papers · 0 benchmarks · Audio, Texts, Videos

ISLES 2017

A medical image segmentation challenge held at the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2017. On the SMIR, you can register for the challenge, download the test data and submit your results. For more information, visit the official ISLES homepage at www.isles-challenge.org.

5 papers · 0 benchmarks

HarveyNER

A dataset for fine-grained location name extraction from disaster-related tweets.

5 papers · 1 benchmark

HaDes

HaDes (HAllucination DEtection dataSet) is a token-level, reference-free hallucination detection dataset. To create this dataset, a large number of text segments extracted from English-language Wikipedia are perturbed and then verified with crowd-sourced annotations.

5 papers · 0 benchmarks · Texts

AstroVision

AstroVision is a large-scale dataset comprised of 115,970 densely annotated, real images of 16 different small bodies from both legacy and ongoing deep space missions to facilitate the study of deep learning for autonomous navigation in the vicinity of a small body.

5 papers · 0 benchmarks

BB-norm-habitat (Bacteria Biotope - entity normalization - bacterial habitat)

In the BB-norm modality of this task, participant systems had to normalize textual entity mentions according to the OntoBiotope ontology for habitats. See BB-dataset for more information.

5 papers · 0 benchmarks · Biology, Texts

BB-norm-phenotype (Bacteria Biotope - entity normalization - phenotype)

In the BB-norm modality of this task, participant systems had to normalize textual entity mentions according to the OntoBiotope ontology for phenotypes. See BB-dataset for more information.

5 papers · 0 benchmarks · Biology, Texts

Sales (Rossmann Store Sales)

Forecasting sales using ARIMA and SARIMA.

5 papers · 0 benchmarks · Time series

PIMA Diabetes Dataset with Paper, Experiments, and Code

Please refer to the following paper which includes a description of the dataset and a link to the dataset and the paper code:

5 papers · 0 benchmarks