Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Datasets by modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

MetaLWOz (Meta-Learning Wizard-of-Oz)

The dataset was collected by leveraging background knowledge from a larger, more highly represented dialogue source.

4 papers · 0 benchmarks

MJU-Waste

MJU-Waste is an RGB-D dataset for waste object segmentation, made public to facilitate future research in this area.

4 papers · 5 benchmarks · Images

MLMA Hate Speech

A new multilingual, multi-aspect hate speech analysis dataset, used to test current state-of-the-art multilingual multitask learning approaches.

4 papers · 0 benchmarks

ParaPat (Parallel Corpus of Patents Abstracts)

A parallel corpus built from the open-access Google Patents dataset, covering 74 language pairs and comprising more than 68 million sentences and 800 million tokens. Sentences were automatically aligned with the Hunalign algorithm for the 22 largest language pairs, while the remaining pairs were aligned at the abstract (i.e., paragraph) level.

4 papers · 0 benchmarks

Perlex

A Persian dataset for relation extraction; it is an expert-translated version of the SemEval-2010 Task 8 dataset.

4 papers · 0 benchmarks

pn-summary

pn-summary is a dataset for Persian abstractive text summarization.

4 papers · 0 benchmarks · Texts

Quda

Quda aims to help visualization-oriented natural language interfaces (V-NLIs) recognize analytic tasks from free-form natural language by providing data for training and evaluating cutting-edge multi-label classification models. The dataset contains diverse user queries, each annotated with one or more analytic tasks.

4 papers · 0 benchmarks · Texts

Query-Focused Video Summarization Dataset

A dataset of dense per-video-shot concept annotations.

4 papers · 2 benchmarks · Texts, Videos

RED (Real Embodied Dataset)

The Real Embodied Dataset (RED) is a large-scale computer vision dataset for grasping in cluttered scenes. It contains complete segmentation masks for partially occluded objects, along with their occlusion order.

4 papers · 0 benchmarks · Images

redwood-3dscan

A dataset of more than ten thousand 3D scans of real objects.

4 papers · 0 benchmarks

DIML/CVL RGB-D Dataset

This dataset contains synchronized RGB-D frames from both a Kinect v2 and a ZED stereo camera. For outdoor scenes, the authors first generate disparity maps using an accurate stereo matching method and then convert them using the calibration parameters. A per-pixel confidence map of the disparity is also provided. The scenes were captured at various places at Yonsei University and Ewha University, such as offices, rooms, dormitories, an exhibition center, streets, and roads.

4 papers · 0 benchmarks

RobustPointSet

A dataset for analyzing the robustness of point cloud classification models to input transformations, independent of data augmentation.

4 papers · 0 benchmarks

RONEC (Romanian Named Entity Corpus)

RONEC is a named entity corpus for the Romanian language. It contains over 26,000 entities in about 5,000 annotated sentences, belonging to 16 distinct classes. The sentences were extracted from a copyright-free newspaper and cover several styles. This corpus is the first initiative in the Romanian language space specifically targeted at named entity recognition.

4 papers · 0 benchmarks · Texts

S2TLD (SJTU Small Traffic Light Dataset)

S2TLD is a traffic light dataset containing 5,786 images at resolutions of approximately 1,080 × 1,920 and 720 × 1,280 pixels. It contains 14,130 instances in 5 categories: red, yellow, green, off, and wait on. The scenes cover a decent variety of road conditions, including these typical cases:

  • Busy inner-city street scenes
  • Dense stop-and-go traffic
  • Strong changes in illumination/exposure
  • Flickering/fluctuating traffic lights
  • Multiple visible traffic lights
  • Image parts that can be confused with traffic lights (e.g., large round tail lights)

4 papers · 0 benchmarks · Images

SCDB (Simple Concept DataBase)

A dataset that includes annotations for 10 distinguishable concepts.

4 papers · 0 benchmarks

Sentiment140

Sentiment140 is a dataset that allows you to discover the sentiment of a brand, product, or topic on Twitter.

4 papers · 1 benchmark

ShapenetRender

ShapenetRender is an extension of the ShapeNetCore dataset with more variation in camera angles. For each mesh model, the dataset provides 36 views with smaller variation and 36 views with larger variation. The newly rendered images have a resolution of 224×224, compared with the original 137×137. Additionally, each RGB image is paired with a depth image, a normal map, and an albedo image.

4 papers · 0 benchmarks · Images

Some Like it Hoax

Some Like it Hoax is a fake news detection dataset consisting of 15,500 Facebook posts and 909,236 users.

4 papers · 0 benchmarks

SONYC-UST-V2

A dataset for urban sound tagging with spatiotemporal information. It is aimed at the development and evaluation of machine listening systems for real-world urban noise monitoring. While other datasets of urban recordings are available, this one makes it possible to investigate how spatiotemporal metadata can aid in predicting urban sound tags. SONYC-UST-V2 consists of 18,510 audio recordings from the "Sounds of New York City" (SONYC) acoustic sensor network, each with the timestamp of audio acquisition and the location of the sensor.

4 papers · 0 benchmarks · Audio

Swiss3DCities

Swiss3DCities is a dataset manually annotated with per-point labels for semantic segmentation. It is built using photogrammetry from images acquired by multirotors equipped with high-resolution cameras.

4 papers · 0 benchmarks