Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

EgoShots

Egoshots is a two-month egocentric-vision dataset captured with an Autographer wearable camera and annotated "for free" via transfer learning, using three state-of-the-art pre-trained image captioning models. The dataset represents the life of two interns working at Philips Research (Netherlands) from May to July 2015, who generously donated their data.

2 papers · 0 benchmarks · Images

Elsevier OA CC-BY

An open corpus of scientific research papers containing a representative sample from across scientific disciplines. The corpus includes the full text of each article, the document metadata, and the bibliographic information for each reference.

2 papers · 0 benchmarks · Texts

European Flood 2013 Dataset

This dataset consists of 3,710 flood images, annotated by domain experts regarding their relevance with respect to three tasks (determining the flooded area, inundation depth, water pollution).

2 papers · 0 benchmarks · Images

FAKBAT

The Freebase Annotations of TREC KBA 2014 Stream Corpus with Timestamps (FAKBAT) is an extension of the FAKBA1 dataset that contains entity age and entity timestamp. It comprises roughly 1.2 billion timestamped documents from global public news wires, blogs, forums, and shortened links shared on social media. It spans 572 days (October 7, 2011–May 1, 2013).

2 papers · 0 benchmarks · Texts

FPDS (Fallen People Data Set)

A benchmark for detecting fallen people lying on the floor. It consists of 6,982 images, with a total of 5,023 falls and 2,275 non-falls corresponding to people in conventional situations (standing up, sitting, lying on the sofa or bed, walking, etc.). Almost all the images were captured in indoor environments under widely varying conditions: variation of poses and sizes, occlusions, lighting changes, etc.

2 papers · 0 benchmarks

FDDB-360

A 360-degree fisheye-like version of the popular FDDB face detection dataset.

2 papers · 0 benchmarks

Fine-Grained R2R

This dataset enriches the benchmark Room-to-Room (R2R) dataset by dividing the instructions into sub-instructions and pairing each sub-instruction with its corresponding viewpoints in the path. The overall instruction and trajectory of each sample remain the same.

2 papers · 0 benchmarks · Texts

FinnSentiment

FinnSentiment introduces a dataset of 27,000 Finnish sentences, each annotated independently with sentiment polarity by three native annotators.

2 papers · 0 benchmarks · Texts

Frames Dataset

This is a dialog dataset collected in a Wizard-of-Oz fashion. Two humans talked to each other via a chat interface: one played the role of the user and the other played the role of the conversational agent. The latter is called a wizard, a reference to the Wizard of Oz, the man behind the curtain. The wizards had access to a database of 250+ packages, each composed of a hotel and round-trip flights. The users were asked to find the best deal. This resulted in complex dialogues where a user would often consider different options, compare packages, and progressively build the description of her ideal trip.

2 papers · 0 benchmarks · Texts

FRSign

A large-scale and accurate dataset for vision-based railway traffic light detection and recognition. The recordings were made on selected running trains in France and benefited from carefully hand-labeled annotations.

2 papers · 0 benchmarks · Images

Bulgarian Reading Comprehension Dataset

A dataset containing 2,221 questions from twelfth-grade matriculation exams in various subjects (history, biology, geography, and philosophy), and 412 additional questions from online quizzes in history.

2 papers · 0 benchmarks

Horne 2017 Fake News Data

The Horne 2017 Fake News Data contains two independent news datasets:

2 papers · 0 benchmarks · Texts

HRA (Human Rights Archive Database)

An expert-verified repository of 3,050 photographs of human rights violations, labelled with human rights semantic categories, comprising a list of the types of human rights abuses encountered at present.

2 papers · 0 benchmarks · Images

Human-Parts

The Human-Parts dataset is a dataset for human body, face and hand detection with ~15k images. It contains ~106k different annotations, with multiple annotations per image.

2 papers · 0 benchmarks · Images

Icons-50

Icons-50 is a dataset for studying surface variation robustness.

2 papers · 0 benchmarks · Images

IMEMNET (Image-MusicEmotion-Matching-Net)

The Image-MusicEmotion-Matching-Net (IMEMNet) dataset is a dataset for continuous emotion-based image and music matching. It has over 140K image-music pairs.

2 papers · 0 benchmarks · Images, Music

InstaFake

Includes two datasets published for the detection of fake and automated accounts.

2 papers · 0 benchmarks

JAMUL (JApanese MUlti-Length Headline Corpus)

A large-scale evaluation dataset for headlines of three different lengths composed by professional editors.

2 papers · 0 benchmarks · Texts

Japanese Word Similarity

This dataset contains information about Japanese word similarity, including rare words. It is constructed following the Stanford Rare Word Similarity Dataset. Ten annotators rated word pairs on an 11-level similarity scale.

2 papers · 0 benchmarks · Texts

JIT Dataset (Jejueo Interview Transcripts)

The Jejueo Interview Transcripts (JIT) dataset is a parallel corpus containing 170k+ Jejueo-Korean sentences.

2 papers · 0 benchmarks · Texts