Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

271 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

271 dataset results

MPOSE2021 (MPOSE2021 Dataset for Short-time Human Action Recognition)

MPOSE2021 is a dataset for real-time, short-time human action recognition (HAR), suitable for both pose-based and RGB-based methodologies. It includes 15,429 sequences from 100 actors and different scenarios, with a limited number of frames per scene (between 20 and 30). In contrast to other publicly available datasets, the constrained number of time steps encourages the development of real-time methodologies that perform HAR with low latency and high throughput.

1 paper · 0 benchmarks · Images, Tabular

AU Dataset for Visuo-Haptic Object Recognition for Robots

Multimodal object recognition is still an emerging field. Thus, publicly available datasets are still rare and of small size. This dataset was developed to help fill this void and presents multimodal data for 63 objects with some visual and haptic ambiguity. The dataset contains visual, kinesthetic and tactile (audio/vibrations) data. To completely solve sensory ambiguity, sensory integration/fusion would be required. This report describes the creation and structure of the dataset. The first section explains the underlying approach used to capture the visual and haptic properties of the objects. The second section describes the technical aspects (experimental setup) needed for the collection of the data. The third section introduces the objects, while the final section describes the structure and content of the dataset.

1 paper · 0 benchmarks · Images, Tabular, Time series

Drosophila Immunity Time-Course Data

The data used for all results in this paper can be found here. This directory contains:

1 paper · 0 benchmarks · Biology, Tabular, Time series

Survey answers (Answers to surveys in both papers, as well as processed answers)

Please see the paper for the questions. These are the answers to the surveys, processed and included in the paper via knitr.

1 paper · 0 benchmarks · Tabular

Rice Dataset Commeo and Osmancik

Data Set Name: Rice Dataset (Commeo and Osmancik). Abstract: A total of 3,810 images of rice grains were taken for the two species (Cammeo and Osmancik), then processed, and feature inferences were made. Seven morphological features were obtained for each grain of rice.

1 paper · 0 benchmarks · Tables, Tabular

CVR (Congressional Voting Records Data Set)

This data set includes votes for each of the U.S. House of Representatives Congressmen on the 16 key votes identified by the CQA. The CQA lists nine different types of votes: voted for, paired for, and announced for (these three simplified to yea), voted against, paired against, and announced against (these three simplified to nay), voted present, voted present to avoid conflict of interest, and did not vote or otherwise make a position known (these three simplified to an unknown disposition).

1 paper · 1 benchmark · Tabular
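The nine-to-three simplification described for CVR can be sketched as a lookup table. This is an illustration only: the label strings below are hypothetical spellings of the nine vote types named in the description, not the dataset's actual encoding.

```python
# Hypothetical label strings for the nine vote types listed in the description.
# The real data set may encode votes differently; this only shows the grouping.
VOTE_GROUPS = {
    "voted for": "yea",
    "paired for": "yea",
    "announced for": "yea",
    "voted against": "nay",
    "paired against": "nay",
    "announced against": "nay",
    "voted present": "unknown",
    "voted present to avoid conflict of interest": "unknown",
    "did not vote": "unknown",
}


def simplify_votes(raw_votes):
    """Map each raw vote type to its simplified disposition (yea/nay/unknown)."""
    return [VOTE_GROUPS[vote] for vote in raw_votes]


example = simplify_votes(["paired for", "announced against", "did not vote"])
print(example)
```

Each representative then has 16 such simplified values, one per key vote.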

FICS PCB Image Collection (FPIC)

Optical images of printed circuit boards as well as detailed annotations of any text, logos, and surface-mount devices (SMDs). There are several hundred samples spanning a wide variety of manufacturing locations, sizes, node technology, applications, and more.

1 paper · 0 benchmarks · Images, Tabular, Texts

Volunteer task execution events in Galaxy Zoo and The Milky Way citizen science projects

Context of the data sets: The Zooniverse platform (www.zooniverse.org) has successfully built a large community of volunteers contributing to citizen science projects. Galaxy Zoo and the Milky Way Project were hosted there.

1 paper · 0 benchmarks · Actions, Tabular, Time series

Error Grids for multi-fidelity benchmark functions in mf2

Provide:

1 paper · 0 benchmarks · Tabular

EUCA dataset

EUCA dataset description. Associated paper: EUCA: the End-User-Centered Explainable AI Framework

1 paper · 0 benchmarks · Tabular

NVALT-8

The NVALT-8 study (m = 200 participants) examined whether nadroparin combined with chemotherapy could reduce cancer relapse after surgical removal of a non-small cell lung tumour.

1 paper · 0 benchmarks · Medical, Tabular

NVALT-11

The NVALT-11 study considered the effect of prophylactic brain radiation versus observation in patients (m = 174) with advanced non-small cell lung cancer.

1 paper · 0 benchmarks · Medical, Tabular

BODMAS (Blue Hexagon Open Dataset for Malware AnalysiS)

We collaborate with Blue Hexagon to release a dataset containing timestamped malware samples and well-curated family information for research purposes. The BODMAS dataset contains 57,293 malware samples and 77,142 benign samples collected from August 2019 to September 2020, with carefully curated family information (581 families). We also provide preprocessed feature vectors and metadata, available to everyone. The malware binaries can be obtained upon request.

1 paper · 0 benchmarks · Tabular

CANDOR Corpus (CANDOR = Conversation: A Naturalistic Dataset of Online Recordings)

The CANDOR corpus is a large, novel, multimodal corpus of 1,656 recorded conversations in spoken English. This 7+ million-word, 850-hour corpus totals over 1TB of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, along with an extensive survey of speakers' post-conversation reflections.

1 paper · 0 benchmarks · Images, Tabular, Texts, Time series, Videos

Replication Data for: "Deciphering Bitcoin Blockchain Data by Cohort Analysis" Version 3.1

Bitcoin is a peer-to-peer electronic payment system that has rapidly gained popularity in recent years. Usually, we need to query the complete history of Bitcoin blockchain data to acquire variables of economic meaning. This has become increasingly difficult, with over 1.6 billion historical transactions on the Bitcoin blockchain. It is thus important to query Bitcoin transaction data in a way that is more efficient and provides economic insights. We apply cohort analysis, which interprets Bitcoin blockchain data using methods developed for population data in social science. Specifically, we query and process the Bitcoin transaction input and output data within each daily cohort. With this, we then create datasets and visualizations for some key indicators of Bitcoin transactions, including the daily lifespan distributions of accumulated spent transaction output (STXO) and the daily age distributions of accumulated unspent transaction output (UTXO). We provide a computationally feasible approach t

1 paper · 0 benchmarks · Tabular
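To make the "daily age distribution of UTXO" idea from the Bitcoin entry concrete, here is a minimal sketch on toy data. It is not the authors' pipeline: the record layout, the function name `utxo_age_distribution`, and the convention that an output spent on day d no longer counts as unspent on day d are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy records: (value, day_created, day_spent or None if never spent).
outputs = [
    (1.0, 0, 2),
    (0.5, 0, None),
    (2.0, 1, 3),
    (0.25, 2, None),
]


def utxo_age_distribution(outputs, day):
    """Total unspent value on `day`, grouped by age in days (day - day_created)."""
    by_age = Counter()
    for value, created, spent in outputs:
        exists = created <= day
        # Assumption: an output spent on `day` is already treated as spent.
        unspent = spent is None or spent > day
        if exists and unspent:
            by_age[day - created] += value
    return dict(by_age)


print(utxo_age_distribution(outputs, 2))
```

Grouping outputs by creation day is what makes each day a cohort; the same loop with `spent - created` in place of `day - created` would give a lifespan distribution for spent outputs.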

Electromagnetic Calorimeter Shower Images

Each HDF5 file has the following structure:

1 paper · 0 benchmarks · Images, Tabular

MIMI dataset (Multi-aspect Integrated Migration Indicators dataset)

New branches of research are proposing the use of non-traditional data sources for the study of migration trends, in order to find an original methodology to answer open questions about cross-border human mobility. The Multi-aspect Integrated Migration Indicators (MIMI) dataset is a new dataset for migration studies and a concrete example of this approach. It combines official data about bidirectional human migration (traditional flow and stock data) with multidisciplinary variables and original indicators, including economic, demographic, cultural and geographic indicators, together with the Facebook Social Connectedness Index (SCI). It is the product of gathering, embedding and integrating traditional and novel variables, and could significantly contribute to nowcasting and forecasting bilateral migration trends and migration drivers.

1 paper · 0 benchmarks · Tabular

BreastRates4 ([MIMBCD-UI] UTA4: Rates Dataset)

Several datasets are fostering innovation in higher-level functions for everyone, everywhere. By providing this repository, we hope to encourage the research community to focus on hard problems. In this repository, we present the severity rates (BIRADS) assigned by clinicians while diagnosing several patients in our User Tests and Analysis 4 (UTA4) study. Here, we provide a dataset of the measured severity rates (BIRADS) for each patient diagnosis. The work and results were published at AVI 2020, a top Human-Computer Interaction (HCI) conference. Results were analyzed and interpreted from our Statistical Analysis charts. The user tests were conducted in clinical institutions, where clinicians diagnosed several patients for a Single-Modality vs Multi-Modality comparison. For example, in these tests, we used both the prototype-single-modality and prototype-multi-modality repositories for the comparison. On the same hand, the hereby dataset represents the pieces of information of bot

1 paper · 0 benchmarks · Biomedical, Images, Medical, Tabular

IEIs (Ion and Electron Insulators)

We introduce three types of ion and electron insulators, i.e. Li-ion & electron insulators (LEIs), Na-ion & electron insulators (NEIs), and K-ion & electron insulators (KEIs), and provide a set of codes here to screen candidate materials from the Materials Project computational materials database. The IEI materials are able to block the transport of multiple charge carriers (ions and electrons) and stay thermodynamically stable against specific alkali metals. The screening workflows and the usage of IEI materials in rechargeable solid-state Li/Na/K metal batteries are presented in the paper below.

1 paper · 0 benchmarks · Tabular

EVI

The EVI dataset is a challenging, multilingual spoken-dialogue dataset with 5,506 dialogues in English, Polish, and French. The dataset can be used to develop and benchmark conversational systems for user authentication tasks, i.e. speaker enrolment (E), speaker verification (V), speaker identification (I).

1 paper · 0 benchmarks · Dialog, Speech, Tabular, Texts