Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

271 machine learning datasets

Metadata for all 622 UCI datasets

This dataset contains a 2022 extraction of all 622 datasets that existed at the UCI Machine Learning Repository at that time. Each record contains the index; the name; the url; the instances (number of lines); the number of attributes (columns); the year it was created; the area (such as Life, Social, etc.); the web_hits at the time; the data folder url (where the data were hosted on the internet); the dataset_file_url (the URL for the data); the dataset_file_format (such as data, txt, Z, etc.); the names_file_url (the file describing the attributes); the names_file_format (the format of that file); the attribute_info (information on all the attributes or columns in the dataset); the source; the data_set_information; the relevant_papers associated with the dataset; the papers_that_cite_this_data_set; and a final column with the number of papers that cite the dataset.

2 papers | 0 benchmarks | Tabular

Reddit Ideological and Extreme Bias Dataset

Articles originating from subreddits with explicitly stated ideologies are categorized into three groups: 72,488 articles in the Liberal class, 79,573 articles in the Conservative class, and 225,083 articles in the Restricted class.

2 papers | 2 benchmarks | Tables, Tabular, Texts

BASEPROD (The Bardenas Semi-Desert Planetary Rover Dataset)

BASEPROD provides comprehensive rover sensor data collected over a 1.7 km traverse, accompanied by high-resolution 2D and 3D drone maps of the terrain. The dataset also includes laser-induced breakdown spectroscopy (LIBS) measurements from key sampling sites along the rover's path, as well as weather station data to contextualize environmental conditions.

2 papers | 0 benchmarks | 3D, Environment, Images, Point cloud, RGB-D, Stereo, Tabular, Time series

Insider Threat Test Dataset

The Insider Threat Test Dataset is a collection of synthetic insider threat test datasets that provide both background and malicious actor synthetic data.

2 papers | 1 benchmark | Tabular

Image-based Confounding Dataset

Replication Data for: Integrating Earth Observation Data into Causal Inference: Challenges and Opportunities

2 papers | 0 benchmarks | Images, Tabular

Amazon MTPP (Marked Temporal Point Processes on Amazon data)

The dataset includes time-stamped user product review behavior from January 2008 to October 2018. Each user has a sequence of product review events, with each event containing the timestamp and category of the reviewed product, and each category corresponding to an event type.

2 papers | 4 benchmarks | Tabular, Time series
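
As a rough illustration of the structure described above, a user's history can be held as a time-ordered sequence of (timestamp, category) events, where the category plays the role of the event type (the mark). This is only a sketch under assumptions: the `Event` class, the `inter_event_times` helper, and the sample values are hypothetical and are not taken from the dataset's actual files or schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Event:
    """One review event: when it happened and the product category (the mark)."""
    timestamp: datetime
    category: str


def inter_event_times(events: List[Event]) -> List[float]:
    """Return the gaps, in days, between consecutive events sorted by time."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return [
        (b.timestamp - a.timestamp).total_seconds() / 86400.0
        for a, b in zip(ordered, ordered[1:])
    ]


# Made-up sequence for one hypothetical user (not real dataset rows).
user_events = [
    Event(datetime(2008, 1, 5), "Books"),
    Event(datetime(2008, 1, 12), "Electronics"),
    Event(datetime(2008, 2, 1), "Books"),
]

gaps = inter_event_times(user_events)
marks = [e.category for e in sorted(user_events, key=lambda e: e.timestamp)]
```

Models for marked temporal point processes typically consume exactly this pair of sequences: the inter-event gaps and the corresponding marks.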

StackOverflow MTPP (Marked Temporal Point Processes on StackOverflow data)

The dataset has two years of user awards on a question-answering website: each user received a sequence of badges and there are 22 different kinds of badges in total.

2 papers | 4 benchmarks | Tabular, Time series

AgeGroup Transactions MTPP (Marked Temporal Point Processes on financial transactions data)

The dataset contains historical financial transactions, including time, category and cost fields. There are 50000 clients, 205 categories and 43.7M events. The original goal was to predict the age group of the client. In this variant of the dataset, the goal is to forecast multiple future events.

2 papers | 4 benchmarks | Tabular, Time series

Database of axial impact simulations of the crash box (Database for crashworthiness optimisation)

This repository contains a database of FEM simulations of various configurations of axially impacted square crash boxes. The database records the effect of structural and crash test parameters on various crashworthiness objectives.

2 papers | 0 benchmarks | Tables, Tabular

WebEdit

Fact-based Text Editing dataset based on the WebNLG dataset.

1 paper | 9 benchmarks | Tabular, Texts

RotoEdit

Fact-based Text Editing dataset based on the RotoWire dataset.

1 paper | 9 benchmarks | Tabular, Texts

LinkedResults

The LinkedResults dataset contains around 1,600 results capturing the performance of machine learning models, extracted from tables in 239 papers. All tables come from a subset of the SegmentedTables dataset. Each result is a tuple of the form (task, dataset, metric name, metric value) and is linked to the particular table, row, and cell it originates from.

1 paper | 0 benchmarks | Tabular
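
One way to picture such a record is as a typed tuple that carries a pointer back to its source cell. The sketch below is an assumption for illustration only: the class names, the field names (`table_id`, `row`, `col`), and the sample values are hypothetical and do not reflect the dataset's actual file format.

```python
from typing import NamedTuple


class TableCell(NamedTuple):
    """Location of a value inside a source table (hypothetical field names)."""
    table_id: str
    row: int
    col: int


class LinkedResult(NamedTuple):
    """A (task, dataset, metric name, metric value) tuple plus its source cell."""
    task: str
    dataset: str
    metric_name: str
    metric_value: float
    source: TableCell


# Made-up example record; the values are illustrative only.
result = LinkedResult(
    task="Image Classification",
    dataset="ExampleSet",
    metric_name="Accuracy",
    metric_value=91.2,
    source=TableCell(table_id="paper-001/table-2", row=3, col=4),
)

# The first four fields form the result tuple described in the text.
core = result[:4]
```

Keeping the source cell as a nested tuple leaves the four-field result intact while still allowing each value to be traced back to its table.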

Co/FeMn bilayers

Co/FeMn bilayers measured.

1 paper | 0 benchmarks | Tabular

Undecided Voters in US Presidential Elections

This dataset contains state-level election polls for the 2004, 2008, 2012, and 2016 US presidential elections, including data on undecided voter proportions.

1 paper | 0 benchmarks | Tabular

TERRA-REF (TERRA-REF, An open reference data set from high resolution genomics, phenomics, and imaging sensors)

The ARPA-E funded TERRA-REF project is generating open-access reference datasets for the study of plant sensing, genomics, and phenomics. Sensor data were generated by a field scanner sensing platform that captures color, thermal, hyperspectral, and active fluorescence imagery as well as three-dimensional structure and associated environmental measurements. This dataset is provided alongside data collected using traditional field methods in order to support calibration and validation of algorithms used to extract plot-level phenotypes from these datasets.

1 paper | 0 benchmarks | 3D, Biology, Environment, Hyperspectral images, Point cloud, Stereo, Tabular, Time series

DocBank-TB (DocBank-Table)

This dataset consists of 500 sets, each comprising a caption, a table, and the corresponding paper page, processed from DocBank.

1 paper | 0 benchmarks | Tabular, Texts

PEM Fuel Cell Dataset (Proton Exchange Membrane (PEM) Fuel Cell Dataset)

This dataset covers standard tests of a Nafion 112 membrane and MEA activation tests of a PEM fuel cell under various operating conditions. It includes two general electrochemical analysis methods: polarization and impedance curves. The effects of different H2/O2 gas pressures, different voltages, and various humidity conditions are considered in several steps. From the data one can assess the behavior of the PEM fuel cell during tests under distinct operating conditions, during the activation procedure, and under different operating conditions before and after activation. In the polarization curves, voltage and power density change as a function of H2/O2 flow and relative humidity. The resistance of the fuel cell's equivalent circuit can be calculated from the impedance data. The data therefore capture the experimental response of the cell and are useful for in-depth analysis, simulation, and material performance investigation in PEM fuel cell research.

1 paper | 0 benchmarks | Tables, Tabular

DBFC Dataset (Single Direct Borohydride Fuel Cell Dataset)

This dataset includes impedance and polarization tests of a Direct Borohydride Fuel Cell (DBFC) with Pd/C, Pt/C, and Pd-decorated Ni–Co/rGO anode catalysts. The data cover different concentrations of sodium borohydride (SBH), applied voltages, and anode catalyst loadings, together with an explanation of the experimental details of the electrochemical analysis. The voltage, power density, and resistance of the DBFC change as a function of the SBH weight percent (%), the applied voltage, and the anode catalyst loading; these are evaluated from polarization and impedance curves using an appropriate equivalent circuit of the fuel cell. The data support interpretation of changes in the cell's electrochemical behavior, which can be useful in simulation, power source investigation, and in-depth analysis in DBFC research.

1 paper | 0 benchmarks | Tables, Tabular

5DOF GB Interpolation (Five Degree-of-Freedom Grain Boundary Interpolation)

These are larger MATLAB .mat files required for reproducing plots from the sgbaird-5DOF/interp repository for grain boundary property interpolation. gitID-0055bee_uuID-475a2dfd_paper-data6.mat contains multiple trials of five degree-of-freedom interpolation model runs for various interpolation schemes. gpr46883_gitID-b473165_puuID-50ffdcf6_kim-rng11.mat contains a Gaussian Process Regression model trained on 46883 Fe simulation GBs. See "Five degree-of-freedom property interpolation of arbitrary grain boundaries via Voronoi fundamental zone framework" (DOI: 10.1016/j.commatsci.2021.110756) for the peer-reviewed, published version of the paper.

1 paper | 0 benchmarks | Tabular

EUEN17037_Daylight_and_View_Standard_TestDataSet

EUEN17037 Daylight and View Standard Test Dataset.

1 paper | 0 benchmarks | 3D, Point cloud, Tabular