Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

148 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

148 dataset results

ADORE (A benchmark dataset for machine learning in ecotoxicology)

ADORE is a benchmark dataset for machine learning in ecotoxicology, covering acute aquatic toxicity in three relevant taxonomic groups (fish, crustaceans, and algae). The core dataset describes ecotoxicological experiments and is expanded with phylogenetic and species-specific data on the species as well as chemical properties and molecular representations. Apart from challenging other researchers to achieve the best model performance across the whole dataset, we propose specific relevant challenges on subsets of the data and include datasets and splittings corresponding to each of these challenges, as well as an in-depth characterization and discussion of train-test splitting approaches.

2 papers · 2 benchmarks · Biology, Environment

ThreeDWorld (TDW)

TDW is a 3D virtual world simulation platform built on state-of-the-art video game engine technology. A TDW simulation consists of two components: a) the Build, a compiled executable running on the Unity3D Engine, which is responsible for image rendering, audio synthesis, and physics simulation; and b) the Controller, an external Python interface for communicating with the Build.

1 paper · 0 benchmarks · 3D, Environment

CE4

Because planetary data are difficult to handle, we provide downloadable files in PNG format from the Chang'E-3 and Chang'E-4 missions, together with a set of scripts to perform the conversion from a given PDS4 dataset.
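The released conversion scripts are not reproduced here, but the core of turning a raw planetary image array into an 8-bit PNG is a linear rescale to the 0–255 range. A minimal NumPy sketch under that assumption (the function name and the 10-bit sample values are illustrative, not part of the CE4 scripts):

```python
import numpy as np

def to_uint8(raw):
    """Linearly rescale a raw image array (any bit depth) to the
    0-255 range expected by an 8-bit PNG encoder. Hypothetical
    helper; the released CE4 scripts may normalize differently."""
    raw = raw.astype(np.float64)
    lo, hi = raw.min(), raw.max()
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros(raw.shape, dtype=np.uint8)
    return np.round(255.0 * (raw - lo) / (hi - lo)).astype(np.uint8)

# e.g. a toy 10-bit frame
frame = np.array([[0, 256, 512], [768, 1023, 128]], dtype=np.uint16)
png_ready = to_uint8(frame)  # hand off to any PNG writer, e.g. Pillow
```

Min-max scaling keeps relative contrast within a frame but discards absolute radiometry, which is why the originals are also distributed alongside the conversion scripts.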

1 paper · 0 benchmarks · Environment, Images, Stereo

Bee4Exp Honeybee Detection

A dataset for flying honeybee detection introduced in "A Method for Detection of Small Moving Objects in UAV Videos".

1 paper · 6 benchmarks · Environment, Videos

Interactive Gibson Environment

Interactive Gibson is a comprehensive benchmark for training and evaluating Interactive Navigation: robot navigation strategies where physical interaction with objects is allowed and even encouraged to accomplish a task. The benchmark has two main components.

1 paper · 0 benchmarks · Environment

ColosseumRL

ColosseumRL is a framework for research in reinforcement learning in n-player games.

1 paper · 0 benchmarks · Environment

gComm

gComm is a step towards developing a robust platform to foster research in grounded language acquisition in a more challenging and realistic setting. It comprises a 2-D grid environment with a set of agents (a stationary speaker and a mobile listener connected via a communication channel) exposed to a continuous array of tasks in a partially observable setting. The key to solving these tasks lies in the agents developing linguistic abilities and using them to explore the environment efficiently. The speaker and listener have access to information in different modalities: the speaker's input is a natural language instruction containing the target and task specifications, while the listener's input is its grid view. Each must rely on the other to complete the assigned task, and the only way they can do so is to develop and use some form of communication. gComm provides several tools for studying different forms of communication and assessing their generalization.

1 paper · 0 benchmarks · Environment

EviLOG (Evidential Lidar Occupancy Grid Mapping)

The dataset contains synthetic training, validation, and test data for occupancy grid mapping from lidar point clouds. Additionally, real-world lidar point clouds from a test vehicle with the same lidar setup as the simulated sensor are provided. Point clouds are stored as PCD files, and occupancy grid maps are stored as PNG images in which one image channel encodes the evidence for the free cell state and another the evidence for the occupied cell state.
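Under such an encoding, a grid-map PNG can be decoded back into evidential belief masses. A minimal sketch, assuming (this is an assumption, not documented here) that the 8-bit free and occupied channels map linearly to masses in [0, 1], with the remainder assigned to an "unknown" state:

```python
import numpy as np

def decode_evidence(grid_png):
    """grid_png: H x W x 2 uint8 array; channel 0 = free evidence,
    channel 1 = occupied evidence (channel order is an assumption)."""
    m_free = grid_png[..., 0].astype(np.float32) / 255.0
    m_occ = grid_png[..., 1].astype(np.float32) / 255.0
    # remaining belief mass goes to the "unknown" state
    m_unk = np.clip(1.0 - m_free - m_occ, 0.0, 1.0)
    return m_free, m_occ, m_unk

cells = np.array([[[255, 0], [0, 255], [0, 0]]], dtype=np.uint8)
m_free, m_occ, m_unk = decode_evidence(cells)
# confidently free, confidently occupied, and fully unknown cells
```

Keeping an explicit unknown mass (rather than a single occupancy probability) is what makes the representation evidential: unobserved cells are distinguishable from cells with conflicting measurements.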

1 paper · 0 benchmarks · Environment, LiDAR, Point cloud

FastZIP Data (FastZIP Dataset and Code)

Describes the structure of the code and data folders and how to use them; see the fastzip-code repository.

1 paper · 0 benchmarks · Environment

Bus Trajectory Dataset

This dataset contains bus trajectories collected by six volunteers who were asked to travel across the suburban city of Durgapur, India, on intra-city buses (route name: 54 Feet). During travel, the volunteers captured sensor logs through an Android application installed on COTS smartphones.

1 paper · 0 benchmarks · Actions, Environment, Stereo

Dataset of Context information for Zero Interaction Security

We release both the processed data and evaluation results from our own experiments and the underlying raw data, which can be used for future experiments and schemes in the domain of Zero-Interaction Security. More details can be found in the dataset description on Zenodo.

1 paper · 0 benchmarks · Audio, Environment

BrazilDam Dataset

BrazilDAM is a multi-sensor, multitemporal dataset consisting of multispectral images of ore tailings dams throughout Brazil. The images were captured by the Landsat 8 and Sentinel-2 satellites over the years 2016, 2017, 2018, and 2019. The dataset contains samples collected in different regions, which increases the diversity and representativeness of the characteristics of the dams.

1 paper · 0 benchmarks · Environment, Images

CARLE (Cellular Automata Reinforcement Learning Environment)

CARLE is a Life-like cellular automata simulator and reinforcement learning environment. CARLE is flexible, capable of simulating any of the 262,144 different rules defining Life-like cellular automaton universes. CARLE is also fast and can simulate automata universes at a rate of tens of thousands of steps per second through a combination of vectorization and GPU acceleration. Finally, CARLE is simple. Compared to high-fidelity physics simulators and video games designed for human players, CARLE's two-dimensional grid world offers a discrete, deterministic, and atomic universal playground, despite its complexity.
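The 262,144 figure follows from the rule encoding: a Life-like rule is a pair of subsets of {0, …, 8} (neighbor counts that cause birth and that allow survival), giving 2⁹ × 2⁹ = 262,144 universes. A minimal NumPy sketch of one update step on a toroidal grid, independent of CARLE's actual API:

```python
import numpy as np

def step(grid, birth={3}, survival={2, 3}):
    """One update of a Life-like CA (default rule B3/S23, Conway's Life)
    on a toroidal grid. Any of the 2^9 * 2^9 = 262,144 Life-like rules
    is such a (birth, survival) pair of subsets of {0..8}."""
    # count the 8 neighbors via wrapped shifts
    n = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0))
    born = (grid == 0) & np.isin(n, list(birth))
    survives = (grid == 1) & np.isin(n, list(survival))
    return (born | survives).astype(np.uint8)

g = np.zeros((5, 5), dtype=np.uint8)
g[2, 1:4] = 1            # horizontal "blinker"
g_next = step(g)         # becomes a vertical blinker
```

Because the update is a single vectorized expression, it batches naturally over large grids, which is the same property CARLE exploits for its throughput.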

1 paper · 0 benchmarks · Environment

TERRA-REF (TERRA-REF, An open reference data set from high resolution genomics, phenomics, and imaging sensors)

The ARPA-E funded TERRA-REF project is generating open-access reference datasets for the study of plant sensing, genomics, and phenomics. Sensor data were generated by a field scanner sensing platform that captures color, thermal, hyperspectral, and active fluorescence imagery as well as three-dimensional structure and associated environmental measurements. This dataset is provided alongside data collected using traditional field methods in order to support calibration and validation of algorithms used to extract plot-level phenotypes from these datasets.

1 paper · 0 benchmarks · 3D, Biology, Environment, Hyperspectral images, Point cloud, Stereo, Tabular, Time series

A Datacube for the analysis of wildfires in Greece

This dataset is meant to be used to develop models for next-day fire hazard forecasting in Greece. It contains data from 2009 to 2020 on a 1 km × 1 km spatial grid at daily temporal resolution.

1 paper · 0 benchmarks · Environment, Videos

Deep Indices (multi-spectral leaf/vegetation segmentation)

This dataset includes multi-spectral acquisitions of vegetation for the design of new DeepIndices. The images were acquired with the Airphen (Hyphen, Avignon, France) six-band multi-spectral camera configured with the 450/570/675/710/730/850 nm bands at 10 nm FWHM. The dataset was acquired at the INRAe site in Montoldre (Allier, France, at 46°20'30.3"N 3°26'03.6"E) within the framework of the “RoSE challenge” funded by the French National Research Agency (ANR), and in Dijon (Burgundy, France, at 47°18'32.5"N 5°04'01.8"E) on the AgroSup Dijon site. Images of bean and corn, containing various natural weeds (yarrow, amaranth, geranium, plantago, etc.) and sown ones (mustard, goosefoot, mayweed, and ryegrass) with very distinct characteristics in terms of illumination (shadow, morning, evening, full sun, cloudy, rain, ...), were acquired in top-down view at 1.8 meters from the ground. (2020-05-01)
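With red (675 nm) and near-infrared (850 nm) bands available, a classical baseline such as NDVI can be computed for comparison against learned DeepIndices. A minimal sketch; the band choice, reflectance scaling, and toy values are assumptions, not the dataset's prescribed pipeline:

```python
import numpy as np

def ndvi(red, nir, eps=1e-8):
    """Normalized Difference Vegetation Index from the 675 nm (red)
    and 850 nm (NIR) reflectance bands; values fall in [-1, 1]."""
    red = red.astype(np.float32)
    nir = nir.astype(np.float32)
    return (nir - red) / (nir + red + eps)

red = np.array([[0.2, 0.5]], dtype=np.float32)   # toy reflectances
nir = np.array([[0.6, 0.5]], dtype=np.float32)
v = ndvi(red, nir)  # vegetation scores high; equal bands score ~0
```

Hand-crafted indices like this are exactly the baselines that learned DeepIndices aim to outperform for vegetation segmentation under the varied illumination conditions described above.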

1 paper · 1 benchmark · Environment, Hyperspectral images, Images, RGB-D

DEAP City Dataset

The main dataset is provided as city_pollution_data.csv.

1 paper · 0 benchmarks · Environment, Graphs, Tabular, Time series

SPAVE-28G (Signal Propagation Analyses in V2X Ecosystems (S.P.A.V.E) at 28 GHz on the NSF POWDER testbed)


1 paper · 0 benchmarks · Environment, Graphs, Time series, Tracking

INSANE Cross-Domain UAV Data Set (Cross-Domain UAV Data Sets with Increased Number of Sensors for developing Advanced and Novel Estimators)

This data set contains over 600 GB of multimodal data from a Mars analog mission, including accurate 6DoF outdoor ground truth, indoor-outdoor transitions with continuous cross-domain ground truth, and indoor data with OptiTrack measurements as ground truth. With 26 flights and a combined distance of 2.5 km, this data set provides various distinct challenges for testing and proving your algorithms. The UAV carries 18 sensors, including a high-resolution navigation camera and a stereo camera with an overlapping field of view, two RTK GNSS sensors with centimeter accuracy, and three IMUs placed at strategic locations: hardware-dampened at the center, off-center with a lever arm, and a 1 kHz IMU rigidly attached to the UAV (in case you want to work with unfiltered data). The sensors are fully pre-calibrated and the data set is ready to use; however, if you want to use your own calibration algorithms, the raw calibration data is also available for download.

1 paper · 0 benchmarks · Environment, Images, Stereo, Tracking

Datasets for automatic acoustic identification of insects (Orthoptera and Cicadidae)

This dataset contains recordings of 32 sound-producing insect species, with a total of 335 files and a length of 57 minutes. The dataset was compiled for training neural networks to automatically identify insect species while comparing adaptive, waveform-based frontends to conventional mel-spectrogram frontends for audio feature extraction. This work will be submitted for publication in the future, and the dataset can be used to replicate the results as well as for other purposes. The scripts for audio processing and the machine learning implementations will be published on GitHub.

1 paper · 0 benchmarks · Audio, Biology, Environment
Page 6 of 8