Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3d meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • Midi (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • Cad (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

DPPIN

DPPIN is a collection of dynamic networks, which consists of twelve generated dynamic protein-protein interaction networks of yeast cells, stored in twelve folders.

2 papers · 0 benchmarks · Graphs

Unbalance Classification Using Vibration Data (Vibration Measurements on a Rotating Shaft at Different Unbalance Strengths)

This dataset contains vibration data recorded on a rotating drive train. This drive train consists of an electronically commutated DC motor and a shaft driven by it, which passes through a roller bearing. With the help of a 3D-printed holder, unbalances with different weights and different radii were attached to the shaft. Besides the strength of the unbalances, the rotation speed of the motor was also varied. This dataset can be used to develop and test algorithms for the automatic detection of unbalances on drive trains. Datasets for 4 differently sized unbalances and for the unbalance-free case were recorded. The vibration data was recorded at a sampling rate of 4096 values per second. Datasets for development (ID "D[0-4]") as well as for evaluation (ID "E[0-4]") are available for each unbalance strength. The rotation speed was varied between approx. 630 and 2330 RPM in the development datasets and between approx. 1060 and 1900 RPM in the evaluation datasets. For each measurement of

2 papers · 0 benchmarks · Time series

Common Crawl

The Common Crawl corpus contains petabytes of data collected over 12 years of web crawling. The corpus contains raw web page data, metadata extracts and text extracts. Common Crawl data is stored on Amazon Web Services’ Public Data Sets and on multiple academic cloud platforms across the world.

2 papers · 0 benchmarks · Texts

CADSketchNet

CADSketchNet is an annotated collection of sketches of 3D CAD models.

2 papers · 0 benchmarks

AIP Environment

AI Playground (AIP) is an open-source, Unreal Engine-based tool for generating and labeling virtual image data. With AIP, it is trivial to capture the same image under different conditions (e.g., fidelity, lighting, etc.) and with different ground truths (e.g., depth or surface normal values). AIP is easily extendable and can be used with or without code.

2 papers · 0 benchmarks

SECBENCH

A dataset of 676 security vulnerability patches. In 2017, we mined the commit messages of 238 projects using regular expressions for each vulnerability (cf. Patterns). In 2020, we classified vulnerabilities using the CWE taxonomy. Some vulnerabilities contain the score and severity information (CVEs).

2 papers · 0 benchmarks

SportSett

This resource is designed to support research into Natural Language Generation, in particular neural data-to-text approaches, although it is not limited to these.

2 papers · 0 benchmarks · Tabular, Texts

Cylinder in Crossflow

Cylinder in Crossflow is a synthetic dataset of unsteady laminar flow past a cylinder, which generates a vortex shedding pattern known as a von Kármán vortex street. The governing equations for this system are the incompressible Navier-Stokes equations. The cylinder has a diameter of 1 and the free stream velocity is 1. The kinematic viscosity $\nu$ is varied such that the Reynolds number is between 100 and 400. Symmetry boundary conditions are applied at the top and bottom edges of the domain and an open pressure boundary condition is applied at the outlet. Solutions are generated on an unstructured mesh of 6384 quad elements.

2 papers · 0 benchmarks
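
The Cylinder in Crossflow description fixes the diameter and free stream velocity at 1 and varies the kinematic viscosity to reach Reynolds numbers between 100 and 400. A minimal sketch of that relationship, assuming the standard definition Re = U·D/ν (the function name and defaults are illustrative, not part of the dataset's tooling):

```python
def kinematic_viscosity(reynolds, velocity=1.0, diameter=1.0):
    """Return nu from Re = U * D / nu (assumed standard definition)."""
    if reynolds <= 0:
        raise ValueError("Reynolds number must be positive")
    return velocity * diameter / reynolds


# Endpoints of the Reynolds-number range stated in the description.
nu_at_re_100 = kinematic_viscosity(100)
nu_at_re_400 = kinematic_viscosity(400)

print(f"Re=100 -> nu={nu_at_re_100}")
print(f"Re=400 -> nu={nu_at_re_400}")
```

Under that assumption, the stated range corresponds to viscosities between 1/400 and 1/100 in the dataset's nondimensional units.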

Hockey Fight Detection Dataset

Whereas the action recognition community has focused mostly on detecting simple actions like clapping, walking or jogging, the detection of fights or, in general, aggressive behaviors has been comparatively less studied. Such a capability may be extremely useful in some video surveillance scenarios, such as in prisons, psychiatric or elderly centers, or even in camera phones. After an analysis of previous approaches we test the well-known Bag-of-Words framework used for action recognition in the specific problem of fight detection, along with two of the best action descriptors currently available: STIP and MoSIFT. For the purpose of evaluation and to foster research on violence detection in video we introduce a new video database containing 1000 sequences divided into two groups: fights and non-fights. Experiments on this database and another one with fights from action movies show that fights can be detected with nearly 90% accuracy.

2 papers · 4 benchmarks · Videos

Emomusic (Emotion in Music Database)

1000 songs were selected from the Free Music Archive (FMA). The annotated excerpts are available in the same package, with song IDs 1 to 1000. Some redundancies were identified, which reduced the dataset to 744 songs. The dataset is split between the development set (619 songs) and the evaluation set (125 songs). The extracted 45-second excerpts are all re-encoded to have the same sampling frequency, i.e., 44100 Hz.

2 papers · 2 benchmarks · Audio, Music
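
The Emomusic figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in the description; the constant and function names are illustrative, and the per-channel sample count assumes every excerpt is exactly 45 seconds long:

```python
SAMPLE_RATE_HZ = 44100   # sampling frequency stated in the description
EXCERPT_SECONDS = 45     # excerpt length stated in the description
DEV_SONGS = 619          # development set size
EVAL_SONGS = 125         # evaluation set size


def samples_per_excerpt(seconds, sample_rate_hz):
    """Number of samples per channel in one excerpt."""
    return seconds * sample_rate_hz


total_songs = DEV_SONGS + EVAL_SONGS
excerpt_samples = samples_per_excerpt(EXCERPT_SECONDS, SAMPLE_RATE_HZ)

print(f"songs after deduplication: {total_songs}")
print(f"samples per channel per excerpt: {excerpt_samples}")
```

The two splits sum to the 744 songs reported after redundancies were removed.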

Hindi MSR-VTT (Hindi Microsoft Research Video to Text)

This dataset is the Hindi version of the standard English MSR-VTT dataset.

2 papers · 1 benchmark

Wikidata-14M

Wikidata-14M is a recommender system dataset for recommending items to Wikidata editors. It consists of 220,000 editors responsible for 14 million interactions with 4 million items.

2 papers · 0 benchmarks · Texts

Global Wheat Head 2021 (Global Wheat Head Dataset 2021)

Global WHEAT Dataset 2021 is an extension of the Global Wheat Dataset 2020. It is the first large-scale dataset for wheat head detection from field optical images. It includes a very wide range of cultivars from different continents. Wheat is a staple crop grown all over the world, and consequently interest in wheat phenotyping spans the globe. Therefore, it is important that models developed for wheat phenotyping, such as wheat head detection networks, generalize between different growing environments around the world.

2 papers · 0 benchmarks · Images

Multinational Structured Address Dataset

The Multinational Structured Address Dataset is a collection of addresses from 61 different countries. The addresses can be either "complete" (all the usual address components) or "incomplete" (missing some usual address components).

2 papers · 0 benchmarks

MyFood Dataset

MyFood Dataset is an image database for segmenting images of Brazilian foods. It comprises 9 classes: rice, beans, boiled egg, fried egg, pasta, salad, roasted meat, apple and chicken breast. It has an average of 125 images per class and a total of 1250 images, split in a 60-20-20 ratio into training, validation and testing sets, respectively.

2 papers · 0 benchmarks
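
The 60-20-20 ratio quoted for MyFood can be turned into nominal split sizes. This is a sketch under the assumption that the ratio applies to the stated total of 1250 images; the source does not list the actual per-split counts, and `split_sizes` is a hypothetical helper, not part of the dataset's tooling:

```python
def split_sizes(total, ratios=(60, 20, 20)):
    """Return (train, val, test) sizes for percentage ratios summing to 100.

    Any rounding remainder is assigned to the test split (an arbitrary choice).
    """
    if sum(ratios) != 100:
        raise ValueError("ratios must sum to 100")
    train = total * ratios[0] // 100
    val = total * ratios[1] // 100
    test = total - train - val
    return train, val, test


# Total image count as stated in the description.
train_n, val_n, test_n = split_sizes(1250)
print(train_n, val_n, test_n)
```

With integer arithmetic the three sizes always add back up to the total, even when the total is not divisible by the ratios.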

RaidaR (RaidaR: A Rich Annotated Image Dataset of Rainy Street Scenes)

RaidaR is a richly annotated image dataset of rainy street scenes. RaidaR consists of 58,542 real rainy images containing several rain-induced artifacts: fog, droplets, road reflections, etc. 5,000 and 3,658 images were carefully annotated with semantic and instance segmentation, respectively.

2 papers · 0 benchmarks · Images

COVIDEmo

A dataset of tweets that reference the COVID-19 pandemic with emotion labels.

2 papers · 0 benchmarks

CalCROP21

CalCROP21 is a georeferenced multi-spectral dataset of satellite imagery and crop labels. It is a semantic segmentation benchmark dataset for the diverse crops in the Central Valley region of California at 10 m spatial resolution, produced using a robust image processing pipeline based on Google Earth Engine.

2 papers · 0 benchmarks · Images

DUC 2007 (Document Understanding Conferences)

There is currently much interest and activity aimed at building powerful multi-purpose information systems. The agencies involved include DARPA, ARDA and NIST. Their programmes, for example DARPA's TIDES (Translingual Information Detection Extraction and Summarization) programme, ARDA's Advanced Question & Answering Program and NIST's TREC (Text Retrieval Conferences) programme cover a range of subprogrammes. These focus on different tasks requiring their own evaluation designs.

2 papers · 0 benchmarks · Texts

DONeRF: Evaluation Dataset

This is the dataset for the CGF 2021 paper "DONeRF: Towards Real-Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks".

2 papers · 1 benchmark · Images, RGB-D