Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

1,019 machine learning datasets matching the current filter (modality: Videos)

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

1,019 dataset results

Countix-AV

Countix-AV is a dataset for repetitive action counting by sight and sound, created by repurposing the Countix dataset.

3 papers · 0 benchmarks · Videos

Algonauts 2021 (How the Human Brain Makes Sense of a World in Motion)

The Algonauts dataset provides human brain responses to a set of 1,102 three-second video clips of everyday events. The brain responses are measured with functional magnetic resonance imaging (fMRI), a widely used brain imaging technique with high spatial resolution that measures blood flow changes associated with neural responses.

3 papers · 0 benchmarks · Videos, fMRI
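
A common use of such data is to fit a voxel-wise encoding model that predicts fMRI responses from features of the video stimuli. The sketch below is illustrative only: the feature dimensionality, voxel count, and train/test split are made-up placeholders, not part of the Algonauts release.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Illustrative placeholders: one feature vector per clip (e.g. from a
# pretrained video network) and one flattened voxel response vector per clip.
rng = np.random.default_rng(0)
X = rng.standard_normal((1102, 512))    # hypothetical clip features
Y = rng.standard_normal((1102, 1000))   # hypothetical voxel responses

# Fit a ridge encoding model on most clips and evaluate on the rest.
model = Ridge(alpha=1.0).fit(X[:1000], Y[:1000])
print("held-out R^2:", model.score(X[1000:], Y[1000:]))
```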

OREBA (Objectively Recognizing Eating Behavior and Associated Intake)

The OREBA dataset aims to provide a comprehensive multi-sensor recording of communal intake occasions for researchers interested in automatic detection of intake gestures. Two scenarios are included, with 100 participants for a discrete dish and 102 participants for a shared dish, totalling 9,069 intake gestures. The available sensor data consists of synchronized frontal video and IMU (accelerometer and gyroscope) recordings for both hands.

3 papers · 0 benchmarks · Actions, Videos
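
Working with the synchronized video and IMU streams typically means putting both modalities on a common time base. Below is a minimal sketch of one way to do that with linear interpolation; the function name and argument layout are assumptions for illustration, not OREBA's official tooling.

```python
import numpy as np

def resample_imu_to_video(imu_t, imu_values, video_t):
    """Interpolate IMU channels onto video frame timestamps.

    imu_t:      (N,) IMU sample timestamps in seconds
    imu_values: (N, C) accelerometer/gyroscope channels
    video_t:    (M,) video frame timestamps in seconds
    Returns an (M, C) array of IMU values aligned to the video frames.
    """
    channels = [np.interp(video_t, imu_t, imu_values[:, c])
                for c in range(imu_values.shape[1])]
    return np.stack(channels, axis=1)
```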

Content4All

Content4All is a collection of six open research datasets aimed at automatic sign language translation research.

3 papers · 0 benchmarks · Videos

TaL Corpus (The Tongue and Lips Corpus)

The Tongue and Lips (TaL) corpus is a multi-speaker corpus of ultrasound images of the tongue and video images of lips. This corpus contains synchronised imaging data of extraoral (lips) and intraoral (tongue) articulators from 82 native speakers of English.

3 papers · 0 benchmarks · Audio, Speech, Texts, Videos

VideoMatting108

VideoMatting108 is a large-scale video matting and trimap generation dataset comprising 80 training and 28 validation foreground video clips with ground-truth alpha mattes.

3 papers · 0 benchmarks · Videos
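
An alpha matte encodes the standard compositing relation I = alpha * F + (1 - alpha) * B: each observed pixel is a per-pixel blend of foreground and background. A minimal NumPy illustration follows; the array shapes are assumptions, not the dataset's file format.

```python
import numpy as np

def composite(foreground, background, alpha):
    """Blend a foreground over a background using an alpha matte.

    foreground, background: float arrays in [0, 1] with shape (H, W, 3)
    alpha:                  float array in [0, 1] with shape (H, W, 1)
    Returns the composited image I = alpha * F + (1 - alpha) * B.
    """
    return alpha * foreground + (1.0 - alpha) * background
```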

Vehicle-Rear

Vehicle-Rear is a novel dataset for vehicle identification that contains more than three hours of high-resolution videos, with accurate information about the make, model, color and year of nearly 3,000 vehicles, in addition to the position and identification of their license plates.

3 papers · 0 benchmarks · Images, Videos

Navigation Turing Test

Replay data from human players and AI agents navigating in a 3D game environment.

3 papers · 0 benchmarks · Images, Replay data, Videos

DeepFake MNIST+

DeepFake MNIST+ is a deepfake facial animation dataset generated by a state-of-the-art image animation generator. It includes 10,000 facial animation videos covering ten different actions, which can spoof recent liveness detectors.

3 papers · 0 benchmarks · Videos

TIMo (Time-of-Flight Indoor Monitoring)

TIMo (Time-of-Flight Indoor Monitoring) is a dataset of infrared and depth videos intended for use in anomaly detection and person detection/people counting. It features more than 1,500 sequences for anomaly detection, which sum to more than 500,000 individual frames. For person detection, the dataset contains more than 240 sequences. The data was captured using a Microsoft Azure Kinect RGB-D camera. In addition, we provide annotations of anomalous frame ranges for anomaly detection, as well as bounding boxes and segmentation masks for person detection. The data was captured partly from a tilted view and partly from a top-down perspective.

3 papers · 2 benchmarks · Videos

IfAct (Identifying Human Actions Visible in Online Vlogs)

We consider the task of identifying human actions visible in online videos. We focus on the widespread genre of lifestyle vlogs, which consist of videos of people performing actions while verbally describing them. Our goal is to identify whether actions mentioned in the speech description of a video are visually present.

3 papers · 0 benchmarks · Texts, Videos

V-HICO

V-HICO is a dataset for human-object interaction in videos. It contains 6,594 videos of human-object interaction: 5,297 training videos, 635 validation videos, 608 test videos, and 54 unseen test videos. To test performance on common human-object interaction classes as well as generalization to new classes, we provide two test splits: the first contains the same human-object interaction classes as the training split, while the second consists of unseen novel classes.

3 papers · 0 benchmarks · Videos

RLV (Reinforcement Learning with Videos)

We provide video observations of humans performing two simple tasks in natural environments. The tasks are pushing and drawer opening.

3 papers · 0 benchmarks · Videos

MSU Video Alignment and Retrieval Benchmark Suite

A benchmark suite for frame-to-frame video alignment and synchronization.

3 papers · 8 benchmarks · Videos
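
To make frame-to-frame alignment concrete, the toy sketch below estimates a constant temporal offset between two videos by maximizing the mean cosine similarity of per-frame features over their overlap. It is an illustration of the task only, not the benchmark's official method or metric.

```python
import numpy as np

def best_temporal_offset(feats_a, feats_b, max_offset=30):
    """Estimate the constant frame offset that best aligns two videos.

    feats_a, feats_b: (T, D) arrays of per-frame features (e.g. downsampled
    grayscale frames flattened into vectors). Returns the offset of video b
    relative to video a with the highest mean cosine similarity over the overlap.
    """
    def mean_cosine(a, b):
        a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
        b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
        return float((a * b).sum(axis=1).mean())

    best_off, best_score = 0, -np.inf
    for off in range(-max_offset, max_offset + 1):
        if off >= 0:
            n = min(len(feats_a), len(feats_b) - off)
            pair = (feats_a[:n], feats_b[off:off + n])
        else:
            n = min(len(feats_a) + off, len(feats_b))
            pair = (feats_a[-off:-off + n], feats_b[:n])
        if n <= 0:
            continue
        score = mean_cosine(*pair)
        if score > best_score:
            best_off, best_score = off, score
    return best_off
```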

QST (Quick Sky Time)

QST contains 1,167 video clips cut from 216 time-lapse 4K videos collected from YouTube. It can be used for a variety of tasks, such as (high-resolution) video generation, (high-resolution) video prediction, (high-resolution) image generation, texture generation, image inpainting, image/video super-resolution, image/video colorization, image/video animating, etc. Each clip contains between 58 and 1,200 frames (285,446 frames in total), and the resolution of each frame exceeds 1,024 x 1,024. Specifically, QST consists of a training set (1,000 clips, 244,930 frames), a validation set (100 clips, 23,200 frames), and a testing set (67 clips, 17,316 frames). The dataset is available for download with the access key qst1.

3 papers · 0 benchmarks · Images, Videos

PP-HumanSeg14K

A large-scale video portrait dataset that contains 291 videos with 14K frames from 23 conference scenes. The dataset covers a variety of teleconferencing scenes and participant actions, interference from passers-by, and illumination changes.

3 papers · 0 benchmarks · Images, Videos

KITTI-Masks

This dataset consists of 2,120 sequences of binary pedestrian masks, with sequence lengths varying between 2 and 710 frames; for details, we refer to our paper. It is based on the original KITTI segmentation challenge, which can be found at https://www.vision.rwth-aachen.de/page/mots

3 papers · 1 benchmark · Images, Videos

PETRAW (PEg TRAnsfer Workflow recognition by different modalities)

The PETRAW dataset is composed of 150 sequences of peg transfer training sessions. The objective of a peg transfer session is to transfer six blocks from the left to the right side of the board and back. Each block must be extracted from a peg with one hand, transferred to the other hand, and inserted on a peg on the other side of the board. All cases were acquired by a non-medical expert at the LTSI Laboratory of the University of Rennes. The dataset is divided into a training set of 90 cases and a test set of 60 cases. Each case consists of kinematic data, a video, a semantic segmentation of each frame, and workflow annotation.

3 papers · 7 benchmarks · Biomedical, Images, Videos
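
For orientation, a single PETRAW case can be thought of as four time-aligned streams. The field names and shapes below are illustrative assumptions about that structure, not the dataset's on-disk schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PetrawCase:
    """One peg-transfer case as four time-aligned streams (illustrative schema)."""
    kinematics: np.ndarray    # (T, K) kinematic signals for both hands
    frames: np.ndarray        # (T, H, W, 3) video frames
    segmentation: np.ndarray  # (T, H, W) semantic segmentation label map per frame
    workflow: np.ndarray      # (T,) workflow annotation (e.g. phase/step id) per frame
```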

MeLa BitChute

MeLa BitChute is a near-complete dataset of over 3M videos from 61K channels over 2.5 years (June 2019 to December 2021) from the social video hosting platform BitChute, a commonly used alternative to YouTube. Additionally, the dataset includes a variety of video-level metadata, including comments, channel descriptions, and views for each video.

3 papers · 0 benchmarks · Videos

DFDM (Deepfake videos generated from different models)

We created a new dataset, named DFDM, with 6,450 deepfake videos generated by different autoencoder models. Specifically, five autoencoder models with variations in encoder, decoder, intermediate layer, and input resolution were selected to generate deepfakes from the same input. We observed visible but subtle visual differences among the resulting deepfakes, providing evidence of model attribution artifacts.

3 papers · 0 benchmarks · Videos
Page 30 of 51