Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

1,019 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

1,019 dataset results

UCLA Aerial Event Dataset

The UCLA Aerial Event Dataset was captured by a low-cost hex-rotor carrying a GoPro camera. The platform damps high-frequency camera vibration and can hover autonomously using GPS and a barometer; it flies 20–90 m above the ground and stays airborne for about 5 minutes.

5 papers · 0 benchmarks · Videos

GolfDB

GolfDB is a high-quality video dataset created for general recognition applications in the sport of golf, and specifically for the task of golf swing sequencing.

5 papers · 0 benchmarks · Videos

MLB-YouTube Dataset

The MLB-YouTube dataset is a large-scale dataset consisting of 20 baseball games from the 2017 MLB post-season available on YouTube, with over 42 hours of video footage. The dataset has two components: segmented videos for activity recognition and continuous videos for activity classification. It is quite challenging, as it is created from TV broadcasts of baseball games in which multiple different activities share the same camera angle. Further, the motion/appearance difference between the various activities is quite small.

5 papers · 0 benchmarks · Videos

MPII Cooking 2 Dataset

A dataset which provides detailed annotations for activity recognition.

5 papers · 4 benchmarks · Videos

SYNTHIA-AL

Specially designed to evaluate active learning for video object detection in road scenes.

5 papers · 0 benchmarks · Videos

Tasty Videos

A collection of 2,511 recipes for zero-shot learning, recognition, and anticipation.

5 papers · 0 benchmarks · Texts, Videos

WWW Crowd

WWW Crowd provides 10,000 videos with over 8 million frames from 8,257 diverse scenes, therefore offering a comprehensive dataset for the area of crowd understanding.

5 papers · 0 benchmarks · Images, Videos

HEV-I (Honda Egocentric View-Intersection Dataset)

The Honda Egocentric View-Intersection (HEV-I) dataset was introduced to enable research on modelling traffic-participant interaction, future object localization, and learning driver actions in challenging driving scenarios. The dataset includes 230 video clips of real human driving at different intersections in the San Francisco Bay Area, collected using an instrumented vehicle equipped with cameras, GPS/IMU, and vehicle state signals.

5 papers · 5 benchmarks · Videos

iPer (Impersonator)

iPer is a dataset of videos with diverse clothing styles for evaluating human motion imitation, appearance transfer, and novel view synthesis. There are 30 subjects of varying shape, height, and gender. Each subject wears different clothes and performs both an A-pose video and a video of random actions. There are 103 clothes in total, and the whole dataset contains 206 video sequences with 241,564 frames.

5 papers · 0 benchmarks · Videos

LoLi-Phone

LoLi-Phone is a large-scale low-light image and video dataset for low-light image enhancement (LLIE). The images and videos were taken by the cameras of different mobile phones under diverse illumination conditions.

5 papers · 0 benchmarks · Images, Videos

JRDB-Act

JRDB-Act is an extension of the JRDB dataset, creating a large-scale multi-modal dataset for spatio-temporal action, social group, and activity detection.

5 papers · 0 benchmarks · Videos

LIVE Livestream

LIVE Livestream, formally the Laboratory for Image and Video Engineering (LIVE) Livestream Database, is a database designed specifically for live-streaming video quality assessment (VQA) research. It includes 315 videos of 45 contents impaired by 6 types of distortions.

5 papers · 3 benchmarks · Videos

Sims4Action

The Sims4Action Dataset: a videogame-based dataset for Synthetic→Real domain adaptation for human activity recognition.

5 papers · 0 benchmarks · Actions, Videos

VidOR

The VidOR (Video Object Relation) dataset contains 10,000 videos (98.6 hours) from the YFCC100M collection, together with a large number of fine-grained annotations for relation understanding. In particular, 80 categories of objects are annotated with bounding-box trajectories to indicate their spatio-temporal locations in the videos, and 50 categories of relation predicates are annotated among all pairs of annotated objects with starting and ending frame indices. This results in around 50,000 object instances and 380,000 relation instances. For model development, the dataset is split into 7,000 videos for training, 835 for validation, and 2,165 for testing.

5 papers · 12 benchmarks · Videos

MSU Shot Boundary Detection Benchmark

This is a dataset for the shot boundary detection task. It combines 2 existing datasets and 19 manually annotated open-source videos, with a total length of more than 1,200 minutes and over 10,000 scene transitions. The videos span resolutions from 360×288 to 1920×1080 in MP4 and MKV formats, in RGB or grayscale, with frame rates from 23 to 60 FPS.

5 papers · 2 benchmarks · Videos
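To illustrate the task this benchmark evaluates, here is a minimal shot-boundary-detection sketch: it flags an abrupt cut wherever the distance between consecutive frames' color histograms exceeds a threshold. This is a hypothetical toy detector, not any method from the benchmark; real entries must also handle gradual transitions, flashes, and camera motion, and the histograms here are invented inputs rather than decoded video frames.

```python
def hist_distance(h1, h2):
    """L1 distance between two normalized frame histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_cuts(histograms, threshold=0.5):
    """Return indices i where a cut occurs between frame i-1 and frame i."""
    cuts = []
    for i in range(1, len(histograms)):
        if hist_distance(histograms[i - 1], histograms[i]) > threshold:
            cuts.append(i)
    return cuts

# Toy example: two static shots with an abrupt cut between frames 2 and 3.
frames = [
    [0.90, 0.10, 0.00], [0.88, 0.12, 0.00], [0.90, 0.10, 0.00],  # shot 1
    [0.10, 0.10, 0.80], [0.12, 0.08, 0.80],                      # shot 2
]
print(detect_cuts(frames))  # prints [3]
```

Benchmarks like this one score such detectors by comparing the predicted cut indices against the annotated scene transitions, typically with precision/recall-style metrics.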

OAK (Objects Around Krishna)

OAK is a benchmark for online continual object detection built on an egocentric video dataset. It adopts the KrishnaCam videos, an egocentric video stream collected over nine months by a graduate student, and provides exhaustive bounding-box annotations of 80 video snippets (~17.5 hours) for 105 object categories in outdoor scenes.

5 papers · 0 benchmarks · Videos

UESTC RGB-D (UESTC RGB-D Varying-view action database)

The UESTC RGB-D Varying-view action database contains 40 categories of aerobic exercise. The actions were captured with 2 Kinect V2 cameras from 8 fixed directions and 1 round (orbiting) direction, in the modalities of RGB video, 3D skeleton sequences, and depth map sequences.

5 papers · 8 benchmarks · Videos

D3D-HOI

D3D-HOI is a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions. The dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints. Each manipulated object (e.g., microwave oven) is represented with a matching 3D parametric model. This data allows researchers to evaluate the reconstruction quality of articulated objects and establish a benchmark for this challenging task.

5 papers · 0 benchmarks · Videos

Acappella

Acappella comprises around 46 hours of a cappella solo singing videos sourced from YouTube, sampled across different singers and languages. Four language categories are considered: English, Spanish, Hindi, and others.

5 papers · 0 benchmarks · Audio, Videos

ForgeryNet

ForgeryNet is an extremely large face forgery dataset with unified image- and video-level annotations across four tasks: 1) Image Forgery Classification, including two-way (real/fake), three-way (real / fake with identity-replaced forgery approaches / fake with identity-remained forgery approaches), and n-way (real plus 15 respective forgery approaches) classification; 2) Spatial Forgery Localization, which segments the manipulated area of fake images relative to their corresponding real source images; 3) Video Forgery Classification, which redefines video-level forgery classification with manipulated frames at random positions, an important setting because real-world attackers are free to manipulate any target frame; and 4) Temporal Forgery Localization, which localizes the manipulated temporal segments. ForgeryNet is by far the largest publicly available deep face forgery dataset in terms of data scale (2.9 million images, 221,247 videos).

5 papers · 2 benchmarks · Images, Videos
Page 25 of 51