Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

1,019 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

1,019 dataset results

SF20K (Short-Films 20K)

Short-Films 20K (SF20K) is the largest publicly available movie dataset. SF20K is composed of 20,143 amateur films and offers long-term video tasks in the form of multiple-choice and open-ended question answering.

1 paper · 0 benchmarks · Audio, Texts, Videos

FDMSE-ISL

A large-scale isolated Indian Sign Language dataset. It contains 2,002 common words used in daily communication within the Indian deaf community, with 40,033 videos across the 2,002 words. The total duration is around 36.2 hours, comprising 7.8 million frames.

1 paper · 1 benchmark · RGB-D, Videos

Hawk Annotation Dataset

The Hawk Annotation Dataset provides language descriptions specifically for the anomaly scenes in seven existing video anomaly datasets, covering a variety of anomalous scenarios: crime (UCF-Crime), campus (ShanghaiTech and CUHK Avenue), pedestrian walkways (UCSD Ped1 and Ped2), traffic (DoTA), and human behavior (UBnormal). These visual scenarios support comprehensive fine-tuning across diverse anomaly types, bringing models closer to open-world settings.

1 paper · 0 benchmarks · Texts, Videos

MuSoHu (Toward human-like social robot navigation: A large-scale, multi-modal, social human navigation dataset)

A large-scale, egocentric, multimodal, and context-aware dataset of human demonstrations of social navigation.

1 paper · 0 benchmarks · 3D, Actions, LiDAR, Point cloud, RGB-D, Stereo, Videos

EE3P Dataset

EE3P: Event-based Estimation of Periodic Phenomena Properties. Introduced in: Kolář, J., Špetlík, R., Matas, J. (2024). Measuring Speed of Periodical Movements with Event Camera. In Proceedings of the 27th Computer Vision Winter Workshop.

1 paper · 0 benchmarks · Videos

PPED (Periodic Phenomena Event-based Dataset)

PPED: Periodic Phenomena Event-based Dataset. The dataset features 12 one-second sequences of periodic phenomena (rotation: sequences 01–06, flicker: 07–08, vibration: 09–10, movement: 11–12), with ground-truth frequencies ranging from 3.2 Hz up to 2,000 Hz, provided in .raw and .hdf5 file formats.

1 paper · 0 benchmarks · Videos

Fingertip Video Dataset of HB Estimation (Fingertip Video Dataset for Non-Invasive Diagnosis of Anemia)

This dataset comprises 1-minute fingertip video recordings collected from 150 anemic patients, aged 6 months to 32 years, with hemoglobin levels between 4.3 g/dL and 12.4 g/dL. The videos were recorded using a smartphone camera and flashlight, capturing photoplethysmography (PPG) signals, which are essential for non-invasive hemoglobin estimation.

1 paper · 0 benchmarks · Videos
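The description above hinges on a simple signal-processing idea: under flashlight illumination, blood-volume changes modulate the brightness of the red channel, and the dominant frequency of that trace gives the pulse rate. A minimal sketch of such an extraction, assuming an in-memory frame array (the function names and the 0.7–3.5 Hz heart-rate band are illustrative assumptions, not part of the dataset):

```python
import numpy as np

def ppg_from_frames(frames, fps):
    """Reduce fingertip video frames (T, H, W, 3) to a 1-D PPG trace.

    Under flashlight illumination the red channel carries most of the
    blood-volume signal, so the mean red intensity per frame serves as
    a crude PPG waveform.
    """
    red = frames[..., 0].mean(axis=(1, 2))  # mean red intensity per frame
    return red - red.mean()

def heart_rate_bpm(ppg, fps):
    """Estimate heart rate as the dominant FFT frequency in 0.7-3.5 Hz."""
    spectrum = np.abs(np.fft.rfft(ppg))
    freqs = np.fft.rfftfreq(len(ppg), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 3.5)      # roughly 42-210 bpm
    return 60.0 * freqs[band][np.argmax(spectrum[band])]

# Synthetic check: fake frames carrying a 1.2 Hz (72 bpm) pulse.
fps, seconds = 30, 10
t = np.arange(fps * seconds) / fps
pulse = 0.05 * np.sin(2 * np.pi * 1.2 * t)
frames = np.full((len(t), 4, 4, 3), 0.5)
frames[..., 0] += pulse[:, None, None]
print(round(float(heart_rate_bpm(ppg_from_frames(frames, fps), fps))))  # -> 72
```

Real recordings would additionally need detrending and band-pass filtering before the FFT; this sketch relies on the synthetic signal being clean.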

ConSLAM (Construction Dataset for SLAM)

ConSLAM is a real-world dataset collected periodically on a construction site to measure the accuracy of mobile scanners' SLAM algorithms.

1 paper · 0 benchmarks · 3D, LiDAR, Point cloud, RGB Video, Tracking, Videos

ENF moving video (Electric Network Frequency Moving Video Dataset)

The ENF moving video dataset, a subset of the dataset used in "Temporal Localization of Non-Static Digital Videos Using the Electrical Network Frequency", consists of video recordings without the audio channel, each paired with the corresponding power ENF reference signal in WAV format sampled at 1 kHz. The dataset comprises 8 video clips recorded in Europe at 29.97 frames per second, each approximately 11–12 minutes long, captured with a GoPro Hero 4 Black and an NK AC3061-4KN camera. In terms of content, videos 1–3 are entirely stationary; videos 4–5 are predominantly stationary with some movement; and videos 6–8 are non-stationary, meaning the camera is fixed but there are moving objects in most frames. All videos depict natural, everyday indoor scenes (i.e., not plain backgrounds).

1 paper · 0 benchmarks · Audio, Videos
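ENF-based temporal localization, as referenced above, works by tracking the small deviations of the mains frequency around its nominal 50 Hz value over time and matching that track against a power reference. A hedged sketch of per-window ENF estimation from a 1 kHz reference signal (the function name, window length, and zero-padding choices are assumptions, not the dataset's tooling):

```python
import numpy as np

def enf_track(signal, fs, nominal=50.0, win_s=1.0):
    """Estimate the ENF in consecutive non-overlapping windows.

    For each window, take the magnitude spectrum (zero-padded for
    finer frequency resolution) and return the dominant frequency
    within +/- 1 Hz of the nominal mains frequency.
    """
    n = int(win_s * fs)
    nfft = 64 * n                      # zero-padding refines the peak location
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    band = (freqs > nominal - 1.0) & (freqs < nominal + 1.0)
    track = []
    for start in range(0, len(signal) - n + 1, n):
        win = signal[start:start + n] * np.hanning(n)
        spec = np.abs(np.fft.rfft(win, nfft))
        track.append(freqs[band][np.argmax(spec[band])])
    return np.array(track)

# Synthetic check: a mains signal drifting from ~49.95 Hz to ~50.05 Hz
# over 5 s, sampled at 1 kHz like the dataset's WAV references.
fs, dur = 1000, 5
t = np.arange(fs * dur) / fs
f_inst = 49.95 + 0.02 * t                       # slowly drifting frequency
phase = 2 * np.pi * np.cumsum(f_inst) / fs      # integrate to get the phase
track = enf_track(np.sin(phase), fs)
print(track)                                    # five per-second estimates near 50 Hz
```

Matching a video's extracted ENF track against such a reference (e.g. by sliding correlation) is what yields the temporal localization.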

DailyMoth-70h

DailyMoth-70h is a fully self-contained ASL-to-English sign language dataset containing over 70 hours of video (48K clips) with aligned English captions, featuring a single native ASL signer (a white, early-middle-aged male) from the ASL news channel TheDailyMoth. The dataset is intended as a benchmark and analysis resource for (gloss-free) sign language translation.

1 paper · 0 benchmarks · Texts, Videos

AViMoS (Audio-Visual Mouse Saliency)

A novel audio-visual mouse saliency (AViMoS) dataset.

1 paper · 0 benchmarks · Audio, Time series, Tracking, Videos

SCOPE Dataset

A Chinese sign language dataset that includes dialogue information.

1 paper · 0 benchmarks · Texts, Videos

Aria Everyday Objects

A small-scale, real-world Project Aria dataset with high-quality static 3D oriented bounding box annotations.

1 paper · 6 benchmarks · 3D, Point cloud, Videos

Bukva (Bukva: Russian Sign Language Alphabet)

Bukva is a video dataset for the Russian dactyl (fingerspelling) recognition task. The dataset is about 27 GB in size and contains 3,757 RGB videos, with more than 101 samples per RSL alphabet sign, including dynamic signs. It is split into training and test sets by subject (user_id): the training set includes 3,097 videos and the test set 660 videos. The total video recording time is roughly 4 hours. About 17% of the videos are in HD and 70% in Full HD resolution.

1 paper · 1 benchmark · RGB Video, Videos

CAS-VSR-S101

A new large-scale, in-the-wild Mandarin dataset, CAS-VSR-S101, with 101.1 hours of data. The videos are sourced from broadcast news and conversational programs in Chinese, covering a highly diverse set of topics, speakers, and filming conditions. Utterance lengths are naturally distributed between 0.01 s and 10.57 s, and image quality and resolution vary. News accounts for 82.4% of the programs; 70.4% of the utterances depict news anchors, hosts, and correspondents, while 29.6% are of interviewees and guests. Male and female appearances are relatively balanced, at a ratio of approximately 1.5:1. The dataset is divided into train, validation, and test sets by TV channel to minimize speaker overlap, at a ratio of roughly 8:1:1.5 by duration; the validation and test sets consist of programs broadcast on provincial TV channels. The dataset is available for academic use under a license.

1 paper · 4 benchmarks · Audio, Speech, Texts, Videos

MAHNOB-HCI (MAHNOB-HCI-Tagging database)

Characterising multimedia content with relevant, reliable, and discriminating tags is vital for multimedia information retrieval. With the rapid expansion of digital multimedia content, alternatives to existing explicit tagging are needed to enrich the pool of tagged content. Currently, social media websites encourage users to tag their content; however, users' intent when tagging multimedia content does not always match information-retrieval goals. A large portion of user-defined tags are either motivated by increasing a user's popularity and reputation in an online community or based on individual, egoistic judgments. Moreover, users do not evaluate media content by the same criteria: some might tag multimedia content with words expressing their emotion, while others might use tags to describe the content. For example, a picture may receive different tags based on the objects in the image, the camera with which it was taken, or the emotion a user felt when looking at it.

1 paper · 0 benchmarks · Audio, EEG, Videos

Dynamic Appearance Dataset

We study dynamic appearance models of both relightable (BRDF) and non-relightable (RGB) kinds. For both we introduce new pilot datasets that allow, for the first time, the study of such phenomena: for RGB, we provide 22 dynamic textures acquired from free online sources; for BRDFs, we further acquire a dataset of 21 flash-lit videos of time-varying materials, enabled by a simple-to-construct setup.

1 paper · 0 benchmarks · Images, Videos

MVX (Multimodal V2X)

MVX combines realistic physical-world simulation with a differentiable, accurate ray-tracing wireless simulation, providing multi-agent, multimodal datasets for AI-driven digital-twin applications in vehicular communication systems.

1 paper · 1 benchmark · Images, LiDAR, Tabular, Videos

CausalChaos! (CausalChaos!QA)

CausalChaos! is a dataset for causal video question answering based on Tom and Jerry cartoons. It features longer causal chains embedded in dynamic visual scenes, as well as challenging incorrect options, in particular a Causal Confusion set containing causally confounding distractors. These factors prove challenging for current VLMs and traditional video question answering models.

1 paper · 0 benchmarks · Images, RGB Video, Videos

SFU-HW-Objects-v1

The SFU-HW-Objects-v1 dataset contains bounding boxes and object class labels for the High Efficiency Video Coding (HEVC) v1 Common Test Conditions (CTC) video sequences. The dataset contains only the object labels; the raw video sequences themselves can be obtained from the Joint Collaborative Team on Video Coding (JCT-VC). The dataset is used in the MPEG-VCM (Video Coding for Machines) and MPEG-FCM (Feature Coding for Machines) standardization efforts.

1 paper · 0 benchmarks · Videos
Page 46 of 51