Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

1,019 machine learning datasets (filtered to the Videos modality)

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

1,019 dataset results

WhyAct

WhyAct is a dataset for identifying human action reasons in online videos, consisting of 1,077 visual actions manually annotated with their reasons.

1 paper · 0 benchmarks · Videos

HYouTube

HYouTube is a video dataset for video harmonization, which aims to adjust the foreground of a composite video to make it compatible with the background. The dataset was created by adjusting the foreground of real videos to create synthetic composite videos. It is based on YouTube-VOS.

1 paper · 0 benchmarks · Videos

METEOR

METEOR is a complex traffic dataset that captures traffic patterns in unstructured scenarios in India. It consists of more than 1,000 one-minute video clips, over 2 million annotated frames with ego-vehicle trajectories, and more than 13 million bounding boxes for surrounding vehicles and traffic agents. METEOR is unique in capturing the heterogeneity of both microscopic and macroscopic traffic characteristics.

1 paper · 0 benchmarks · Videos

MOD20

MOD20 is an action recognition dataset consisting of videos collected from YouTube and the authors' own drone. The dataset contains 2,324 videos lasting a total of 240 minutes. The actions were selected from challenging and complex scenarios and cover multiple viewpoints, from ground level to bird's-eye view. The substantial variation in body size, number of people, viewpoints, camera motion, and background makes the dataset challenging for action recognition. The action classes, undistorted 720×720 clips, and multi-viewpoint video selection extend the dataset's applicability to a wider research community.

1 paper · 0 benchmarks · Videos

Surgical Hands

Surgical Hands is a dataset that provides multi-instance articulated hand pose annotations for in-vivo videos. The dataset contains 76 video clips from 28 publicly available surgical videos and over 8.1k annotated hand pose instances.

1 paper · 0 benchmarks · Videos

Mouse Grooming Behavior

This dataset was generated to characterize mouse grooming behavior. Mouse grooming serves many adaptive functions, such as coat and body care, stress reduction, de-arousal, social interaction, thermoregulation, and nociception. Alterations of this behavior are measured and used in mouse pre-clinical models of human psychiatric illnesses.

1 paper · 0 benchmarks · Videos

DriverMHG

Driver Micro Hand Gestures (DriverMHG) is a dataset for dynamic recognition of driver micro hand gestures, which consists of RGB, depth and infrared modalities.

1 paper · 0 benchmarks · Videos

A Datacube for the analysis of wildfires in Greece

This dataset is meant to be used to develop models for next-day fire hazard forecasting in Greece. It contains data from 2009 to 2020 on a 1 km × 1 km spatial grid at daily temporal resolution.

1 paper · 0 benchmarks · Environment, Videos
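The next-day forecasting setup implies pairing each day's grid of predictors with the following day's fire labels. A minimal sketch of that pairing with xarray, assuming a NetCDF layout and hypothetical file and variable names ("greece_wildfire_datacube.nc", "burned_area"), since the entry does not specify the actual format:

```python
import xarray as xr

# Hypothetical file/variable names; the real datacube layout may differ.
ds = xr.open_dataset("greece_wildfire_datacube.nc")

# One day's 1 km x 1 km grid of predictor variables.
day = ds.sel(time="2020-08-01")

# Next-day hazard target: shift the fire label back one day along time,
# so features observed at day t line up with the label at day t+1.
features = ds.drop_vars("burned_area")
labels = ds["burned_area"].shift(time=-1)
```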

THGP (Temporal Hands Guns and Phones Dataset)

The Temporal Hands Guns and Phones (THGP) dataset is a collection of 5,960 video frames (5,000 for training and 960 for testing). The training part is composed of 50 videos of 100 frames each (720 × 720 pixels): 20 videos of shooting drills, 20 videos of armed robberies, and 10 videos of people making calls. The testing part contains 48 videos of 20 frames each (720 × 720 pixels), covering phone calls, gun reviews, shooting drills, people making calls, and armed robberies at convenience stores. The dataset is labeled with bounding boxes for hands, phones, and guns.

1 paper · 0 benchmarks · Videos

LIRIS human activities dataset

The LIRIS human activities dataset contains grayscale, RGB, and depth videos showing people performing various daily-life activities (discussing, making telephone calls, giving an item, etc.). The dataset is fully annotated: the annotations contain not only the action class but also its spatial and temporal position in the video. It was originally shot for the ICPR-HARL 2012 competition.

1 paper · 0 benchmarks · Videos

GIF Reply Dataset

The released GIF Reply dataset contains 1,562,701 real text-GIF conversation turns on Twitter. In these conversations, 115,586 unique GIFs are used. Metadata, including OCR-extracted text, annotated tags, and object names, is also available for some GIFs in this dataset.

1 paper · 1 benchmark · Images, Texts, Videos

MFA (Many Faces of Anger)

The MFA (Many Faces of Anger) dataset includes 200 in-the-wild videos from North American and Persian cultures with fine-grained labels ('annoyed', 'anger', 'disgust', 'hatred', and 'furious') and 13 related emojis.

1 paper · 18 benchmarks · Videos

HT1080WT cells - 3D collagen type I matrices (HT1080WT cells embedded in 3D collagen type I matrices - manual annotations for cell instance segmentation and tracking)

Human fibrosarcoma HT1080WT (ATCC) cells at low cell densities embedded in 3D collagen type I matrices [1]. The time-lapse videos were recorded every 2 minutes for 16.7 hours and covered a field of view of 1002 × 1004 pixels with a pixel size of 0.802 μm/pixel. The videos were pre-processed to correct frame-to-frame drift artifacts, resulting in a final size of 983 × 985 pixels.

1 paper · 0 benchmarks · Medical, Videos
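The description mentions frame-to-frame drift correction, which also explains the slight crop from 1002 × 1004 to 983 × 985 pixels. A generic sketch of translation-only drift correction via phase cross-correlation, using scikit-image; the dataset's actual pre-processing pipeline is not specified, so treat this as illustrative:

```python
import numpy as np
from scipy.ndimage import shift as nd_shift
from skimage.registration import phase_cross_correlation

def correct_drift(frames):
    """Align every frame of a time-lapse stack to the first frame."""
    reference = frames[0]
    corrected = [reference]
    for frame in frames[1:]:
        # Estimated (row, col) translation between reference and frame.
        offset, _, _ = phase_cross_correlation(reference, frame)
        corrected.append(nd_shift(frame, offset))
    # A real pipeline would crop the drifting borders afterwards,
    # as the dataset authors did (1002x1004 -> 983x985).
    return np.stack(corrected)
```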

LTFT (Long-Term Face Tracking)

A dataset originally conceived for multi-face tracking and detection in highly crowded scenarios, where the face is often the only body part that can be used to track individuals.

1 paper · 0 benchmarks · Videos

ASLLVD (American Sign Language Lexicon Video Dataset)

Extremely important: The ASLLVD video data are subject to Terms of Use: http://www.bu.edu/asllrp/signbank-terms.pdf. By downloading these video files, you are agreeing to respect these conditions. In particular, NO FURTHER REDISTRIBUTION OF THESE VIDEO FILES is allowed.

1 paper · 0 benchmarks · Videos

SFU-HW-Tracks

SFU-HW-Tracks is a dataset for object tracking on raw video sequences; it contains object annotations with unique object identities (IDs) for the High Efficiency Video Coding (HEVC) v1 Common Test Conditions (CTC) sequences. It is the tracking extension of the SFU-HW-Objects-v1 dataset.

1 paper · 0 benchmarks · Videos

MedVidCL (Medical Video Classification)

The MedVidCL dataset contains a collection of 6,617 videos annotated into 'medical instructional', 'medical non-instructional', and 'non-medical' classes. A two-step approach was used to construct the dataset. In the first step, videos annotated by health informatics experts are used to train a machine learning model that assigns each video to one of the three classes. In the second step, only high-confidence predictions are kept, and health informatics experts review the model's predicted category and correct it where needed.

1 paper · 0 benchmarks · Medical, Texts, Videos
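The second step amounts to confidence filtering on the model's class probabilities. A minimal sketch, assuming a scikit-learn-style classifier and an illustrative threshold (the entry does not state the actual cut-off):

```python
import numpy as np

def select_high_confidence(clf, X, threshold=0.9):
    """Keep videos whose top predicted class probability exceeds threshold.

    The retained (index, predicted label) pairs are what the experts
    would review and correct in the second step.
    """
    probs = clf.predict_proba(X)        # shape: (n_videos, 3 classes)
    confidence = probs.max(axis=1)
    keep = confidence >= threshold
    return np.flatnonzero(keep), probs.argmax(axis=1)[keep]
```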

Drone vs Bird (Drone vs Bird Detection Challenge)

For the Drone-vs-Bird Detection Challenge 2021, 77 different video sequences have been made available as training data. These video sequences originate from the previous installment of the challenge and were collected using MPEG4-coded static cameras by the SafeShore project, by the Fraunhofer IOSB research institute, and by the ALADDIN2 project. On average, the video sequences consist of 1,384 frames, and each frame contains 1.12 annotated drones. The video sequences are recorded with both static and moving cameras, and the resolution varies between 720×576 and 3840×2160 pixels. In total, 8 different types of drones exist in the dataset, i.e., 3 fixed-wing and 5 rotary ones. For each video, a separate annotation file is provided, which contains the frame number and the bounding box (expressed as [topx topy width height]) for the frames in which drones enter the scene.

1 paper · 20 benchmarks · Images, Videos
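Since each per-video annotation file pairs a frame number with [topx topy width height] boxes, a small parser is easy to sketch. This assumes one whitespace-separated box per line; the real files may group several boxes per frame, so treat it as a sketch of the format described above:

```python
def load_annotations(path):
    """Parse a Drone-vs-Bird annotation file into {frame: [boxes]}."""
    boxes = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 5:
                continue  # skip empty or malformed lines
            frame = int(parts[0])
            top_x, top_y, width, height = map(int, parts[1:5])
            boxes.setdefault(frame, []).append((top_x, top_y, width, height))
    return boxes
```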

Extended heartSeg

The dataset X of this work is an extension of the heartSeg dataset. Each sample x ∈ X is an RGB image capturing the heart region of Medaka (Oryzias latipes) hatchlings from a constant ventral view. Since the body of Medaka is see-through, noninvasive studies of the internal organs and the whole circulatory system are practicable. A Medaka's heart contains three parts: the atrium, the ventricle, and the bulbus. The atrium receives deoxygenated blood from the circulatory system and delivers it to the ventricle, which forwards it into the bulbus. The bulbus is the heart's exit chamber and provides the gill arches with a constant blood flow. The blood flow through these three chambers was captured in 63 short recordings (around 11 seconds each at 24 frames per second) in total, from which the single image samples x ∈ X are extracted. The dataset is split into training and test data following the heartSeg dataset, with n_train = 565 samples in the training set X_train and n_test = 165 samples in the test set X_test.

1 paper · 1 benchmark · Biology, Biomedical, Medical, Videos

Natural Sprites

This CSV consists of (x-position, y-position, area) tuples for three views (left, middle, right) of binary masks from the 2019 YouTube-VIS challenge (https://competitions.codalab.org/competitions/20127#participate-get-data), downscaled to 64 × 128 with the aspect ratio kept. Extracting pairs from this CSV yields 234,652 transitions in the given statistics. These statistics can be used to augment ground-truth factor distributions with natural transitions, which the authors demonstrate with Spriteworld. For details, see the paper at https://openreview.net/forum?id=EbIDjBynYJ8.

1 paper · 1 benchmark · Images, Videos
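Extracting the transition pairs mentioned above reduces to zipping consecutive rows of the CSV. A minimal sketch with pandas, using a hypothetical file name and column names ("x", "y", "area"), since the entry does not list the actual header:

```python
import pandas as pd

df = pd.read_csv("natural_sprites.csv")  # hypothetical file name

# Consecutive rows form (state_t, state_{t+1}) transition pairs; a real
# pipeline would also avoid pairing across sequence/view boundaries.
cols = ["x", "y", "area"]
states = df[cols].to_numpy()
transitions = list(zip(states[:-1], states[1:]))
```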
Page 40 of 51