Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

383 machine learning datasets (filtered to the 3D modality)

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3d meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • Midi (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • Cad (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)
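The modality filter above narrows the catalog to datasets carrying a given tag. A minimal sketch of that behavior over hypothetical dataset records (names and tags here are illustrative, not site data):

```python
from typing import Dict, List, Set

def filter_by_modality(datasets: List[Dict], modality: str) -> List[Dict]:
    """Return the datasets whose tag set contains the given modality."""
    return [d for d in datasets if modality in d["modalities"]]

# Hypothetical records mirroring the listing's name + modality-tags shape.
datasets = [
    {"name": "WikiScenes", "modalities": {"3D", "Images", "Texts"}},
    {"name": "MVHand", "modalities": {"3D", "Point cloud"}},
    {"name": "AeroPath", "modalities": {"3D", "Medical"}},
]

print([d["name"] for d in filter_by_modality(datasets, "Point cloud")])
# → ['MVHand']
```

Selecting the "3D" tag on this page keeps all 383 datasets shown, since every listed entry carries it.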

383 dataset results

WikiScenes

The WikiScenes dataset consists of paired images and language descriptions capturing world landmarks and cultural sites, with associated 3D models and camera poses. WikiScenes is derived from the massive public catalog of freely-licensed crowdsourced data in the Wikimedia Commons project, which contains a large variety of images with captions and other metadata.

3 papers · 0 benchmarks · 3D, Images, Texts

MVHand

MVHand is a multi-view hand posture dataset for obtaining complete 3D point clouds of the hand in real-world conditions.

3 papers · 0 benchmarks · 3D, Point cloud

HOPE-Image (Household Objects for Pose Estimation)

The NVIDIA HOPE datasets consist of RGBD images and video sequences with labeled 6-DoF poses for 28 toy grocery objects. The toy grocery objects are readily available for purchase and have ideal size and weight for robotic manipulation. 3D textured meshes for generating synthetic training data are provided.

3 papers · 0 benchmarks · 3D, 3d meshes, Images

HOPE-Video (Household Objects for Pose Estimation)

The HOPE-Video dataset contains 10 video sequences (2,038 frames), each with 5-20 objects in a tabletop scene, captured by a RealSense D415 RGBD camera mounted on a robot arm. In each sequence, the camera is moved to capture multiple views of a set of objects in the robotic workspace. COLMAP was first applied to refine the camera poses (keyframes at 6 fps) provided by forward kinematics and by RGB calibration from the RealSense to Baxter's wrist camera. A dense 3D point cloud was then generated via CascadeStereo (included for each sequence in 'scene.ply'). Ground-truth poses for the HOPE object models in the world coordinate system were annotated manually using the CascadeStereo point clouds. The following are provided for each frame:

3 papers · 0 benchmarks · 3D, 3d meshes, Videos
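Each HOPE-Video sequence ships its dense point cloud as a 'scene.ply' file. A minimal standard-library sketch of reading the vertex list from an ASCII PLY stream; the tiny in-memory cloud below is illustrative, and real HOPE-Video files may use a binary PLY encoding, which would need a library such as Open3D instead:

```python
import io

def read_ascii_ply_vertices(fh):
    """Parse (x, y, z) vertex tuples from a minimal ASCII PLY stream."""
    assert fh.readline().strip() == "ply"
    n_vertices = 0
    for line in fh:  # scan the header up to end_header
        tok = line.split()
        if tok[:2] == ["element", "vertex"]:
            n_vertices = int(tok[2])  # vertex count declared in the header
        elif tok[:1] == ["end_header"]:
            break
    # One whitespace-separated coordinate row per vertex follows the header.
    return [tuple(map(float, fh.readline().split())) for _ in range(n_vertices)]

# Tiny illustrative point cloud (not real HOPE-Video data).
sample = io.StringIO(
    "ply\nformat ascii 1.0\nelement vertex 2\n"
    "property float x\nproperty float y\nproperty float z\n"
    "end_header\n0 0 0\n1.5 2.0 -0.5\n"
)
points = read_ascii_ply_vertices(sample)
print(len(points))  # → 2
```

In practice you would open the sequence's 'scene.ply' with `open(path)` (or a full PLY reader) rather than an in-memory stream.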

ADHD-200

Attention Deficit Hyperactivity Disorder (ADHD) affects at least 5-10% of school-age children and is associated with substantial lifelong impairment, with annual direct costs exceeding $36 billion in the US. Despite a voluminous empirical literature, the scientific community remains without a comprehensive model of the pathophysiology of ADHD. Further, the clinical community remains without objective biological tools capable of informing the diagnosis of ADHD for an individual or guiding clinicians in their decision-making regarding treatment.

3 papers · 0 benchmarks · 3D, Medical, fMRI

Florence 4D

Florence 4D is a dataset consisting of dynamic sequences of 3D face models, in which a combination of synthetic and real identities exhibits an unprecedented variety of 4D facial expressions, with variations that include the classical neutral-to-apex transition but also generalize to expression-to-expression transitions. It is designed for research in 4D facial analysis, with a particular focus on dynamic expressions.

3 papers · 0 benchmarks · 3D

DOORS (Dataset fOr bOuldeRs Segmentation)

DOORS is a dataset designed for boulder recognition, centroid regression, segmentation, and navigation applications. The dataset is divided into two sets:

3 papers · 0 benchmarks · 3D, Images

CHAIRS dataset

CHAIRS is a large-scale motion-captured f-AHOI dataset, consisting of 17.3 hours of versatile interactions between 46 participants and 81 articulated and rigid sittable objects. CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process, as well as realistic and physically plausible full-body interactions.

3 papers · 0 benchmarks · 3D

3D-POP

3D-POP is designed specifically to solve a range of computer vision problems (2D-3D tracking, posture estimation) faced by biologists when designing behavior studies with animals.

3 papers · 0 benchmarks · 3D, Biology, Images, RGB Video, Stereo, Tracking, Videos

EgoPAT3D-DT


3 papers · 0 benchmarks · 3D, Videos

DAD-3DHeads (DAD-3DHeads dataset)

The DAD-3DHeads dataset consists of 44,898 images collected from various sources (37,840 in the training set, 4,312 in the validation set, and 2,746 in the test set).

3 papers · 0 benchmarks · 3D, 3d meshes, Images

OpenTrench3D

OpenTrench3D is the first publicly available point cloud dataset of underground utilities from open trenches. It features 310 fully annotated point clouds comprising a total of 528 million points, categorised into 5 unique classes. The point clouds are photogrammetrically derived and capture detailed scenes of open trenches revealing underground utilities.

3 papers · 9 benchmarks · 3D, Point cloud

VBR (VBR: A Vision Benchmark in Rome)

VBR is a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. It introduces a new benchmark targeting visual odometry and SLAM, to advance research in autonomous robotics and computer vision. The dataset complements existing ones by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures for accurately calibrating the intrinsics and extrinsics of the sensors while addressing temporal synchronization. The recordings cover multi-floor buildings, gardens, and urban and highway scenarios. By combining handheld and car-based data collections, the setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes accurate 6-DoF ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through bundle adjustment.

3 papers · 0 benchmarks · 3D, LiDAR, Point cloud, RGB Video, Stereo, Tracking

Mono3DRefer

For Mono3DRefer, 2,025 image frames are sampled from the original KITTI dataset, containing 41,140 expressions in total and a vocabulary of 5,271 words.

3 papers · 0 benchmarks · 3D, Images, Texts

ARCH2S (Dataset, Benchmark for Learning Exterior Architectural Structures from Point Clouds)

Precise segmentation of architectural structures provides detailed information about the various components of a building, enhancing our understanding of and interaction with the built environment. Nevertheless, existing outdoor 3D point cloud datasets have limited detailed annotations of architectural exteriors due to privacy concerns and the high costs of data acquisition and annotation. To overcome this shortfall, ARCH2S introduces a semantically enriched, photo-realistic dataset of 3D architectural models and a benchmark for semantic segmentation. It features real-world buildings serving 4 different purposes, as well as an open architectural landscape in Hong Kong. Each point cloud is annotated into one of 14 semantic classes.

3 papers · 2 benchmarks · 3D, Environment, Point cloud

AnoVox

AnoVox is a large-scale benchmark for ANOmaly detection in autonomous driving. AnoVox incorporates multimodal sensor data and spatial VOXel ground truth, allowing for the comparison of methods independent of their used sensor. AnoVox contains both content and temporal anomalies.

3 papers · 0 benchmarks · 3D, Images, LiDAR, RGB-D

GarmentCodeData (GarmentCodeData: A Dataset of 3D Made-to-Measure Garments With Sewing Patterns)

GarmentCodeData contains 115,000 data points covering a variety of designs across many common garment categories (tops, shirts, dresses, jumpsuits, skirts, pants, etc.), fitted to a variety of body shapes sampled from a custom statistical body model based on CAESAR, as well as to a standard reference body shape, using three different textile materials.

3 papers · 0 benchmarks · 3D, 3d meshes

AeroPath (AeroPath: An airway segmentation benchmark dataset with challenging pathology)

AeroPath is a public benchmark dataset consisting of 27 CT images from patients with pathologies ranging from emphysema to large tumors, with corresponding trachea and bronchi annotations.

3 papers · 0 benchmarks · 3D, Medical

MM-OR

Operating rooms (ORs) are complex, high-stakes environments requiring a precise understanding of interactions among medical staff, tools, and equipment to enhance surgical assistance, situational awareness, and patient safety. Current datasets fall short in scale and realism, and do not capture the multimodal nature of OR scenes, limiting progress in OR modeling. To this end, MM-OR is a realistic, large-scale multimodal spatiotemporal OR dataset, and the first to enable multimodal scene graph generation. MM-OR captures comprehensive OR scenes containing RGB-D data, detail views, audio, speech transcripts, robotic logs, and tracking data, and is annotated with panoptic segmentations, semantic scene graphs, and downstream task labels. The authors also propose MM2SG, the first multimodal large vision-language model for scene graph generation, and demonstrate through extensive experiments its ability to effectively leverage multimodal inputs.

3 papers · 7 benchmarks · 3D, Audio, Graphs, Images, Medical, Point cloud, RGB-D, Speech, Texts, Time series, Videos

Rendered Handpose Dataset

The Rendered Handpose Dataset contains 41,258 training and 2,728 testing samples. Each sample provides:

2 papers · 0 benchmarks · 3D, Images, RGB-D
Page 11 of 20