Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

19,997 dataset results

SuperAnimal-TopViewMouse

This dataset supports Ye et al. 2024, Nature Communications (https://www.nature.com/articles/s41467-024-48792-2).

2 papers · 0 benchmarks · Images

iRodent (iRodent Animal Pose Estimation)

Description: The "iRodent" dataset contains rodent species observations obtained using the iNaturalist API, with a focus on Suborder Myomorpha (Taxon ID: 16). The dataset features prominent rodent species like Muskrat, Brown Rat, House Mouse, Black Rat, Hispid Cotton Rat, Meadow Vole, Bank Vole, Deer Mouse, White-footed Mouse, and Striped Field Mouse. It provides manually labeled keypoints for pose estimation and, for a subset of images, segmentation masks produced using a Mask R-CNN model.

2 papers · 2 benchmarks · Images

EndoNeRF

2 papers · 0 benchmarks

The Well: 15TB of Physics Simulations

Large-scale collection of machine learning datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws from domain scientists and numerical software developers to provide 15TB of data across 16 datasets covering diverse domains such as biological systems, fluid dynamics, acoustic scattering, as well as magneto-hydrodynamic simulations of extra-galactic fluids or supernova explosions. These datasets can be used individually or as part of a broader benchmark suite for accelerating research in machine learning and computational sciences.

2 papers · 0 benchmarks

UW Indoor Scenes (UW-IS) Occluded dataset

The UW Indoor Scenes (UW-IS) Occluded dataset is curated using commodity hardware (Intel RealSense D435) to reflect real-world robotics scenarios. It consists of two completely different indoor environments. The first is a lounge where the objects are placed on a tabletop. The second is a mock warehouse setup where the objects are placed on a shelf. For each environment, there are RGB-D images from 36 videos comprising five to seven objects each, taken from distances up to approximately 2 m. The videos cover two lighting conditions and three levels of object separation for three object categories (i.e., kitchen objects, food items, and tools/miscellaneous). At the first level of separation there is no object occlusion; at the second, some occlusion occurs; at the third, the objects are placed extremely close together. Overall, the dataset considers 20 object class

2 papers · 0 benchmarks · Images

Street360Loc

Text-Vision Cross-Modal Place Recognition Dataset

2 papers · 0 benchmarks · Images, Texts

RichHF-18K

We collect a dataset of Rich Human Feedback on 18K images (RichHF-18K), which contains (i) point annotations on the image that highlight regions of implausibility/artifacts, and text-image misalignment; (ii) labeled words on the prompts specifying the missing or misrepresented concepts in the generated image; and (iii) four types of fine-grained scores for image plausibility, text-image alignment, aesthetics, and overall rating.

2 papers · 0 benchmarks · Images, Texts

PreBit: Multimodal dataset for Bitcoin price

This is the dataset accompanying the paper: "PreBit - A multimodal model with Twitter FinBERT embeddings for extreme price movement prediction of Bitcoin"

2 papers · 0 benchmarks

SOAR

https://github.com/rail-berkeley/soar?tab=readme-ov-file#using-soar-data

2 papers · 0 benchmarks

SUMS (Summit Vitals: Multi-Camera and Multi-Signal Biosensing at High Altitudes)

SUMS is a dataset collected by Qinghai University. It is a multi-camera, multi-signal biosensing dataset collected at high altitudes, which includes 80 synchronized non-contact facial and contact finger videos from 10 subjects during exercise and oxygen recovery scenarios. The dataset captures PPG, respiration rate (RR), and SpO2, and is designed to validate video vitals estimation algorithms and compare facial rPPG with finger cPPG. Our results demonstrate that fusing videos from different positions (face and finger) reduces the mean absolute error (MAE) of SpO2 predictions by 7.6% and 10.6% compared to using only face or only finger data, respectively. Additionally, training on multiple indicators such as PPG and blood oxygen simultaneously reduces SpO2 estimation MAE by 17.8%. We recruited ten participants living on the Qinghai Plateau to collect hypoxia data in a real high-altitude environment. Data collection utilized two Logitech C922 cameras to capture videos of participants’ faces and

2 papers · 0 benchmarks

RLAP (Remote Learning Affect and Physiologic dataset)

The Remote Learning Affect and Physiologic (RLAP) dataset targets affect and engagement in remote learning. It contains learners' blood volume pulse (BVP) signals that are highly synchronized, which makes it suitable for training neural rPPG algorithms.

2 papers · 0 benchmarks

VerilogEval

The VerilogEval Dataset is a benchmark specifically designed to assess the ability of large language models (LLMs) to generate syntactically correct and functionally accurate Verilog code. Introduced in the paper "VerilogEval: Evaluating Large Language Models for Verilog Code Generation", it has become a cornerstone for research in hardware code generation.

2 papers · 1 benchmark · Texts

SolutionBench

https://huggingface.co/papers/2502.20730

2 papers · 0 benchmarks

Predictive Analytics for Retail Inventory Management

Problem Statement

2 papers · 0 benchmarks

Multi-omics mRNA, miRNA, and DNA Methylation Dataset

The dataset contains multi-omics data, including mRNA, miRNA, and DNA methylation. It comprises 8,464 samples involving 2,794 omics features and covers 31 cancer types and normal tissues.

2 papers · 1 benchmark

ACL 2023 Dataset

This dataset contains the 'limitation' text from all ACL 2023 papers.

2 papers · 0 benchmarks

LymphoMNIST

LymphoMNIST is a comprehensive dataset designed for the nuanced classification of lymphocyte images. It encompasses approximately 80,000 high-resolution 64x64 images, meticulously categorized into three primary classes: B cells, T4 cells, and T8 cells.

2 papers · 0 benchmarks · Images

GTA-Human II

This is the latest version of our datasets, and is built upon GTA-V for expressive human pose and shape estimation. It features multi-person scenes with SMPL-X annotations. In addition to color image sequences, 3D bounding boxes and cropped point clouds (generated from synthetic depth images) are also provided. Please contact Zhongang Cai (caiz0023@e.ntu.edu.sg) for feedback.

2 papers · 0 benchmarks

LongVALE

Despite impressive advancements in video understanding, most efforts remain limited to coarse-grained or visual-only video tasks. However, real-world videos encompass omni-modal information (vision, audio, and speech) with a series of events forming a cohesive storyline. The lack of multi-modal video data with fine-grained event annotations and the high cost of manual labeling are major obstacles to comprehensive omni-modality video perception. To address this gap, we propose an automatic pipeline consisting of high-quality multi-modal video filtering, semantically coherent omni-modal event boundary detection, and cross-modal correlation-aware event captioning. In this way, we present LongVALE, the first-ever Vision-Audio-Language Event understanding benchmark comprising 105K omni-modal events with precise temporal boundaries and detailed relation-aware captions within 8.4K high-quality long videos. Further, we build a baseline that leverages LongVALE to enable video large language mod

2 papers · 0 benchmarks · Audio, Speech, Texts, Videos

DropletVideo-10M

DropletVideo is a project exploring high-order spatio-temporal consistency in image-to-video generation. It is trained on DropletVideo-10M. The model supports multi-resolution inputs, dynamic FPS control for motion intensity, and demonstrates potential for 3D consistency. For further details, you can check our project page as well as the technical report.

2 papers · 0 benchmarks · Videos