Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

KU-HAR

Human Activity Recognition (HAR) refers to the capacity of machines to perceive human actions. This dataset contains data on 18 different activities collected from 90 participants (75 male, 15 female) using smartphone sensors (accelerometer and gyroscope). It comprises 1,945 raw activity samples collected directly from the participants, and 20,750 subsamples extracted from them.

4 papers · 0 benchmarks · Tabular, Time series
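The raw-sample-to-subsample extraction described above is typically done with fixed-length sliding windows over the sensor streams. A minimal sketch, assuming illustrative window and stride values (not the dataset's actual parameters):

```python
import numpy as np

def sliding_windows(signal: np.ndarray, window: int, stride: int) -> np.ndarray:
    """Segment a (timesteps, channels) sensor stream into fixed-length windows."""
    n = signal.shape[0]
    starts = range(0, n - window + 1, stride)
    return np.stack([signal[s:s + window] for s in starts])

# Example: a 6-channel stream (3-axis accelerometer + 3-axis gyroscope)
stream = np.random.randn(1000, 6)
subsamples = sliding_windows(stream, window=300, stride=100)
print(subsamples.shape)  # (8, 300, 6)
```

Overlapping windows (stride smaller than window length) are one common way a modest number of raw samples yields an order of magnitude more subsamples.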

HateMM

Hate speech has become one of the most significant issues in modern society, with implications in both the online and offline worlds. However, most prior work has focused on text, with relatively little on images and even less on videos. Thus, early-stage automated video moderation techniques are needed to handle the videos being uploaded and keep platforms safe and healthy. We therefore curated approximately 43 hours of videos from BitChute and manually annotated them as hate or non-hate, along with the frame spans that could explain the labeling decision.

4 papers · 2 benchmarks · Audio, Videos

ECG-Image-Database (Digitization and Classification of ECG Images: The George B. Moody PhysioNet Challenge 2024)

The George B. Moody PhysioNet Challenges are annual competitions that invite participants to develop automated approaches for addressing important physiological and clinical problems. The 2024 Challenge invites teams to develop algorithms for digitizing and classifying electrocardiograms (ECGs) captured from images or paper printouts. Despite the recent advances in digital ECG devices, physical or paper ECGs remain common, especially in the Global South. These physical ECGs document the history and diversity of cardiovascular diseases (CVDs), and algorithms that can digitize and classify these images have the potential to improve our understanding and treatment of CVDs, especially for underrepresented and underserved populations.

4 papers · 1 benchmark · Time series

KodCode-V1 (KodCode/KodCode-V1)

KodCode is the largest fully-synthetic open-source dataset providing verifiable solutions and tests for coding tasks. It contains 12 distinct subsets spanning various domains (from algorithmic to package-specific knowledge) and difficulty levels (from basic coding exercises to interview and competitive programming challenges). KodCode is designed for both supervised fine-tuning (SFT) and RL tuning.

4 papers · 0 benchmarks

LeetCode-Hard

LeetCode-Hard is a benchmark dataset for code generation, consisting of 40 challenging LeetCode "hard-level" questions across 19 programming languages. It is designed to evaluate the problem-solving and functional correctness capabilities of large language models (LLMs), particularly in handling complex algorithmic tasks. This dataset was used to assess the Reflexion framework, which leverages verbal reinforcement learning to improve LLM performance on difficult coding problems.

4 papers · 0 benchmarks

Math500

MATH-500 is a 500-problem subset of the MATH benchmark of competition mathematics problems, commonly used to evaluate the mathematical reasoning abilities of large language models.

4 papers · 0 benchmarks · Texts

100style

4 papers · 0 benchmarks · Graphs

HierarCaps

Images with paired ground-truth caption hierarchies.

4 papers · 0 benchmarks · Images, Texts

FEWS (FEWS: Large-Scale, Low-Shot Word Sense Disambiguation with the Dictionary)

FEWS (Few-shot Examples of Word Senses) is a few-shot dataset for English Word Sense Disambiguation (WSD) gathered from Wiktionary, an online, crowd-sourced dictionary. FEWS contains over 121,000 labeled examples of ambiguous words, corresponding to more than 71,000 sense types. The evaluation for FEWS is split into few-shot and zero-shot settings, to better facilitate evaluating few-shot learning and performance on rare senses.

4 papers · 4 benchmarks

Nordland* (2760 queries)

The Nordland variant used in SALAD and BoQ (2,760 queries, 27,592 reference images, matching threshold: 1 frame).

4 papers · 3 benchmarks · Images

M$^3$-VOS (M$^3$-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation)

M$^3$-VOS (Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation) is a new benchmark for verifying the ability of models to understand object phases. It consists of 479 high-resolution videos spanning over 10 distinct everyday scenarios, with 205,181 collected masks and an average track duration of 14.27 s. M$^3$-VOS covers 120+ categories of objects across 6 phases within 14 scenarios, encompassing 23 specific phase transitions.

4 papers · 2 benchmarks · Images, Texts, Videos

ImgEdit

ImgEdit is a large-scale, high-quality image-editing dataset comprising 1.2 million carefully curated edit pairs, which contain both novel and complex single-turn edits, as well as challenging multi-turn tasks.

4 papers · 0 benchmarks · Images, Texts

Fine-Grained Cloud Segmentation Dataset

The dataset consists of 96 terrain-corrected (Level-1T) scenes from Landsat 8 OLI and TIRS, covering diverse biomes. This variety supports cloud detection and removal in complex environments. The dataset includes manually generated cloud masks with pixel-level annotations for cloud shadow, clear sky, thin clouds, and cloud areas. Each scene is cropped into 512×512 pixel patches and split into training, validation, and test sets (6:2:2 ratio). It is a valuable resource for training and evaluating fine-grained cloud segmentation models across various terrains.

4 papers · 2 benchmarks
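The patching and 6:2:2 split described above can be sketched as follows. This is a minimal illustration on a synthetic scene; the random per-patch split is an assumption, since the dataset's actual partitioning scheme is not specified here:

```python
import numpy as np

def extract_patches(scene: np.ndarray, size: int = 512) -> list:
    """Crop an (H, W, C) scene into non-overlapping size x size patches."""
    h, w = scene.shape[:2]
    return [scene[i:i + size, j:j + size]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]

def split_622(items: list, seed: int = 0):
    """Shuffle and split items into train/val/test with a 6:2:2 ratio."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(items))
    n_train = int(0.6 * len(items))
    n_val = int(0.2 * len(items))
    train = [items[i] for i in idx[:n_train]]
    val = [items[i] for i in idx[n_train:n_train + n_val]]
    test = [items[i] for i in idx[n_train + n_val:]]
    return train, val, test

scene = np.zeros((2048, 2048, 4), dtype=np.uint8)  # synthetic 4-band scene
patches = extract_patches(scene)                    # 4 x 4 = 16 patches
train, val, test = split_622(patches)
print(len(train), len(val), len(test))  # 9 3 4
```

In practice such splits are often done per scene rather than per patch, to avoid near-duplicate patches from one scene leaking across partitions.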

CUFS (CUHK Face Sketch Database)

CUHK Face Sketch database (CUFS) is for research on face sketch synthesis and face sketch recognition. It includes 188 faces from the Chinese University of Hong Kong (CUHK) student database, 123 faces from the AR database, and 295 faces from the XM2VTS database. There are 606 faces in total. For each face, there is a sketch drawn by an artist based on a photo taken in a frontal pose, under normal lighting conditions, and with a neutral expression.

3 papers · 24 benchmarks · Images

Tsinghua-Tencent 100K (Traffic-Sign Detection and Classification in the Wild)

Although promising results have been achieved in the areas of traffic-sign detection and classification, few works have provided simultaneous solutions to these two tasks for realistic real-world images. We make two contributions to this problem. Firstly, we have created a large traffic-sign benchmark from 100,000 Tencent Street View panoramas, going beyond previous benchmarks. We call this benchmark Tsinghua-Tencent 100K. It provides 100,000 images containing 30,000 traffic-sign instances. These images cover large variations in illuminance and weather conditions. Each traffic sign in the benchmark is annotated with a class label, its bounding box, and a pixel mask. Secondly, we demonstrate how a robust end-to-end convolutional neural network (CNN) can simultaneously detect and classify traffic signs. Most previous CNN image processing solutions target objects that occupy a large proportion of an image, and such networks do not work well for target objects occupying only a small fraction of an image.

3 papers · 3 benchmarks · Images

ACL Title and Abstract Dataset

This dataset gathers 10,874 title and abstract pairs from the ACL Anthology Network (until 2016).

3 papers · 4 benchmarks · Texts

OQMD v1.2 (The Open Quantum Materials Database)

The OQMD is a database of DFT-calculated thermodynamic and structural properties of one million materials, created in Chris Wolverton's group at Northwestern University.

3 papers · 2 benchmarks · Graphs

Cluttered Omniglot

Dataset for one-shot segmentation.

3 papers · 9 benchmarks

FSNS - Test

Arabic handwriting dataset.

3 papers · 1 benchmark

LeNER-Br

LeNER-Br is a dataset for named entity recognition (NER) in Brazilian Legal Text.

3 papers · 2 benchmarks · Texts
Page 257 of 1000