Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

IllusionVQA

IllusionVQA is a Visual Question Answering (VQA) dataset with two sub-tasks. The first tests comprehension on 435 instances across 12 optical illusion categories; each instance consists of an image containing an optical illusion, a question, and 3 to 6 options, exactly one of which is correct. We refer to this task as IllusionVQA-Comprehension. The second tests how well VLMs can differentiate geometrically impossible objects from ordinary objects when two objects are presented side by side; it consists of 1,000 instances in a similar format. We refer to this task as IllusionVQA-Soft-Localization.

3 papers · 2 benchmarks · Images, Texts
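The per-instance structure described above (one illusion image, a question, and 3 to 6 options with a single correct answer) can be sketched as a small record type. The field names here are assumptions for illustration, not the dataset's actual schema:

```python
# Minimal sketch of one IllusionVQA-Comprehension instance; field names
# are hypothetical, not the dataset's published schema.
from dataclasses import dataclass

@dataclass
class IllusionVQAInstance:
    image_path: str     # image containing the optical illusion
    category: str       # one of the 12 illusion categories
    question: str
    options: list       # 3 to 6 answer options
    answer_index: int   # index of the single correct option

    def is_valid(self) -> bool:
        # Each instance has 3 to 6 options, exactly one of which is correct.
        return (3 <= len(self.options) <= 6
                and 0 <= self.answer_index < len(self.options))

example = IllusionVQAInstance(
    image_path="illusions/ponzo_01.png",   # hypothetical path
    category="size",
    question="Which line is longer?",
    options=["The top line", "The bottom line", "They are the same length"],
    answer_index=2,
)
print(example.is_valid())  # True
```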

xMIND (A Multilingual Dataset for Cross-lingual News Recommendation)

xMIND is an open, large-scale multilingual news dataset for multi- and cross-lingual news recommendation. xMIND is derived from the English MIND dataset using open-source neural machine translation (i.e., NLLB 3.3B).

3 papers · 0 benchmarks · Texts

DevBench

DevBench is a comprehensive benchmark designed to evaluate Large Language Models (LLMs) across various stages of the software development lifecycle. It covers critical steps such as software design, environment setup, implementation, acceptance testing, and unit testing. By integrating these interconnected tasks under a single framework, DevBench offers a holistic perspective on the potential of LLMs for automated software development.

3 papers · 0 benchmarks

SeaEval

SeaEval is a benchmark designed for evaluating multilingual foundation models (FMs). These large language models (LLMs) have demonstrated impressive generalizability and adaptability across various downstream tasks. The SeaEval benchmark goes beyond standard accuracy metrics and investigates how well these models understand and reason with natural language, as well as their comprehension of cultural practices, nuances, and values.

3 papers · 0 benchmarks

Diffusion Deepfake

A human-face deepfake dataset sampled from large-scale datasets.

3 papers · 0 benchmarks

DAD-3DHeads (DAD-3DHeads dataset)

DAD-3DHeads dataset consists of 44,898 images collected from various sources (37,840 in the training set, 4,312 in the validation set, and 2,746 in the test set).

3 papers · 0 benchmarks · 3D, 3D meshes, Images
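A quick arithmetic check confirms that the DAD-3DHeads split sizes quoted above sum to the stated total of 44,898 images:

```python
# Sanity-check the DAD-3DHeads split counts against the stated total.
splits = {"train": 37_840, "val": 4_312, "test": 2_746}
total = sum(splits.values())
print(total)            # 44898
print(total == 44_898)  # True

# Proportion of the dataset held by each split.
for name, n in splits.items():
    print(f"{name}: {n / total:.1%}")
```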

MosquitoFusion

The dataset, comprising 1,204 meticulously curated images, serves as a comprehensive resource for advancing real-time mosquito detection models. The dataset is strategically divided into training, validation, and test sets, accounting for 87%, 8%, and 5% of the images, respectively. A rigorous preprocessing phase involves auto-orientation and resizing to standardize dimensions at 640×640 pixels. To ensure dataset integrity, a filter-null criterion mandates that all images contain annotations. Augmentations, including flips, rotations, crops, and grayscale applications, enhance the dataset's diversity, fostering robust model training. With a focus on quality and variety, this dataset provides a solid foundation for evaluating and enhancing real-time mosquito detection models.

3 papers · 0 benchmarks · Images
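The 87% / 8% / 5% partition described above can be sketched as a deterministic split of the 1,204 image IDs. The filenames here are invented, and the exact per-split counts in the released dataset may differ by a few images due to rounding:

```python
# Hedged sketch of an 87/8/5 train/val/test partition over 1,204 images.
# Image IDs are hypothetical placeholders, not real MosquitoFusion filenames.
import random

image_ids = [f"img_{i:04d}" for i in range(1204)]
rng = random.Random(0)       # fixed seed for a reproducible split
rng.shuffle(image_ids)

n_train = int(0.87 * len(image_ids))
n_val = int(0.08 * len(image_ids))
train = image_ids[:n_train]
val = image_ids[n_train:n_train + n_val]
test = image_ids[n_train + n_val:]

print(len(train), len(val), len(test))  # 1047 96 61
```

Assigning the test set as the remainder guarantees every image lands in exactly one split despite rounding.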

UIT-ViCoV19QA

The dataset comprises 4,500 question-answer pairs collected from trusted medical sources, with at least one and at most four unique paraphrased answers per question.

3 papers · 0 benchmarks · Texts

Zhou2016 MOABB (Motor Imagery dataset from Zhou et al., 2016)

3 papers · 16 benchmarks

BNCI2014-001 MOABB (BNCI 2014-001 Motor Imagery dataset)

3 papers · 16 benchmarks

BNCI2014-002 MOABB (BNCI 2014-002 Motor Imagery dataset)

3 papers · 12 benchmarks

Lee2019-ERP MOABB (BMI/OpenBMI dataset for P300)

3 papers · 9 benchmarks

Lee2019-MI MOABB (BMI/OpenBMI dataset for MI)

3 papers · 12 benchmarks

Lee2019-SSVEP MOABB (BMI/OpenBMI dataset for SSVEP)

3 papers · 9 benchmarks

OpenTrench3D

OpenTrench3D is the first publicly available point cloud dataset of underground utilities from open trenches. It features 310 fully annotated point clouds consisting of a total of 528 million points categorised into 5 unique classes. OpenTrench3D consists of photogrammetrically derived 3D point clouds capturing detailed scenes of open trenches, revealing underground utilities.

3 papers · 9 benchmarks · 3D, Point cloud

MMCode

MMCode is a multi-modal code generation dataset designed to evaluate the problem-solving skills of code language models in visually rich contexts (i.e. images). It contains 3,548 questions paired with 6,620 images, derived from real-world programming challenges across 10 code competition websites, with Python solutions and tests provided. The dataset emphasizes the extreme demand for reasoning abilities, the interwoven nature of textual and visual contents, and the occurrence of questions containing multiple images.

3 papers · 0 benchmarks · Images, Tables, Texts
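Since MMCode pairs each problem with Python solutions and tests, evaluation amounts to running a candidate solution against the provided tests. A minimal sketch of that functional-correctness check, with an invented toy problem (the real dataset's harness and problems are not shown here):

```python
# Hedged sketch of MMCode-style functional-correctness checking:
# execute a candidate solution, then the provided tests, in one namespace.
# The problem, solution, and tests below are invented for illustration.
def run_candidate(solution_src: str, test_src: str) -> bool:
    namespace = {}
    try:
        exec(solution_src, namespace)  # define the candidate's functions
        exec(test_src, namespace)      # raises AssertionError on failure
        return True
    except Exception:
        return False

solution = "def area(w, h):\n    return w * h\n"
tests = "assert area(3, 4) == 12\nassert area(0, 5) == 0\n"

print(run_candidate(solution, tests))  # True
buggy = "def area(w, h):\n    return w + h\n"
print(run_candidate(buggy, tests))     # False
```

A real harness would sandbox the `exec` calls and enforce time limits; this sketch only illustrates the pass/fail protocol.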

Visual Writing Prompts


3 papers · 0 benchmarks · Images, Texts

STEM

This dataset is proposed in the ICLR 2024 paper Measuring Vision-Language STEM Skills of Neural Models. Real-world problems often require solutions that combine knowledge from STEM (science, technology, engineering, and math). Unlike existing datasets, this dataset requires understanding multimodal vision-language information about STEM. It is one of the largest and most comprehensive datasets for this challenge, including 448 skills and 1,073,146 questions spanning all STEM subjects. Whereas existing datasets often focus on examining expert-level ability, this dataset includes fundamental skills and questions designed based on the K-12 curriculum. State-of-the-art foundation models such as CLIP and GPT-3.5-Turbo are also added to the benchmark.

3 papers · 0 benchmarks · Images, Texts

XHate-999

We present XHate-999, a multi-domain and multilingual evaluation dataset for abusive language detection. By aligning test instances across six typologically diverse languages, XHate-999 for the first time allows the domain-transfer and language-transfer effects in abusive language detection to be disentangled. We conduct a series of domain- and language-transfer experiments with state-of-the-art monolingual and multilingual transformer models, setting strong baseline results and profiling XHate-999 as a comprehensive evaluation resource for abusive language detection. Finally, we show that domain and language adaptation, via intermediate masked language modeling on abusive corpora in the target language, can lead to substantially improved abusive language detection in the target language in zero-shot transfer setups.

3 papers · 0 benchmarks · Texts

SoccerNet-GSR (SoccerNet Game State Reconstruction)

The SoccerNet Game State Reconstruction task is a novel high-level computer vision task specific to sports analytics. It aims to recognize the state of a sports game, i.e., identifying and localizing all individuals on the field (players, referees, etc.) from raw input video. SoccerNet-GSR is composed of 200 video sequences of 30 seconds, annotated with 9.37 million line points for pitch localization and camera calibration, as well as over 2.36 million athlete positions on the pitch with their respective role, team, and jersey number.

3 papers · 0 benchmarks · RGB Video, Tracking, Videos
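The per-athlete annotations described above (pitch position plus role, team, and jersey number) can be sketched as a small record type. The field names and coordinate convention here are assumptions, not SoccerNet-GSR's published schema:

```python
# Hedged sketch of one per-frame athlete annotation in the style
# SoccerNet-GSR describes; field names are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AthleteAnnotation:
    frame: int                    # frame index within the 30 s sequence
    track_id: int                 # identity maintained across frames
    x: float                      # pitch-plane position (e.g. metres)
    y: float
    role: str                     # "player", "goalkeeper", "referee", ...
    team: Optional[str]           # None for referees
    jersey_number: Optional[int]  # None when not visible / not applicable

ann = AthleteAnnotation(frame=120, track_id=7, x=34.2, y=51.8,
                        role="player", team="left", jersey_number=10)
print(ann.role, ann.jersey_number)  # player 10
```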
Page 290 of 1000