Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

MIMIC-IV-ECG (MIMIC-IV-ECG: Diagnostic Electrocardiogram Matched Subset)

The MIMIC-IV-ECG module contains approximately 800,000 diagnostic electrocardiograms across nearly 160,000 unique patients. Each diagnostic ECG uses 12 leads, is 10 seconds long, and is sampled at 500 Hz. This subset contains all of the ECGs for patients who appear in the MIMIC-IV Clinical Database. When a cardiologist report is available for a given ECG, the information needed to link the waveform to the report is provided. The patients in MIMIC-IV-ECG have been matched against the MIMIC-IV Clinical Database, making it possible to link to information across the MIMIC-IV modules.

3 papers · 0 benchmarks · Medical
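
A minimal loading sketch, assuming the PhysioNet wfdb Python package and a locally downloaded copy of the dataset; the record path below is hypothetical and depends on your local layout.

    # Load one MIMIC-IV-ECG record with wfdb (record path is hypothetical).
    import wfdb

    record = wfdb.rdrecord("mimic-iv-ecg/files/p1000/p10000032/s40689238/40689238")
    print(record.fs)              # sampling frequency: 500 Hz
    print(record.sig_name)        # the 12 lead names
    print(record.p_signal.shape)  # (5000, 12): 10 s at 500 Hz, 12 leads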

ERA5 (The 5th generation of ECMWF reanalysis data)

ERA5 is the fifth generation of the ECMWF reanalysis of the global climate, providing hourly estimates of a large number of atmospheric, land, and oceanic variables.

3 papers · 0 benchmarks

StockEmotions

StockEmotions is a financial-domain dataset for financial sentiment/emotion classification and stock market time-series prediction. It is based on the paper StockEmotions: Discover Investor Emotions for Financial Sentiment Analysis and Multivariate Time Series, accepted at the AAAI 2023 Bridge (AI for Financial Services).

3 papers · 0 benchmarks · Texts, Time series

AROT-COV23

The AROT-COV23 (ARabic Original Tweets on COVID-19 as of 2023) dataset is a large-scale collection of original Arabic tweets related to COVID-19, collected between January 1, 2020 and January 5, 2023. The dataset contains approximately 500,000 original tweets, providing a rich source of information on how Arabic-speaking Twitter users have discussed and shared information about the pandemic. For more details, see the accompanying paper.

3 papers · 0 benchmarks · Texts

BUSTER (BUSiness Transaction Entity Recognition dataset)

BUSiness Transaction Entity Recognition dataset.

3 papers · 0 benchmarks · Texts

SICE-Mix

SICE-Mix is a test image dataset representing complex mixed over-/under-exposed scenes.

3 papers · 3 benchmarks

SICE-Grad

SICE-Grad is a test image dataset representing complex mixed over-/under-exposed scenes.

3 papers · 3 benchmarks

Sony-Total-Dark (SID Sony subset without gamma correction)

The original SID dataset was introduced in "Learning to See in the Dark". The subset of SID captured with a Sony α7S II camera is adopted for evaluation; it contains 2,697 short-/long-exposure RAW image pairs. To make the dataset more challenging, the RAW images were converted to sRGB without gamma correction, resulting in extremely dark images.

3 papers · 3 benchmarks
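
A minimal sketch of how such a conversion can be reproduced with the rawpy library; the filename is hypothetical, and this is an illustration rather than the authors' exact pipeline.

    # Convert a RAW frame to sRGB without gamma correction using rawpy.
    # gamma=(1, 1) requests a linear transfer curve, so dark scenes stay
    # extremely dark. The filenames are hypothetical.
    import rawpy
    import imageio

    with rawpy.imread("short_exposure.ARW") as raw:
        rgb = raw.postprocess(gamma=(1, 1), no_auto_bright=True, output_bps=8)
    imageio.imwrite("short_exposure_linear.png", rgb)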

DangerousQA

DangerousQA is a set of harmful questions used to evaluate the safety and behavior of large language models (LLMs). In the context of the RED-EVAL safety benchmark, DangerousQA consists of 200 harmful questions collected from various sources, covering racism, stereotypes, sexism, legality, toxicity, and harm. These questions test the ability of LLMs to handle sensitive and potentially harmful content and to respond appropriately to such prompts.

3 papers · 0 benchmarks
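
A minimal evaluation-loop sketch; the file name, file format, query_model(), and the refusal heuristic are all hypothetical placeholders, not part of the RED-EVAL benchmark itself.

    # Refusal-rate style evaluation over a list of harmful questions.
    import json

    def query_model(prompt: str) -> str:
        raise NotImplementedError("plug in your LLM client here")

    def looks_like_refusal(answer: str) -> bool:
        # Crude keyword heuristic; real benchmarks use stronger judges.
        markers = ("i can't", "i cannot", "i won't", "sorry")
        return any(m in answer.lower() for m in markers)

    # Hypothetical format: a JSON list of question strings.
    with open("dangerousqa.json") as f:
        questions = json.load(f)
    refusals = sum(looks_like_refusal(query_model(q)) for q in questions)
    print(f"refusal rate: {refusals / len(questions):.1%}")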

FanOutQA

FanOutQA is a high-quality, multi-hop, multi-document benchmark for large language models that uses English Wikipedia as its knowledge base. Compared to other question-answering benchmarks, FanOutQA requires reasoning over a greater number of documents, with the benchmark's main focus being the titular fan-out style of question. The questions are presented in three tasks (closed-book, open-book, and evidence-provided) that measure different abilities of LLM systems.

3 papers · 0 benchmarks

NicheHazardQA

NicheHazardQA contains harmful questions across different topics.

3 papers · 0 benchmarks

Human Protein Atlas

The Human Protein Atlas contains images of histological sections from normal and cancer tissues obtained by immunohistochemistry. Antibodies are labeled with DAB (3,3'-diaminobenzidine) and the resulting brown staining indicates where an antibody has bound to its corresponding antigen.

3 papers · 0 benchmarks
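
A minimal sketch of isolating the DAB signal with color deconvolution in scikit-image; the filename is hypothetical.

    # Separate the brown DAB stain from an RGB tissue image.
    # rgb2hed maps RGB into the Haematoxylin-Eosin-DAB stain space;
    # channel 2 is the DAB component.
    from skimage import io
    from skimage.color import rgb2hed

    rgb = io.imread("hpa_tissue_section.png")[..., :3]
    hed = rgb2hed(rgb)
    dab = hed[..., 2]  # higher values = stronger antibody-binding signal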

CelebA-Spoof-Enroll

CelebA-Spoof is a large-scale face anti-spoofing dataset recently introduced in [53]. The dataset contains 625,537 images of 10,177 celebrities captured under different spoof media, environments, and illumination conditions. The original dataset proposes three different evaluation protocols. For our experimentation, we focus on the most general "intra" protocol, in which different spoof types, environments, and illumination conditions are used for both training and testing.

3 papers · 0 benchmarks · Images

SiW-Enroll

SiW (Spoofing in the Wild) is a face anti-spoofing dataset recently introduced in [29] where images are extracted from short videos captured at high resolution and 30 frames per second. In total, 4,478 videos are collected from 165 subjects including variations in spoof type, recording device, illumination condition, pose and facial expression.

3 papers · 0 benchmarks · Images

AV-MNIST (Audio Visual MNIST)

This is a simple audio-visual dataset artificially assembled from independent visual and audio datasets. The visual modality consists of 28 × 28 MNIST images with 75% of their energy removed by PCA. The audio modality consists of 112 × 112 spectrograms computed from 25,102 pronounced digits of the TIDIGITS database, augmented by adding randomly chosen noise samples from the ESC-50 dataset. The contaminated audio samples are randomly paired with MNIST digits of matching labels to yield 55,000 pairs for training and 10,000 pairs for testing; 5,000 samples are held out from the training set for validation.

3 papers · 0 benchmarks
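
A minimal sketch of one reading of the "energy removed by PCA" step: keep only the leading principal components covering the first 25% of the variance. The exact recipe in the original construction may differ, and the random array stands in for flattened MNIST images.

    # Remove ~75% of the energy of image vectors via PCA truncation.
    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(1000, 28 * 28)  # stand-in for flattened MNIST images

    pca = PCA().fit(X)
    cum = np.cumsum(pca.explained_variance_ratio_)
    k = int(np.searchsorted(cum, 0.25)) + 1  # components covering 25% of energy

    codes = pca.transform(X)
    codes[:, k:] = 0                     # zero out the remaining ~75%
    X_low = pca.inverse_transform(codes)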

ImageNet-B

ImageNet-B introduces diverse and realistic backgrounds into the images, as well as color, texture, and adversarial changes in the background.

3 papers · 0 benchmarks

OAB Exams

The OAB Exams dataset is a valuable resource used in the context of legal information systems. In Brazil, all legal professionals must demonstrate their knowledge of the law and its application by passing the OAB exams, which are the national bar exams. These exams serve as a benchmark for evaluating the performance of legal information systems. If a system can achieve a level of legal reasoning comparable to that of a human lawyer who successfully passes the OAB exam, it indicates significant progress.

3 papers · 0 benchmarks

PT Hate Speech

The PT Hate Speech dataset is a valuable resource for studying hate speech in the Portuguese language.

3 papers · 0 benchmarks

All-day CityScapes

All-day CityScapes is an all-day semantic segmentation benchmark, the first to contain samples from all-day scenarios, i.e., from dawn to night. The dataset is publicly available at https://isis-data.science.uva.nl/cv/1ADcityscape.zip.

3 papers · 1 benchmark · Images

AS-V2 (The All-Seeing Dataset v2)

We propose a novel task, termed Relation Conversation (ReC), which unifies the formulation of text generation, object localization, and relation comprehension. Based on the unified formulation, we construct the AS-V2 dataset, which consists of 127K high-quality relation conversation samples, to unlock the ReC capability for Multi-modal Large Language Models (MLLMs).

3 papers · 0 benchmarks · Images, Texts
Page 289 of 1000