Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

UK Biobank Brain MRI (UK Biobank Data - Brain MRI)

UK Biobank participants have generously provided a very wide range of information about their health and well-being since recruitment began in 2006. This has been added to in the following ways: 

2 papers | 2 benchmarks | 3D, Images, Texts, Time series

Aria Digital Twin Dataset

A real-world dataset with a hyper-accurate digital counterpart and comprehensive ground-truth annotation.

2 papers | 6 benchmarks | 3D, 3d meshes, Point cloud, RGB Video, Videos

ValueConsistency

ValueConsistency is a dataset of both controversial and uncontroversial questions in English, Chinese, German, and Japanese on topics from the U.S., China, Germany, and Japan. It was generated by prompting GPT-4 and validated manually.

2 papers | 0 benchmarks

Porto Taxi (Taxi Service Trajectory - Prediction Challenge, ECML PKDD 2015)

An accurate dataset describing trajectories performed by all the 442 taxis running in the city of Porto, in Portugal.

2 papers | 0 benchmarks

MECD (Multi-Event Causal Discovery)

Provide:

2 papers | 4 benchmarks | Texts, Videos

TruthQuest

A benchmark for suppositional reasoning based on the principles of knights and knaves puzzles. Knights and knaves problems represent a classic genre of logical puzzles where characters either tell the truth or lie. The objective is to logically deduce each character's identity based on their statements. The challenge arises from the truth-telling or lying behavior, which influences the logical implications of each statement.
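The deduction described in this entry can be illustrated with a small brute-force sketch. It is not the benchmark's own code: the `solve` function, the puzzle encoding, and the example statements are all hypothetical, chosen only to show how a statement's implications depend on whether its speaker is a knight or a knave.

```python
from itertools import product


def solve(names, statements):
    """Return every assignment (name -> True if knight) consistent with all statements.

    `statements` maps a speaker to a function that takes the assignment
    and returns whether that speaker's claim is true under it.
    """
    solutions = []
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        # A knight's claim must be true; a knave's claim must be false.
        if all(world[speaker] == claim(world) for speaker, claim in statements.items()):
            solutions.append(world)
    return solutions


# Hypothetical puzzle: A says "B is a knave"; B says "A and I are both knights".
puzzle = {
    "A": lambda w: not w["B"],
    "B": lambda w: w["A"] and w["B"],
}
solutions = solve(["A", "B"], puzzle)
print(solutions)
```

Enumerating all assignments is exponential in the number of characters, which is acceptable for small puzzles like this one but would not scale to large instances.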

2 papers | 0 benchmarks | Texts

RUFF

RUFF is a large-scale dataset to measure pronoun fidelity in English.

2 papers | 0 benchmarks | Texts

Malicious URLs Dataset

Malicious URLs, or malicious websites, are a very serious threat to cybersecurity. Malicious URLs host unsolicited content (spam, phishing, drive-by downloads, etc.), lure unsuspecting users into becoming victims of scams (monetary loss, theft of private information, and malware installation), and cause losses of billions of dollars every year. We have collected this dataset to include a large number of examples of malicious URLs so that a machine learning-based model can be developed to identify them and stop them before they infect computer systems or spread through the internet.

2 papers | 0 benchmarks

Terra Incognita

It is desirable for detection and classification algorithms to generalize to unfamiliar environments, but suitable benchmarks for quantitatively studying this phenomenon are not yet available. We present a dataset designed to measure recognition generalization to novel environments. The images in our dataset are harvested from twenty camera traps deployed to monitor animal populations. Camera traps are fixed at one location, hence the background changes little across images; capture is triggered automatically, hence there is no human bias. The challenge is learning recognition in a handful of locations, and generalizing animal detection and classification to new locations where no training data is available. In our experiments state-of-the-art algorithms show excellent performance when tested at the same location where they were trained. However, we find that generalization to new locations is poor, especially for classification systems.

2 papers | 0 benchmarks

DAPlankton

The DAPlankton dataset consists of over 110k expert-labeled plankton images. The data is divided into two subsets: DAPlankton_LAB and DAPlankton_SEA. DAPlankton_LAB consists of images captured from multiple mono-specific phytoplankton cultures, which were analysed using three different imaging instruments: the Imaging FlowCytoBot (IFCB), the CytoSense (CS) flow cytometer, and the FlowCam (FC) imaging microscope, each producing cropped images with one plankton particle per image. An expert further verified the class of each image, ensuring that there was no cross-contamination between different cultures. This process resulted in a balanced dataset with negligible label uncertainty. DAPlankton_SEA consists of images captured from water samples collected from the Baltic Sea using two different imaging instruments: IFCB and CS. Each image was manually labeled by an expert. DAPlankton_SEA provides a realistic and more challenging dataset with a large class imbalance and natural intra-class variance.

2 papers | 0 benchmarks | Images

TwinViews-13k

TwinViews-13k is a dataset of 13,855 pairs of left-leaning and right-leaning political statements, each pair matched by topic. It was created to study political bias in reward and language models, with a focus on understanding the interaction between model alignment to truthfulness and the emergence of political bias. The dataset was generated using GPT-3.5 Turbo, with extensive auditing to ensure ideological balance and topical relevance. This dataset can be used for various tasks related to political bias, natural language processing, and model alignment, particularly in studies examining how political orientation impacts model outputs.

2 papers | 0 benchmarks | Texts

Twitter POS

K. Gimpel, N. Schneider, B. O’Connor, D. Das, D. Mills, J. Eisenstein, M. Heilman, D. Yogatama, J. Flanigan, and N. A. Smith, “Part-of-speech tagging for Twitter: Annotation, features, and experiments”, in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2011, pp. 42–47.

2 papers | 1 benchmark

C2A: Human Detection in Disaster Scenarios (Combination to Application)

This repository contains the code and information for the paper "UAV-Enhanced Combination to Application: Comprehensive Analysis and Benchmarking of a Human Detection Dataset for Disaster Scenarios" by Ragib Amin Nihal, Benjamin Yen, Katsutoshi Itoyama, and Kazuhiro Nakadai.

2 papers | 5 benchmarks | Environment, Images

COMPASS-XP

COMPASS-XP is a dataset of matched photographic and X-ray images of single objects, made available for use in Machine Learning & Computer Vision research, in particular in the context of transport security. Objects are imaged in multiple poses and accompanied by metadata, including labels for whether we consider the object to be dangerous in the context of aviation. Object classes overlap with those in the popular ImageNet Large Scale Visual Recognition Challenge class set and the WordNet lexical database, and identifiers for shared classes in both schemes are also provided.

2 papers | 0 benchmarks | Images

2D site-percolation threshold (Daniel García Solla)

The dataset is a .h5 file whose entries have keys of the form (n,m), denoting the dimensions of the system matrix on which the simulations were performed. The value for each key is a pair of arrays: one stores the number of iterations needed to terminate the process in each simulation, and the other stores the number of elements present at the terminal state of each simulation. Thus, given the maximum number of elements n*m in each system, the estimated percolation threshold can be computed by averaging the ratios between the number of elements at each terminal state and the system size. Overall, 207,950,010 simulations were performed. This dataset was used to perform a complexity analysis; see https://arxiv.org/abs/2410.11874
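The averaging step described in this entry can be sketched as follows. This is a minimal illustration, not the dataset author's code: the function name `estimate_threshold` and the sample counts are hypothetical, and reading the arrays out of the .h5 file is deliberately omitted because the exact key and array layout is not specified here.

```python
from statistics import fmean


def estimate_threshold(terminal_counts, n, m):
    """Average the occupied fraction at the terminal state over all simulations."""
    size = n * m  # maximum number of elements in an n x m system
    if size <= 0:
        raise ValueError("system dimensions must be positive")
    return fmean(count / size for count in terminal_counts)


# Hypothetical terminal-state element counts for a 10 x 10 system.
counts = [58, 61, 60, 59, 62]
estimate = estimate_threshold(counts, 10, 10)
print(f"{estimate:.3f}")
```

With the real data, `terminal_counts` would be the second array stored under a given (n,m) key, assuming the arrays are stored in the order the description lists them.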

2 papers | 0 benchmarks | Physics

INS Dataset

A significant challenge in removing shadows from indoor scenes is obtaining shadow-free images. To overcome this challenge, we propose a novel rendering pipeline for generating shadowed and shadow-free images under direct and indirect illumination, and create a comprehensive synthetic dataset that contains over 30,000 image pairs, covering various object types and lighting conditions.

2 papers | 4 benchmarks | Images

NRHints-Synthetic (NRHints Synthetic Relighting Scenes)

A high-quality synthetic dataset for object relighting, covering a wide range of geometries and materials.

2 papers | 0 benchmarks | 3D, Images

NRHints-RealCapture (NRHints Real Captured Objects)

A high-quality captured dataset for object relighting, covering a wide range of geometries and materials.

2 papers | 0 benchmarks | 3D, Images

CII-Bench (Chinese Image Implication understanding Benchmark)

We introduce the Chinese Image Implication Understanding Benchmark CII-Bench, a new benchmark measuring the higher-order perceptual, reasoning and comprehension abilities of MLLMs when presented with complex Chinese implication images. These images, including abstract artworks, comics and posters, possess visual implications that require an understanding of visual details and reasoning ability. CII-Bench reveals whether current MLLMs, leveraging their inherent comprehension abilities, can accurately decode the metaphors embedded within the complex and abstract information presented in these images.

2 papers | 0 benchmarks | Images, Texts

COFAR (Commonsense and Factual Reasoning in Image Search)

The COFAR (COmmonsense and FActual Reasoning) dataset is a collection of images and text queries specifically designed to challenge and evaluate image search models that aim to go beyond simple visual matching. It focuses on the ability of these models to perform commonsense and factual reasoning, a capability currently lacking in most existing image search technology.

2 papers | 2 benchmarks | Images, Texts