Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

MagicBathyNet

MagicBathyNet is a benchmark dataset made up of image patches from Sentinel-2, SPOT-6, and aerial imagery, bathymetry in raster format, and seabed class annotations. The dataset also facilitates unsupervised learning for model pre-training in shallow coastal areas.

2 papers | 0 benchmarks | Images, LiDAR

DLP (Dragon Lake Park)

2 papers | 0 benchmarks

TBSI Sunwoda Battery Dataset

AI for science has generated a great deal of enthusiasm in both academia and industry. The field of battery energy storage is no exception, given how it cuts across materials, chemistry, physics, and electrical engineering. Due to the complexity and uncertainty of the manufacturing process, there is a persistent and considerable performance mismatch between a manufactured battery and its counterpart from the materials laboratory, which compromises product quality, R&D efficiency, investment cost, and lifetime sustainability. Sunwoda Electronic Co., Ltd. generates the TBSI Sunwoda Battery Dataset to verify the performance of novel battery material composition designs. The collaboration team at Tsinghua Berkeley Shenzhen Institute (TBSI) performs the main research work by providing an efficient and reliable early battery prototype verification methodology. We open-source this dataset to inspire more diversified data-driven, physics-informed battery management research a

2 papers | 0 benchmarks

TMD (Text-Music-Dance)

The Text-Music-Dance (TMD) dataset establishes a pioneering benchmark comprising 2,153 text-music-motion pairs. Dance motions and corresponding text annotations are sourced from Motion-X, incorporating AIST++ and other datasets. For motion-text pairs lacking music, corresponding music is generated using Stable Audio Open with beat adjustment and validated through expert assessments, ensuring inter-rater reliability.

2 papers | 16 benchmarks

mango

Large language models such as ChatGPT and GPT-4 have recently achieved astonishing performance on a variety of natural language processing tasks. In this paper, we propose MANGO, a benchmark to evaluate their ability to perform text-based mapping and navigation. Our benchmark includes 53 mazes taken from a suite of text games: each maze is paired with a walkthrough that visits every location but does not cover all possible paths. The task is question-answering: for each maze, a large language model reads the walkthrough and answers hundreds of mapping and navigation questions such as "How should you go to Attic from West of House?" and "Where are we if we go north and east from Cellar?". Although these questions are easy for humans, it turns out that even GPT-4, the best-to-date language model, performs poorly when answering them. Further, our experiments suggest that a strong mapping and navigation ability would benefit the performance of large language models on relevant downstre

2 papers | 0 benchmarks | Texts

im2latexv2

The dataset is an enhanced version of the im2latex-100k dataset. It uses a novel LaTeX normalization process and 61 rendering environments to make the dataset more realistic.

2 papers | 0 benchmarks

The ULS23 Challenge Test Set

The ULS23 test set contains 725 lesions from 284 patients of the Radboudumc and JBZ hospitals in the Netherlands. It is intended to be used to measure the performance of 3D universal lesion segmentation models for Computed Tomography (CT). To prepare the data, radiological reports from both participating institutions were searched using NLP tools to identify patients with measurable target lesions, indicating that these lesions were clinically relevant. A random sample of patients was selected, 56.3% of whom were male, covering diverse scanner manufacturers. The lesions were annotated in 3D by expert radiologists with over 10 years of experience in reading oncological scans. ULS23 is an open benchmark, and we invite ongoing submissions to advance the development of future ULS models.

2 papers | 8 benchmarks | 3D, Images, Medical

National Lung Screening Trial (NLST)

The National Lung Screening Trial (NLST) was a randomized controlled trial conducted by the Lung Screening Study group (LSS) and the American College of Radiology Imaging Network (ACRIN) to determine whether screening for lung cancer with low-dose helical computed tomography (CT) reduces mortality from lung cancer in high-risk individuals relative to screening with chest radiography. Approximately 54,000 participants were enrolled between August 2002 and April 2004. Data collection has ended, and information is complete through December 31, 2009. NLST has the ClinicalTrials.gov registration number NCT00047385.

2 papers | 1 benchmark | 3D, Medical

CODA

A novel road corner case dataset for object detection in autonomous driving, containing ~10,000 carefully selected road driving scenes with high-quality bounding box annotations for 43 representative road object categories.

2 papers | 0 benchmarks

Calandra Dataset

The Calandra dataset provides the data from a pair of tactile sensors attached to a jaw gripper (left and right) alongside the RGB images. A triplet of samples was captured 'before', 'during', and 'after' grasping a plethora of objects. The objective is to determine the success or the failure of the grasp attempt.

2 papers | 0 benchmarks | Images

Human Simulacra

Human Simulacra is a virtual character dataset that contains 129k texts across 11 virtual characters, with each character having unique attributes, biographies, and stories.

2 papers | 0 benchmarks | Texts

Deep Blending

The Deep Blending Dataset comprises 19 diverse scenes, offering comprehensive resources for free-viewpoint image-based rendering (IBR). Each scene includes input images, COLMAP reconstructions (SfM & MVS), global textured meshes from RealityCapture, refined depth maps, per-view meshes, and camera poses. This dataset is ideal for training and evaluating novel view synthesis and blending algorithms in both indoor and outdoor environments.

2 papers | 4 benchmarks

Duke Lung Nodule Dataset 2024

Background: Lung cancer risk classification is an increasingly important area of research as low-dose thoracic CT screening programs have become standard of care for patients at high risk for lung cancer. There is limited availability of large, annotated public databases for the training and testing of algorithms for lung nodule classification.

2 papers | 1 benchmark | 3D, Biomedical, Images, Medical

ClimateIQA

The dataset was created to address the crucial need for effective Extreme Weather Events Detection (EWED), an increasingly urgent task due to the rising frequency of such events driven by global warming. Traditional methods for EWED rely on numerical threshold setting and the analysis of weather anomaly heatmaps, visualizing data such as temperature, wind speed, and precipitation. However, these methods often involve manual work and can be time-consuming and error-prone. While advances in AI have led to the development of machine learning models like Convolutional Neural Networks (CNNs) for weather prediction and EWED, these models predominantly use numeric data and often yield low accuracy. Moreover, despite the proficiency of Large Language Models (LLMs) in generating textual weather reports, they struggle with interpreting visual data—crucial for EWED. General Vision-Language Models (VLMs) also face challenges in accurately interpreting meteorological heatmaps, commonly misidentifyi

2 papers | 0 benchmarks | Environment, Images, Texts

ImageNet3D

ImageNet3D for general-purpose object-level 3D understanding. Tasks include:

2 papers | 0 benchmarks

LUMA (Learning from Uncertain and Multimodal Data)

LUMA is a multimodal dataset that consists of audio, image, and text modalities. It allows controlled injection of uncertainties into the data and is mainly intended for studying uncertainty quantification in multimodal classification settings. This repository provides the Audio and Text modalities. The image modality consists of images from CIFAR-10/100 datasets. To download the image modality and compile the dataset with a specified amount of uncertainties, please use the LUMA compilation tool.

2 papers | 0 benchmarks | Audio, Images, Texts

EARS-Reverb

The EARS-Reverb dataset uses real recorded room impulse responses (RIRs) from multiple public datasets (ACE-Challenge, AIR, ARNI, BRUDEX, dEchorate, DetmoldSRIR, and Palimpsest). All RIRs are fullband, and a randomly selected channel for multi-channel recordings is used. The reverberant speech is generated by convolving the clean speech with the RIR. To avoid a time delay between the reverberant and clean speech signal caused by the direct path of the RIR, the beginning of the RIR is cut off up to the index with the highest amplitude. Only RIRs with an RT60 reverberation time that does not exceed 2 s are used. Finally, the loudness of the reverberant speech is normalized to the loudness of the clean speech using the loudness K-weighted relative to full scale (LKFS).

2 papers | 5 benchmarks | Speech
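
The EARS-Reverb description above outlines a short processing chain: trim the RIR up to its highest-amplitude sample, convolve it with the clean speech, then match loudness. A minimal sketch of that chain follows, assuming mono NumPy arrays at a shared sample rate. The function names are hypothetical, and plain RMS matching stands in for the LKFS loudness normalization the dataset actually uses; this is not the dataset authors' code.

```python
import numpy as np


def trim_rir_to_peak(rir: np.ndarray) -> np.ndarray:
    """Drop all samples before the highest-amplitude index of the RIR."""
    peak = int(np.argmax(np.abs(rir)))
    return rir[peak:]


def reverberate(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve clean speech with a peak-trimmed RIR and match its level."""
    trimmed = trim_rir_to_peak(rir)
    # Truncate the convolution tail so the result aligns with the clean signal.
    wet = np.convolve(clean, trimmed)[: len(clean)]
    # ASSUMPTION: RMS matching is a simple stand-in for LKFS normalization.
    clean_rms = np.sqrt(np.mean(clean ** 2))
    wet_rms = np.sqrt(np.mean(wet ** 2))
    if wet_rms > 0:
        wet = wet * (clean_rms / wet_rms)
    return wet
```

Trimming to the peak removes the direct-path delay, so the reverberant and clean signals stay time-aligned sample by sample.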

RaDelft

The RaDelft dataset is a novel, large-scale, real-life, and multi-sensor dataset that has been recorded using a demonstrator vehicle in different locations in the city of Delft. It contains data from a lidar, an imaging radar board, a camera, and the ego vehicle’s odometry. Check the reference paper for more details about the data collection and the sensors' setup.

2 papers | 0 benchmarks

ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence)

The ARC-AGI benchmark measures an AI system's general reasoning capabilities. GPT-4o recently reached a 50% score on it, surpassing the previous best score of 34%. Each problem provides several examples from which the system must infer the underlying rules and output the correct result for the problem diagram.

2 papers | 0 benchmarks | Images

edeniss2020 (EDEN ISS 2020 Telemetry Dataset)

The edeniss2020 dataset is a time series dataset. It consists of equidistant sensor readings from 97 sensors in the EDEN ISS research greenhouse.

2 papers | 0 benchmarks | Time series