Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3D meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • MIDI: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • CAD: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

3,275 dataset results

Canon RAW Low Light (Canon Camera Low Light RAW Image Dataset)

This project presents two new datasets that extend the Learning to See in the Dark low-light enhancement CNN to the Canon 6D DSLR, and explores how the network performs under various modifications, both pruning it and making it deeper.

1 paper · 4 benchmarks · Images

UIUC Scooping Dataset (Granular Materials Manipulation Dataset with Scooping/Digging/Excavation Action)

Overview: This dataset comprises 6,700 executed scoops (excavations), recorded across a wide range of materials, terrain topographies, and compositions.

1 paper · 0 benchmarks · Environment, Images, Point cloud, RGB-D, Time series

Dataset of a Study of Computational reproducibility of Jupyter notebooks from biomedical publications

The dataset is generated from a study of the computational reproducibility of Jupyter notebooks from biomedical publications. We analyzed the reproducibility of Jupyter notebooks from GitHub repositories associated with publications indexed in the biomedical literature repository PubMed Central. The dataset includes metadata on the journals, the publications, the GitHub repositories mentioned in the publications, and the notebooks present in those repositories.

1 paper · 0 benchmarks · Images, Tables, Tabular

Dataset of a Study of Computational reproducibility of Jupyter notebooks from biomedical publications version 1 (Version 1)

This repository contains the dataset for the study of the computational reproducibility of Jupyter notebooks from biomedical publications. We analyzed the reproducibility of Jupyter notebooks from GitHub repositories associated with publications indexed in the biomedical literature repository PubMed Central. The dataset includes metadata on the journals, the publications, the GitHub repositories mentioned in the publications, and the notebooks present in those repositories.

1 paper · 0 benchmarks · Images, Tables, Tabular

ImageNet-Atr (ImageNet with Adversarial Text Regions)

We build a new evaluation set by adding spotted words to the images of the ImageNet 2012 evaluation set. There are 1,000 categories in ImageNet. For each category c, we find its most confusing category c* and spot the category name onto every evaluation image.
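The per-category selection step can be sketched as an off-diagonal argmax over a confusion matrix. This is a minimal stdlib-Python illustration with a hypothetical toy matrix, not the paper's actual procedure or data:

```python
# Sketch: choosing the most confusing class c* for each class c from a
# confusion matrix. The 3x3 matrix below is toy data; the real matrix
# would come from a classifier's predictions on ImageNet.
def most_confusing(confusion):
    """For each class c, return the off-diagonal argmax of row c."""
    pairs = {}
    for c, row in enumerate(confusion):
        c_star = max((j for j in range(len(row)) if j != c),
                     key=lambda j: row[j])
        pairs[c] = c_star
    return pairs

# Rows = true class, columns = predicted class.
M = [
    [90, 7, 3],
    [5, 85, 10],
    [2, 20, 78],
]
print(most_confusing(M))  # {0: 1, 1: 2, 2: 1}
```

Each class's name for c* would then be rendered onto that class's evaluation images to form the adversarial text regions.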

1 paper · 0 benchmarks · Images

ISEKAI

The ISEKAI dataset’s images are generated by Midjourney’s text-to-image model using well-crafted instructions, and were manually selected to ensure core-concept consistency. The dataset currently comprises 20 groups and 40 categories in total, and continues to grow. Each group pairs a new concept with a related real-world concept, such as "octopus vacuum" and "octopus"; these can serve as challenging negative samples for each other. Each concept has no fewer than 32 images, supporting multi-shot examples.

1 paper · 0 benchmarks · Images

NIH 3T3 microtubule cell dataset

The data consists of 21 images of microtubules in PFA-fixed NIH 3T3 mouse embryonic fibroblasts (DSMZ: ACC59) labeled with a mouse anti-alpha-tubulin monoclonal IgG1 antibody (Thermofisher A11126, primary antibody) and visualized by a blue-fluorescent Alexa Fluor 405 goat anti-mouse IgG antibody (Thermofisher A-31553, secondary antibody). Acquisition of the images was performed using a confocal microscope (Olympus IX81).

1 paper · 0 benchmarks · Images

USPTO-30K

We introduce USPTO-30K, a large-scale benchmark dataset of annotated molecule images, which overcomes these limitations. It is created from the image–MolFile pairs published by the United States Patent and Trademark Office; each molecule was independently selected from all available documents from 2001 to 2020. The set consists of three subsets to decouple the study of clean molecules, molecules with abbreviations, and large molecules.

1 paper · 0 benchmarks · Graphs, Images

MolGrapher-Synthetic-300K

The set is created using molecule SMILES retrieved from the PubChem database. Images are then generated from the SMILES using the molecule drawing library RDKit, and the synthetic set is augmented at multiple levels.

1 paper · 0 benchmarks · Graphs, Images

TYC Dataset (The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures)

We introduce the trapped yeast cell (TYC) dataset, a novel dataset for understanding instance-level semantics and motions of cells in microstructures. We release 105 densely annotated high-resolution brightfield microscopy images, including about 19k instance masks. We also release 261 curated video clips composed of 1,293 high-resolution microscopy images to facilitate unsupervised understanding of cell motions and morphology.

1 paper · 0 benchmarks · Images, Videos

SD7K (Shadow Document 7K)

SD7K is currently the only large-scale, high-resolution dataset that satisfies all the important data characteristics for document shadow, covering a large number of document shadow images. The mean resolution is 2462 × 3699.

1 paper · 0 benchmarks · Images

AI-ready multiplex IHC-IF dataset (AI-ready restained and co-registered multiplex dataset for head-and-neck squamous cell carcinoma)

We introduce a new AI-ready computational pathology dataset containing restained and co-registered digitized images from eight head-and-neck squamous cell carcinoma patients. Specifically, the same tumor sections were stained first with the expensive multiplex immunofluorescence (mIF) assay and then restained with the cheaper multiplex immunohistochemistry (mIHC). This is the first public dataset demonstrating the equivalence of these two staining methods, which in turn enables several use cases: because of this equivalence, our cheaper mIHC staining protocol can offset the need for expensive mIF staining/scanning, which requires highly skilled lab technicians. As opposed to the subjective and error-prone immune cell annotations from individual pathologists (disagreement > 50%) used to drive state-of-the-art deep learning approaches, this dataset provides objective immune and tumor cell annotations via mIF/mIHC restaining for more reproducible and accurate characterization of the tumor immune microenvironment.

1 paper · 0 benchmarks · Biology, Images, Medical

CLPD (China License Plate Dataset)

The CLPD dataset comprises 1,200 images covering various regions of mainland China. The images were sourced from diverse origins, including the internet, mobile devices, and in-car recording devices. While most were recorded during daylight hours, some were captured at night. The dataset predominantly features passenger cars, with a limited number of images depicting trucks and buses.

1 paper · 0 benchmarks · Images

CD-HARD

CD-HARD comprises 102 images featuring vehicles with oblique license plates sourced from the Cars dataset. Each image within this dataset exclusively depicts a single vehicle and was captured during daylight hours. While the dataset encompasses images from diverse geographic regions, it predominantly consists of images seemingly taken in European locales.

1 paper · 0 benchmarks · Images

CY101 Dataset

In this dataset, an upper-torso humanoid robot with a 7-DOF arm explored 100 different objects belonging to 20 different categories using 10 behaviors: Look, Crush, Grasp, Hold, Lift, Drop, Poke, Push, Shake, and Tap.

1 paper · 0 benchmarks · Actions, Audio, Images, Interactive, RGB Video, Texts, Time series, Videos

urban_change_monitoring_mariupol_ua (Monitoring of Urban Changes for Mariupol, Ukraine)

This dataset contains the ground truth for urban changes that occurred in Mariupol, Ukraine, in the 2017–2020 time frame. It is useful for transferring the urban change monitoring network ERCNN-DRS (https://github.com/It4innovations/ERCNN-DRS_urban_change_monitoring) to that region.

1 paper · 0 benchmarks · Images, Time series

CapMIT1003

The CapMIT1003 database contains captions and clicks collected for images from the MIT1003 database, for which reference eye scanpaths are available. The database is distributed as a single SQLite3 database named capmit1003.db. For convenience, a lightweight Python class for accessing the database is provided in the official repository.
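Since the data ships as a plain SQLite3 file, it can also be queried directly with Python's stdlib. The sketch below is illustrative only: the `clicks` table and its `image_id`/`x`/`y` columns are assumed names, not the dataset's documented schema, and an in-memory stand-in database is built so the snippet runs without the real capmit1003.db file:

```python
import sqlite3

# Illustrative schema only (assumed, not the dataset's documented one).
# Replace ":memory:" with "capmit1003.db" to open the real file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (image_id TEXT, x REAL, y REAL)")
conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)",
                 [("i05.jpeg", 120.0, 64.0), ("i05.jpeg", 300.5, 210.0)])

# Fetch all recorded clicks for one image, using parameter substitution.
rows = conn.execute(
    "SELECT x, y FROM clicks WHERE image_id = ?", ("i05.jpeg",)
).fetchall()
print(rows)  # [(120.0, 64.0), (300.5, 210.0)]
```

For real use, the official repository's Python class is the safer interface, since it encodes the actual table layout.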

1 paper · 1 benchmark · Images, Time series

Robot@Home2 (Robot@Home2, a robotic dataset of home environments)

Robot@Home2 is an enhanced version aimed at improving usability and functionality for developing and testing mobile robotics and computer vision algorithms. It consists of three main components. First, a relational database that stores the contextual information and data links, compatible with SQL. Second, a Python package for managing the database, including downloading, querying, and interfacing functions. Finally, learning resources in the form of Jupyter notebooks, runnable locally or on the Google Colab platform, enabling users to explore the dataset without local installations. These freely available tools are expected to ease exploitation of the Robot@Home dataset and accelerate research in computer vision and robotics.

1 paper · 0 benchmarks · 3D, 3D meshes, Images, LiDAR, Point cloud, RGB Video, Videos

InfraParis

InfraParis is a novel and versatile dataset supporting multiple tasks across three modalities: RGB, depth, and infrared. From the city center to the suburbs, it contains a variety of scene styles across the greater Paris area, providing rich semantic information. InfraParis contains 7,301 images with bounding boxes and full semantic annotations (19 classes). We assess various state-of-the-art baseline techniques, encompassing models for semantic segmentation, object detection, and depth estimation.

1 paper · 0 benchmarks · Images, RGB Video, RGB-D, Stereo

Multi-Sensor Calibration

Two separate datasets of calibration runs in front of a calibration board:

  • 4 IMUs + 3 cameras
  • 4 IMUs + 4 cameras

1 paper · 0 benchmarks · Images
Page 132 of 164