Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

3,275 dataset results

WildQA

WildQA is a video understanding dataset of videos recorded in outdoor settings. It can be used to evaluate models for video question answering.

1 paper · 3 benchmarks · Images, Texts

Baxter-UR5_95-Objects

In this dataset, two robots, Baxter and UR5, perform 8 behaviors (look, grasp, pick, hold, shake, lower, drop, and push) on 95 objects that vary by 5 colors (blue, green, red, white, and yellow), 6 contents (wooden buttons, plastic dice, glass marbles, nuts & bolts, pasta, and rice), and 4 weights (empty, 50 g, 100 g, and 150 g). There are 90 objects with contents (5 colors × 3 weights × 6 contents) and 5 objects without any content that vary only by color. Both robots perform 5 trials on each object, resulting in 7,600 interactions (2 robots × 8 behaviors × 95 objects × 5 trials).

1 paper · 0 benchmarks · Actions, Audio, Images, Interactive, RGB Video, RGB-D, Time series, Videos
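The object and interaction counts stated in the description above are self-consistent; a minimal sketch checking the arithmetic (all numbers taken directly from the description):

```python
# Sanity-check the Baxter-UR5_95-Objects counts from the description.
colors, weights, contents = 5, 3, 6          # non-empty weight levels x content types
objects_with_contents = colors * weights * contents   # 90 filled objects
objects = objects_with_contents + 5          # plus 5 empty objects (one per color)

robots, behaviors, trials = 2, 8, 5
interactions = robots * behaviors * objects * trials

print(objects, interactions)  # 95 7600
```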

Mars DTM Estimation

This dataset is useful for research on monocular depth estimation of the Mars surface. It is composed of 250k patches, where each patch is a 3-channel 512 × 512 raster. The first two channels are the left and right images of the stereo pair, and the third channel is the DTM. Because DTMs are stored with absolute values, preprocessing is required if you want to predict relative values. The dataset size is 800 GB.

1 paper · 12 benchmarks · Images
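The stated ~800 GB size is roughly what 250k three-channel 512 × 512 rasters would occupy; a back-of-the-envelope sketch, assuming 32-bit values per channel (the description does not state the bit depth, so this is an assumption):

```python
# Rough storage estimate for the Mars DTM Estimation dataset described above.
patches = 250_000
bytes_per_patch = 512 * 512 * 3 * 4   # 3 channels, assumed 4 bytes (float32) each

total_gb = patches * bytes_per_patch / 1e9
print(round(total_gb))  # 786 -> on the order of the stated 800 GB
```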

Thermal Face Database

High-resolution thermal infrared face database with extensive manual annotations, introduced by Kopaczka et al., 2018. Useful for training algorithms for image processing tasks as well as facial expression recognition. The full database, all annotations, and the complete source code are freely available from the authors for research purposes at https://github.com/marcinkopaczka/thermalfaceproject.

1 paper · 0 benchmarks · Images

RSP Dataset

No description available.

1 paper · 0 benchmarks · Images

Leishmania parasite dataset

This dataset includes sharp-blur pairs of Leishmania images: a microscopy dataset of the protozoan parasite Leishmania, obtained from preserved slides stained with Giemsa. The paired blur-sharp images were acquired with a bright-field microscope (Olympus IX53) using 100× oil-immersion objectives. The sharp images were captured first as ground truth, then the corresponding out-of-focus images were acquired. The extent and nature of the defocusing are random along the optical axis, so the degree of out-of-focus varies from image to image. The dataset includes 764 in-focus and 764 corresponding out-of-focus images, each 2304 × 1728 pixels in 24-bit JPG format.

1 paper · 0 benchmarks · Images, Medical

V-MIND

V-MIND enhances the MIND dataset with news pictures.

1 paper · 0 benchmarks · Images

AesVQA

AesVQA is a dataset that contains 72,168 high-quality images and 324,756 pairs of aesthetic questions. It addresses the task of aesthetic VQA and introduces subjectiveness into VQA tasks.

1 paper · 0 benchmarks · Images, Texts

ChiQA (Chinese VQA)

ChiQA is a dataset designed for visual question answering that measures not only relatedness but also answerability, which demands more fine-grained vision-and-language reasoning. It contains more than 40K questions and more than 200K question-image pairs. The questions are real-world, image-independent queries that are more varied and unbiased.

1 paper · 0 benchmarks · Images, Texts

HM3D-Semantics (Habitat-Matterport 3D Semantics)

Habitat-Matterport 3D Semantics Dataset (HM3D-Semantics v0.1) is the largest-ever dataset of semantically annotated 3D indoor spaces. It contains dense semantic annotations for 120 high-resolution 3D scenes from the Habitat-Matterport 3D dataset. The HM3D scenes are annotated with 1,700+ raw object names, which are mapped to 40 Matterport categories. On average, each scene in HM3D-Semantics v0.1 contains 646 objects from 114 categories.

1 paper · 0 benchmarks · 3D, Images

VizWiz-FewShot

VizWiz-FewShot is a few-shot localization dataset originating from photographers who were authentically trying to learn about the visual content of the images they took. It includes nearly 10,000 segmentations of 100 categories in over 4,500 images taken by people with visual impairments.

1 paper · 0 benchmarks · Images

VISOR - Semi supervised video object segmentation (val)

VISOR is a dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, and it contains 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, and 67K hand-object relations, covering 36 hours of 179 untrimmed videos.

1 paper · 0 benchmarks · Images, Videos

Trailers12k

Trailers12k is a movie trailer dataset comprising 12,000 titles associated with ten genres. It is distinguished from other datasets by a collection procedure aimed at providing a high-quality, publicly available dataset.

1 paper · 0 benchmarks · Images, Texts, Videos

Multi-domain Image Characteristics Dataset

The Multi-domain Image Characteristics Dataset consists of thousands of images sourced from the internet. Each image falls under one of three domains: animals, birds, or furniture. There are five types under each domain and 200 images of each type, bringing the total to 3,000 images. The master file consists of two columns: the image name and the visible characteristics of that image. Every image was manually analyzed and its characteristics generated, ensuring accuracy.

1 paper · 0 benchmarks · Images, Texts

HOWS (HOWS-CL-25)

HOWS-CL-25 (Household Objects Within Simulation dataset for Continual Learning) is a synthetic dataset designed for object classification on mobile robots operating in a changing environment (such as a household), where it is important to learn new, never-before-seen objects on the fly. The dataset can also be used for other learning use cases, such as instance segmentation or depth estimation, or wherever household objects or continual learning are of interest.

1 paper · 1 benchmark · Images, RGB-D

HYPERVIEW (Seeing Beyond the Visible)

The dataset comprises 2,886 patches in total (2 m GSD): 1,732 patches for training and 1,154 for testing. The patch size varies (depending on the agricultural parcels) and averages around 60 × 60 pixels. Each patch contains 150 contiguous hyperspectral bands (462-942 nm, with a spectral resolution of 3.2 nm), reflecting the spectral range of the hyperspectral imaging sensor deployed on board Intuition-1.

1 paper · 1 benchmark · 3D, Images

HuTu 80 (HuTu 80 cell populations)

The image set contains 180 high-resolution color microscopic images of human duodenum adenocarcinoma HuTu 80 cell populations obtained in an in vitro scratch assay (for the details of the experimental protocol, we refer to Liang et al., 2007). Briefly, cells were seeded in 12-well culture plates ($20 \times 10^3$ cells per well) and grown to form a monolayer with 85% or greater confluency. The cell monolayer was then scraped in a straight line using a pipette tip ($200~\mu L$). The debris was removed by washing with a growth medium, and the medium in the wells was replaced. The scratch areas were marked to obtain the same field during image acquisition. Images of the scratches were captured immediately following the scratch formation, as well as after 24, 48, and 72 h of cultivation.

1 paper · 2 benchmarks · Biomedical, Images

MovieCLIP

MovieCLIP is a movie-centric taxonomy of 179 scene labels derived from movie scripts and auxiliary web-based video datasets designed for visual scene recognition.

1 paper · 0 benchmarks · Images

RGZ EMU: Semantic Taxonomy (Radio Galaxy Zoo EMU: Towards a Semantic Radio Galaxy Morphology Taxonomy)

The data used in:
  • "Radio Galaxy Zoo EMU: Towards a Semantic Radio Galaxy Morphology Taxonomy" (Bowles et al., submitted)
  • "A New Task: Deriving Semantic Class Targets for the Physical Sciences" (Bowles et al., 2022: https://arxiv.org/abs/2210.14760), accepted at the Fifth Workshop on Machine Learning and the Physical Sciences, Neural Information Processing Systems 2022

1 paper · 0 benchmarks · Images, Tabular, Texts

Panoramic Video Panoptic Segmentation Dataset

Panoramic Video Panoptic Segmentation Dataset is a large-scale dataset that offers high-quality panoptic segmentation labels for autonomous driving. The dataset has labels for 28 semantic categories and 2,860 temporal sequences that were captured by five cameras mounted on autonomous vehicles driving in three different geographical locations, leading to a total of 100k labeled camera images.

1 paper · 0 benchmarks · Images
Page 125 of 164