Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

3,275 dataset results

TRR360D

TRR360D is based on the ICDAR2019MTD modern table detection dataset and follows the annotation format of the DOTA dataset. The training set contains 600 rotated images with 977 annotated instances, and the test set contains 240 rotated images with 499 annotated instances.

1 paper · 2 benchmarks · Images

VTQA (Visual Text Question Answering)

VTQA is a dataset of open-ended questions about image-text pairs. It requires a model to align multimedia representations of the same entity, perform multi-hop reasoning between image and text, and finally answer the question in natural language. The aim of the dataset is to develop and benchmark models capable of multimedia entity alignment, multi-step reasoning, and open-ended answer generation. VTQA consists of 10,238 image-text pairs and 27,317 questions. The images are real images from the MSCOCO dataset and contain a variety of entities. Annotators were asked to first write text relevant to the image, then pose questions based on the image-text pair, and finally provide open-ended answers.

1 paper · 0 benchmarks · Images, Texts

UR5 Tool Dataset

In this dataset, a UR5 robot used 6 tools (metal-scissor, metal-whisk, plastic-knife, plastic-spoon, wooden-chopstick, and wooden-fork) to perform 6 behaviors: look, stirring-slow, stirring-fast, stirring-twist, whisk, and poke. The robot explored 15 objects kept in cylindrical containers: cane-sugar, chia-seed, chickpea, detergent, empty, glass-bead, kidney-bean, metal-nut-bolt, plastic-bead, salt, split-green-pea, styrofoam-bead, water, wheat, and wooden-button. The robot performed 10 trials for each tool-behavior-object combination, resulting in 5,400 interactions (6 tools × 6 behaviors × 15 objects × 10 trials). The robot recorded multiple sensory streams (audio, RGB images, depth images, haptics, and touch images) while interacting with the objects.

1 paper · 0 benchmarks · Actions, Audio, Images, Interactive, RGB Video, RGB-D, Time series, Videos
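
The stated interaction count follows directly from the full factorial design. A minimal sketch enumerating it (the tool, behavior, and object names are taken from the description above; the trial indexing is an illustrative assumption):

```python
from itertools import product

# Names copied from the dataset description; trial indexing is assumed.
tools = ["metal-scissor", "metal-whisk", "plastic-knife",
         "plastic-spoon", "wooden-chopstick", "wooden-fork"]
behaviors = ["look", "stirring-slow", "stirring-fast",
             "stirring-twist", "whisk", "poke"]
objects = ["cane-sugar", "chia-seed", "chickpea", "detergent", "empty",
           "glass-bead", "kidney-bean", "metal-nut-bolt", "plastic-bead",
           "salt", "split-green-pea", "styrofoam-bead", "water", "wheat",
           "wooden-button"]
trials = range(10)

# One entry per (tool, behavior, object, trial) combination.
interactions = list(product(tools, behaviors, objects, trials))
print(len(interactions))  # 6 * 6 * 15 * 10 = 5400
```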

MiSCS (Microscopic Shrub Cross Sections)

Microscopy images of shrub cross sections for instance segmentation of tree rings.

1 paper · 0 benchmarks · Images

SynthBRSet (Synthetic Bike Rotation Dataset)

3D computer graphics is leveraged to generate a large and diverse dataset for training bike rotation estimators for bike parking assessment. Using 3D graphics software (Blender), the generation algorithm accurately annotates the rotation of each bike with respect to the parking spot area in two axes (y and z), which is crucial for training models for visual object-to-spot rotation estimation. Because the pipeline is scripted in Python, the generated dataset covers a wide range of variations in parking spaces, lighting conditions, backgrounds, material textures, colors, objects, and camera angles, improving the generalization of trained models. Overall, 3D computer graphics allows efficient and precise generation of visual data for this task, as well as for many other computer vision tasks.

1 paper · 0 benchmarks · Images

international faces

The Chicago Face Database (CFD) was developed at the University of Chicago by Debbie S. Ma, Joshua Correll, and Bernd Wittenbrink. The CFD is intended for use in scientific research. It provides high-resolution, standardized photographs of male and female faces of varying ethnicity, aged 17-65. Extensive norming data are available for each individual model. These data include both physical attributes (e.g., face size) and subjective ratings by independent judges (e.g., attractiveness).

1 paper · 0 benchmarks · Images, Tabular

CTCyclistDetectionDataset (Charles Tang)

Over 20,000 synthetic and web-scraped images of bicyclists with bounding-box annotations in Pascal VOC format.

1 paper · 1 benchmark · Images

BFN (Backdoored Face-Networks Dataset)

A database of backdoored neural networks intended for face recognition. The networks use the FaceNet architecture and are trained on Casia-WebFace, with and without additional samples (which are the source of the backdoor). More information on the backdoors and the surrounding project can be found in the public release of the source code: https://gitlab.idiap.ch/bob/bob.paper.backdoored_facenets.biosig2022.

1 paper · 0 benchmarks · Images

Burned Area Delineation from Satellite Imagery (A Dataset for Burned Area Delineation and Severity Estimation from Satellite Imagery)

The dataset contains 73 satellite images of forests damaged by wildfires across Europe, with a resolution of up to 10 m per pixel. Data were collected from the Sentinel-2 L2A satellite mission, and the target labels were generated from Copernicus Emergency Management Service (EMS) annotations, with five severity levels ranging from undamaged to completely destroyed.

1 paper · 1 benchmark · Images

VR-Folding

VR-Folding contains garment meshes of 4 categories from the CLOTH3D dataset: Shirt, Pants, Top, and Skirt. For the flattening task there are 5,871 videos with 585K frames in total; for the folding task there are 3,896 videos with 204K frames in total. The data for each frame include multi-view RGB-D images, object masks, full garment meshes, and hand poses.

1 paper · 0 benchmarks · Images, RGB-D, Videos

ARKitTrack

ARKitTrack is an RGB-D tracking dataset covering both static and dynamic scenes, captured with the consumer-grade LiDAR scanners on Apple's iPhone and iPad. It contains 300 RGB-D sequences, 455 targets, and 229.7K video frames in total, with 123.9K pixel-level target masks in addition to bounding box annotations and frame-level attributes.

1 paper · 0 benchmarks · Images, RGB-D, Videos

Fraunhofer Portugal AICOS EDoF Dataset

The Fraunhofer Portugal AICOS EDoF Dataset was produced within the TAMI project and is composed of images of microscopic fields of view (FOV) of Liquid-based Cervical Cytology (LBC) samples. A total of 15 LBC samples were supplied by the Pathology Services of Hospital Fernando Fonseca and the Portuguese Oncology Institute of Porto. For each LBC sample, a set of images was obtained using a version of the µSmartScope [1,2] prototype adapted to the cervical cytology use case [3,4].

1 paper · 0 benchmarks · Biomedical, Images, Medical

MELON (Melodic Design)

A dataset of multimodal creative and designed documents: images with corresponding captions, paired with music drawn from around 50 mood/theme categories.

1 paper · 0 benchmarks · Images, Texts

MuCeD

MuCeD is a dataset carefully curated and validated by expert pathologists from the All India Institute of Medical Sciences (AIIMS), Delhi, India. The H&E-stained histopathology images of the human duodenum in MuCeD were captured through an Olympus BX50 microscope at 20x zoom using a DP26 camera, with each image 1920x2148 pixels in size. The dataset has 55 images, with bounding boxes for 2,090 IELs and 6,518 ENs annotated using the LabelMe software and further validated by multiple pathologists. These cells are selected from the epithelial area, a region of interest explicitly segmented by experts. The epithelial area denotes the area of continuous villi and is used for cell detection, while the rest of the image is masked out. Further, each image is sliced into 9 subimages, and each subimage is rescaled to 640x640 before being given as input to object detection models. The 55 images are divided into five folds of 11 images each, and 5-fold cross-validation numbers are reported.

1 paper · 0 benchmarks · Biomedical, Images, Medical
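
The slicing step above can be sketched as follows, assuming a regular 3x3 grid (consistent with "sliced into 9 subimages"; the function name and integer tiling scheme are illustrative, not taken from the paper):

```python
def tile_boundaries(height, width, rows=3, cols=3):
    """Split an image of the given size into rows x cols tiles.

    Returns (top, bottom, left, right) pixel bounds for each tile,
    using integer division so the tiles exactly cover the image.
    """
    tiles = []
    for r in range(rows):
        for c in range(cols):
            top, bottom = r * height // rows, (r + 1) * height // rows
            left, right = c * width // cols, (c + 1) * width // cols
            tiles.append((top, bottom, left, right))
    return tiles

# A 1920x2148 MuCeD image yields nine 640x716 tiles, each of which
# would then be rescaled to 640x640 for the detector.
tiles = tile_boundaries(1920, 2148)
print(len(tiles), tiles[0])  # 9 tiles; first is (0, 640, 0, 716)
```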

A Large Scale Fish Dataset (A Large-Scale Dataset for Fish Segmentation and Classification)

This dataset contains 9 different seafood types collected from a supermarket in Izmir, Turkey, for a university-industry collaboration project at Izmir University of Economics; the work was published at ASYU 2020. The dataset includes image samples of gilt-head bream, red sea bream, sea bass, red mullet, horse mackerel, black sea sprat, striped red mullet, trout, and shrimp.

1 paper · 0 benchmarks · Images, Texts

IAW Dataset (Ikea Assembly In The Wild Dataset)

The IAW dataset contains 420 Ikea furniture pieces from 14 common categories, e.g., sofa, bed, wardrobe, and table. Each piece of furniture comes with one or more user instruction manuals, which are first divided into pages and then further divided into independent steps cropped from each page (some pages contain more than one step, and some pages contain no instructions). There are 8,568 pages and 8,263 steps overall, on average 20.4 pages and 19.7 steps per piece of furniture. YouTube was crawled to find videos corresponding to these instruction manuals, so the conditions in the videos are diverse in many aspects, e.g., duration, resolution, first- or third-person view, camera pose, background environment, and number of assemblers. The dataset contains 1,005 raw videos with a total length of around 183 hours. Among them, approximately 114 hours of content are labeled as 15,649 actions, each matched to the corresponding step in the corresponding manual.

1 paper · 0 benchmarks · Images, Videos

L1BSR (L1BSR dataset)

The Sentinel-2 satellite carries 12 CMOS detectors for the VNIR bands, with adjacent detectors having overlapping fields of view, which results in overlapping regions in Level-1B (L1B) images. This dataset includes 3,740 pairs of overlapping image crops extracted from two L1B products. Each crop has a height of around 400 pixels and a variable width that depends on the overlap between detectors for the RGBN bands, typically around 120-200 pixels. In addition to detector parallax, there is also cross-band parallax for each detector, resulting in shifts between bands. Pre-registration is performed for both cross-band and cross-detector parallax, with a precision of a few pixels (typically less than 10).

1 paper · 0 benchmarks · Images, Stereo

YIM Dataset (Yeast Cells in Microstructures Dataset)

An instance segmentation dataset of yeast cells in microstructures. The dataset includes 493 densely annotated microscopy images. For more information see the paper "An Instance Segmentation Dataset of Yeast Cells in Microstructures".

1 paper · 0 benchmarks · Biology, Images, Medical

MC1296 (Meter_Challenge)

A dataset for pointer meter reading.

1 paper · 0 benchmarks · Images

Multimedia Goal-oriented Generative Script Learning Dataset

A dataset of multimedia steps for two categories: gardening and crafts. It consists of a total of 79,089 multimedia steps across 5,652 tasks.

1 paper · 0 benchmarks · Images, Texts
Page 129 of 164