Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3D meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • MIDI: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • CAD: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

3,275 dataset results

Hazards&Robots (Hazards&Robots: A Dataset for Visual Anomaly Detection in Robotics)

We consider the problem of detecting, in the visual sensing data stream of an autonomous mobile robot, semantic patterns that are unusual (i.e., anomalous) with respect to the robot’s previous experience in similar environments. These anomalies might indicate unforeseen hazards and, in scenarios where failure is costly, can be used to trigger an avoidance behavior. We contribute three novel image-based datasets acquired in robot exploration scenarios, comprising a total of more than 200k labeled frames, spanning various types of anomalies.

3 papers · 0 benchmarks · Images

SDD

The SDD dataset contains a variety of indoor and outdoor scenes designed for image defocus deblurring. There are 50 indoor scenes and 65 outdoor scenes in the training set, and 11 indoor scenes and 24 outdoor scenes in the testing set.

3 papers · 2 benchmarks · Images

LAION COCO

LAION-COCO is the world’s largest dataset of 600M generated high-quality captions for publicly available web images. The images are extracted from the English subset of LAION-5B, with captions generated by an ensemble of BLIP L/14 and two CLIP versions (L/14 and RN50x64). This dataset allows models to produce high-quality captions for images.

3 papers · 4 benchmarks · Images

SDN (Situated Dialogue Navigation)

Situated Dialogue Navigation (SDN) is a navigation benchmark of 183 trials with a total of 8415 utterances, around 18.7 hours of control streams, and 2.9 hours of trimmed audio. SDN is developed to evaluate the agent's ability to predict dialogue moves from humans as well as generate its own dialogue moves and physical navigation actions.

3 papers · 0 benchmarks · Actions, Dialog, Environment, Images, Speech, Texts, Videos

DeepSportRadar-v1

DeepSportradar is a benchmark suite of computer vision tasks, datasets and benchmarks for automated sport understanding. DeepSportradar currently supports four challenging tasks related to basketball: ball 3D localization, camera calibration, player instance segmentation and player re-identification. For each of the four tasks, a detailed description of the dataset, objective, performance metrics, and the proposed baseline method are provided.

3 papers · 0 benchmarks · Images

BAFMD (Bias-Aware Face Mask Detection Dataset)

BAFMD contains images posted on Twitter during the pandemic from around the world, with additional images from underrepresented race and age groups to mitigate bias in the face mask detection task.

3 papers · 0 benchmarks · Images

DOORS (Dataset fOr bOuldeRs Segmentation)

DOORS is a dataset designed for boulder recognition, centroid regression, segmentation, and navigation applications. The dataset is divided into two sets.

3 papers · 0 benchmarks · 3D, Images

Tobacco800

Tobacco800 is a public subset of the complex document image processing (CDIP) test collection constructed by Illinois Institute of Technology, assembled from 42 million pages of documents (in 7 million multi-page TIFF images) released by tobacco companies under the Master Settlement Agreement and originally hosted at UCSF.

3 papers · 0 benchmarks · Images

UFPR-Periocular

The UFPR-Periocular dataset has 16,830 images of both eyes (33,660 cropped images of each eye) from 1,122 subjects (2,244 classes).

3 papers · 0 benchmarks · Images
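The counts in the UFPR-Periocular description are internally consistent, which a line of arithmetic confirms (the per-eye interpretation of "classes" is an assumption inferred from the 2,244 figure):

```python
# Figures from the UFPR-Periocular description above.
subjects = 1_122
images_of_both_eyes = 16_830
crops_per_image = 2    # one cropped image per eye
eyes_per_subject = 2   # presumably each eye counts as a separate class

cropped_images = images_of_both_eyes * crops_per_image
classes = subjects * eyes_per_subject
print(cropped_images, classes)  # 33660 2244
```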

SeaTurtleID

SeaTurtleID is a public large-scale, long-span dataset with sea turtle photographs captured in the wild. The dataset is suitable for benchmarking re-identification methods and evaluating several other computer vision tasks. It consists of 7774 high-resolution photographs of 400 unique individuals collected within 12 years in 1081 encounters. Each photograph is accompanied by rich metadata, e.g., identity label, head segmentation mask, and encounter timestamp.

3 papers · 0 benchmarks · Images

ImageNet_CN (Chinese ImageNet Classification)

ImageNet_CN transforms the ImageNet-1K classification dataset for Chinese models by translating labels and prompts into Chinese.

3 papers · 1 benchmark · Images

CORSMAL

CORSMAL is a dataset for estimating the position and orientation in 3D (or 6D pose) of an object from a single view. The dataset consists of 138,240 images of rendered hands and forearms holding 48 synthetic objects, split into 3 grasp categories over 30 real backgrounds.

3 papers · 0 benchmarks · Images

E-NER

E-NER is a publicly available legal Named Entity Recognition (NER) dataset. It contains 52 filings from the US SEC EDGAR database. The named entity tags are hand-annotated.

3 papers · 0 benchmarks · Images

RGB Arabic Alphabets Sign Language Dataset

This paper introduces the RGB Arabic Alphabet Sign Language (AASL) dataset. AASL comprises 7,856 raw, fully labeled RGB images of the Arabic sign language alphabet, which to the best of our knowledge is the first publicly available RGB dataset for this task. The dataset aims to help those interested in developing real-life Arabic sign language classification models. AASL was collected from more than 200 participants under varied settings such as lighting, background, image orientation, image size, and image resolution. Experts in the field supervised, validated, and filtered the collected images to ensure a high-quality dataset. AASL is made available to the public on Kaggle.

3 papers · 0 benchmarks · Images

ACL-Fig

ACL-Fig is a large-scale automatically annotated corpus consisting of 112,052 scientific figures extracted from 56K research papers in the ACL Anthology. The ACL-Fig-pilot dataset contains 1,671 manually labeled scientific figures belonging to 19 categories.

3 papers · 0 benchmarks · Images, Texts

SkinCon (SKIN Concepts Dataset)

SkinCon is a skin disease dataset densely annotated by dermatologists. SkinCon includes 3230 images from the Fitzpatrick 17k skin disease (Fitzpatrick Skin Tone) dataset densely labelled with 48 clinical concepts, 22 of which have at least 50 images representing the concept. The concepts used were chosen by two dermatologists considering the clinical descriptor terms used to describe skin lesions. Examples include "plaque", "scale", and "erosion".

3 papers · 0 benchmarks · Images, Medical
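The "22 of 48 concepts have at least 50 images" statistic for SkinCon amounts to a thresholded column count over a binary image-by-concept annotation matrix. A hypothetical sketch with toy data (the real matrix would be 3230 × 48; the shapes and density here are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy binary matrix: rows = images, columns = clinical concepts.
annotations = rng.random((300, 48)) < 0.2

# Count how many images carry each concept, then threshold at 50.
images_per_concept = annotations.sum(axis=0)
well_supported = int((images_per_concept >= 50).sum())
print(well_supported)
```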

S-VED (Sacrobosco Visual Element Dataset)

The Sacrobosco Visual Elements Dataset (S-VED) is derived from 359 Sphaera editions, centered on the Tractatus de sphaera by Johannes de Sacrobosco (d. 1256) and printed between 1472 and 1650. The Sphaera editions were primarily used to teach geocentric astronomy to university students across Europe. Their visual elements therefore played an essential role in visualizing the ideas, messages, and concepts that the texts transmitted. As a precondition for studying the relation between text and visual elements, a time-consuming image labelling process was conducted as part of “The Sphere” project (https://sphaera.mpiwg-berlin.mpg.de) in order to extract and label the visual elements from the 76,000 pages of the corpus. This process resulted in the creation of the Extended Sacrobosco Visual Elements Dataset (S-VED-X), of which S-VED is a subset. Due to copyright reasons only S-VED is made publicly available. S-VED consists of 4000 pages, of which 2040 contain a total of 2927 visual elements.

3 papers · 0 benchmarks · Images

GAS (Grasp Area Segmentation)

GAS (Grasp Area Segmentation) dataset consists of 10089 RGB images of cluttered scenes grouped into 1121 grasp-area segmentation tasks. For each RGB image we provide a binary segmentation map with the graspable and non-graspable regions for every object in the scene. The dataset can be used for meta-training part-based grasp area estimation networks.

3 papers · 0 benchmarks · Images
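Since each GAS image comes with a binary graspable/non-graspable segmentation map, a natural per-scene statistic is the graspable-area fraction. A minimal sketch (the mask here is a toy stand-in, not the dataset's actual file format):

```python
import numpy as np

def graspable_fraction(mask: np.ndarray) -> float:
    """Fraction of pixels marked graspable (1) in a binary segmentation map."""
    return float(mask.mean())

# Toy 4x4 mask with a 2x2 graspable region; the real GAS maps are
# binary maps aligned with the corresponding RGB images.
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1
print(graspable_fraction(mask))  # 4 graspable pixels / 16 total = 0.25
```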

EPIC-Hotspot

From Grounded Human-Object Interaction Hotspots from Video (ICCV'19): We collect annotations for interaction keypoints on EPIC Kitchens in order to quantitatively evaluate our method in parallel to the OPRA dataset (where annotations are available). We note that these annotations are collected purely for evaluation, and are not used for training our model. We select the 20 most frequent verbs, and select 31 nouns that afford these interactions.

3 papers · 3 benchmarks · Images, Videos

U2OS

The archive contains original images from U2OS cells stained with Hoechst 33342 as PNG files. It also contains images (as Photoshop and GIMP files) showing hand-segmentation of the Hoechst images into regions containing single nuclei.

3 papers · 0 benchmarks · Images
Page 87 of 164