Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

3,275 dataset results

WikiTableSet (Wikipedia Table Image Dataset)

WikiTableSet is a large, publicly available image-based table recognition dataset in three languages, built from Wikipedia. It contains nearly 4 million English table images, 590K Japanese table images, and 640K French table images, each with its corresponding HTML representation and cell bounding boxes. We built a Wikipedia table extractor, WTabHTML, and used it to extract tables (as HTML) from the 2022-03-01 Wikipedia dump. In this study we selected tables from three representative languages (English, Japanese, and French), although the dataset could be extended to around 300 languages and 17M tables using our table extractor. We then normalized the HTML tables following the PubTabNet format (separating table headers from table data, removing CSS and style tags). Finally, we used Chrome and Selenium to render table images from the table HTML. This dataset provides a standard benchmark for studying table recognition algorithms across languages.

2 papers · 0 benchmarks · Images, Tabular
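The PubTabNet-style normalization step in the description (stripping CSS and style markup while keeping table structure) can be sketched roughly as follows. This is an illustrative, standard-library-only approximation, not the actual WTabHTML pipeline; the list of dropped attributes is an assumption.

```python
from html.parser import HTMLParser

# Presentational attributes to drop (assumed list, for illustration only).
DROP_ATTRS = {"style", "class", "bgcolor", "width", "height", "align"}

class TableNormalizer(HTMLParser):
    """Re-emit HTML with presentational attributes removed."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        kept = [(k, v) for k, v in attrs if k.lower() not in DROP_ATTRS]
        attr_str = "".join(f' {k}="{v}"' for k, v in kept)
        self.out.append(f"<{tag}{attr_str}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

def normalize_table(html: str) -> str:
    p = TableNormalizer()
    p.feed(html)
    return "".join(p.out)

raw = '<table style="border:1px"><tr class="odd"><th>Lang</th></tr><tr><td>English</td></tr></table>'
print(normalize_table(raw))
# <table><tr><th>Lang</th></tr><tr><td>English</td></tr></table>
```

The cleaned HTML would then be rendered to an image with a headless browser, as the description notes.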

NIH3T3

The archive contains original images from NIH3T3 cells stained with Hoechst 33342 as PNG files. It also contains images (as Photoshop and GIMP files) showing hand-segmentation of the Hoechst images into regions containing single nuclei.

2 papers · 0 benchmarks · Images

Deep PCB (Deep Printed Circuit Board)

DeepPCB

2 papers · 1 benchmark · Images

Large COVID-19 CT scan slice dataset

We built a large lung CT scan dataset for COVID-19 by curating data from seven public datasets listed in the acknowledgements. These datasets have been used publicly in the COVID-19 diagnosis literature and have proven effective in deep learning applications. The merged dataset is therefore expected to improve the generalization ability of deep learning methods by learning from all of these resources together.

2 papers · 21 benchmarks · Images

PKU (License Plate Detection)

The PKU dataset has almost 4,000 images categorized into five groups (G1-G5) that show different situations. For example, G1 has images of highways during the day with only one car in them. On the other hand, G5 has images of crosswalks during the day or at night with multiple cars and license plates (LPs).

2 papers · 0 benchmarks · Images

New Plant Diseases Dataset (image dataset containing different healthy and unhealthy crop leaves)

This dataset is recreated using offline augmentation from the original dataset, which can be found in its GitHub repo. It consists of about 87K RGB images of healthy and diseased crop leaves, categorized into 38 classes. The data are split 80/20 into training and validation sets, preserving the directory structure. A separate directory containing 33 test images was created later for prediction purposes.

2 papers · 1 benchmark · Images
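An 80/20 split that preserves the class-subdirectory structure, as described above, can be sketched like this. The function name, paths, and seed are illustrative assumptions, not part of the published dataset tooling.

```python
import random
import shutil
from pathlib import Path

def split_dataset(src: Path, dst: Path, ratio: float = 0.8, seed: int = 0) -> None:
    """Copy each class subdirectory of `src` into dst/train and dst/valid,
    keeping the per-class directory structure intact (hypothetical helper)."""
    rng = random.Random(seed)
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        images = sorted(class_dir.iterdir())
        rng.shuffle(images)
        cut = int(len(images) * ratio)  # e.g. 80% to training
        for split, files in (("train", images[:cut]), ("valid", images[cut:])):
            out = dst / split / class_dir.name
            out.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy2(f, out / f.name)
```

Each class keeps its own folder under both `train/` and `valid/`, which is what "preserving the directory structure" refers to.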

SIMARA (a database for key-value information extraction from full-page handwritten documents)

We propose a new database for information extraction from historical handwritten documents. The corpus includes 5,393 finding aids from six different series, dating from the 18th to the 20th centuries. Finding aids are handwritten documents containing metadata that describes older archives. They are stored in the National Archives of France and are used by archivists to identify and locate archival documents.

2 papers · 5 benchmarks · Images, Texts

CWD30 (Crop Weed Dataset 30 species)

CWD30 comprises over 219,770 high-resolution images of 20 weed species and 10 crop species, encompassing various growth stages, multiple viewing angles, and environmental conditions. The images were collected from diverse agricultural fields across different geographic locations and seasons, ensuring a representative dataset.

2 papers · 0 benchmarks · Images

ENRICH (Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry)

A new synthetic, multi-purpose dataset, ENRICH, for testing photogrammetric and computer vision algorithms. Compared to existing datasets, ENRICH offers higher-resolution images rendered under different lighting conditions, camera orientations, scales, and fields of view. Specifically, ENRICH is composed of three sub-datasets: ENRICH-Aerial, ENRICH-Square, and ENRICH-Statue, each exhibiting different characteristics. The dataset is useful for several photogrammetry- and computer-vision-related tasks, such as evaluating hand-crafted and deep-learning-based local features, studying the effect of ground control point (GCP) configuration on 3D accuracy, and monocular depth estimation.

2 papers · 0 benchmarks · Images

RUGD (Robot Unstructured Ground Driving)

A video dataset for visual perception and autonomous navigation in unstructured environments. Website: http://rugd.vision/

2 papers · 4 benchmarks · Images

Drone-Action (An Outdoor Recorded Drone Video Dataset for Action Recognition)

Website: https://asankagp.github.io/droneaction/

2 papers · 4 benchmarks · Images, Videos

CVB (Video Dataset of Cattle Visual Behaviors)

Existing image/video datasets for cattle behavior recognition are mostly small, lack well-defined labels, or are collected in unrealistic controlled environments, which limits the utility of machine learning (ML) models learned from them. We therefore introduce a new dataset, Cattle Visual Behaviors (CVB), consisting of 502 video clips, each fifteen seconds long, captured in natural lighting conditions and annotated with eleven visually perceptible behaviors of grazing cattle. By creating and sharing CVB, we aim to develop improved models capable of recognizing all important cattle behaviors accurately, and to assist other researchers and practitioners in developing and evaluating new ML models for cattle behavior classification from video. The dataset is organized into the following three sub-directories. 1. raw_frames: each subfolder contains the 450 frames of one 15-second clip captured at 30 FPS. 2. annotations: contains the JSON file

2 papers · 0 benchmarks · Actions, Images, Tracking, Videos

IRFL: Image Recognition of Figurative Language

The IRFL dataset consists of idioms, similes, and metaphors with matching figurative and literal images, as well as two novel tasks of multimodal figurative understanding and preference.

2 papers · 2 benchmarks · Images, Texts

FaMoS (Facial Motion across Subjects)

FaMoS is a dynamic 3D head dataset of 95 subjects, each performing 28 motion sequences. The sequences comprise six prototypical expressions (Anger, Disgust, Fear, Happiness, Sadness, and Surprise), two head rotations (left/right and up/down), and diverse facial motions, including extreme and asymmetric expressions. Each sequence is recorded at 60 fps. In total, FaMoS contains around 600K 3D head meshes (~225 frames per sequence). For each frame, registrations in the FLAME mesh topology are publicly available.

2 papers · 0 benchmarks · 3D, Images
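A quick arithmetic check of the figures quoted above (95 subjects × 28 sequences × ~225 frames per sequence, at 60 fps):

```python
# Sanity-check the quoted totals from the FaMoS description.
subjects, sequences, frames_per_seq = 95, 28, 225
total_meshes = subjects * sequences * frames_per_seq
print(total_meshes)            # 598500, i.e. ~600K meshes as stated
print(frames_per_seq / 60.0)   # 3.75 seconds per sequence at 60 fps
```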

S2RDA

Our proposed synthetic-to-real benchmark for more practical visual domain adaptation (termed S2RDA) includes two challenging transfer tasks, S2RDA-49 and S2RDA-MS-39. In each task, source/synthetic-domain samples are synthesized by rendering 3D models from ShapeNet. The 3D models used share the label space of the target/real domain, and each class has 12K rendered RGB images. The real domain of S2RDA-49 comprises 60,535 images of 49 classes, collected from the ImageNet validation set, ObjectNet, the VisDA-2017 validation set, and the web. For S2RDA-MS-39, the real domain collects 41,735 natural images exclusive to 39 classes from MetaShift, which exhibit complex and distinct contexts, e.g., object presence (co-occurrence of different objects), general contexts (indoor or outdoor), and object attributes (color or shape), leading to a much harder task. Compared to VisDA-2017, S2RDA contains more categories, more realistically synthesized source-domain data coming for free, and a more complicated target domain.

2 papers · 0 benchmarks · Images

EBD

A large-scale benchmark with 1,605 high-resolution, well-annotated images, featuring more complex scenes and a wider range of depth-of-field (DOF) settings.

2 papers · 1 benchmark · Images

MiniWob++

MiniWob++ is a suite of web-browser-based tasks introduced in Liu et al. (2018), extending the earlier MiniWob task suite (Shi et al., 2017). Tasks range from simple button clicking to complex form filling, for example booking a flight given particular instructions. Programmatic rewards are available for each task, permitting standard reinforcement learning techniques.

2 papers · 0 benchmarks · Actions, Images, Interactive, Texts
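Because each task exposes a programmatic reward, a standard RL loop applies directly. The sketch below uses a toy stand-in environment (a hypothetical "click the right button" task) with an epsilon-greedy tabular learner; it is not the real MiniWob++ API.

```python
import random

class ToyButtonEnv:
    """One-step episode: click one of n buttons; reward 1 for the target.
    A toy stand-in for a MiniWob++-style task, not the real interface."""
    def __init__(self, n_buttons: int = 4, seed: int = 0):
        self.n = n_buttons
        self.rng = random.Random(seed)
        self.target = 0

    def reset(self) -> int:
        self.target = self.rng.randrange(self.n)
        return self.target  # fully observable toy observation

    def step(self, action: int) -> float:
        return 1.0 if action == self.target else 0.0

def run_epsilon_greedy(env, episodes=2000, eps=0.1, seed=0):
    """Tabular epsilon-greedy learning from programmatic rewards."""
    rng = random.Random(seed)
    q = [[0.0] * env.n for _ in range(env.n)]      # Q[obs][action]
    counts = [[0] * env.n for _ in range(env.n)]
    total = 0.0
    for _ in range(episodes):
        obs = env.reset()
        if rng.random() < eps:
            a = rng.randrange(env.n)               # explore
        else:
            a = max(range(env.n), key=lambda i: q[obs][i])  # exploit
        r = env.step(a)
        counts[obs][a] += 1
        q[obs][a] += (r - q[obs][a]) / counts[obs][a]  # incremental mean
        total += r
    return total / episodes

avg = run_epsilon_greedy(ToyButtonEnv())
print(round(avg, 2))  # well above the 0.25 random baseline once learned
```

The same reset/step/reward loop shape is what a real MiniWob++ agent would use, with the observation replaced by the page's DOM or screenshot.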

Parcel3D

Synthetic dataset of over 13,000 images of damaged and intact parcels with full 2D and 3D annotations in the COCO format. For details see our paper and for visual samples our project page.

2 papers · 0 benchmarks · 3D, Images

MeDAL Retina Dataset

Our primary objective in creating this dataset is to support researchers in advancing algorithms for keypoint detection and in pretraining large models on retinal images using a self-supervised approach. The keypoints in the dataset have been carefully annotated by students from our lab, ensuring meticulous accuracy.

2 papers · 0 benchmarks · Images, Medical

SK-VG

SK-VG is a dataset for Scene Knowledge-guided Visual Grounding, in which the image content and referring expressions alone are not sufficient to ground the target objects, requiring models to reason over long-form scene knowledge. To support this task, SK-VG is the first dataset of its kind: for each image, we provide human-written knowledge describing its content.

2 papers · 0 benchmarks · Images
Page 102 of 164