Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

3,275 dataset results

Electronics Object Image Dataset | Computer Parts (datacluster.ai)

This dataset is a challenging collection of over 5,000 original images of electronic items, captured and crowdsourced from more than 1,000 urban and rural areas. Each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.

0 papers | 0 benchmarks | Images

Hindi Text Image Dataset | Hindi in the wild (datacluster.ai)

This dataset is a challenging collection of over 5,000 original Hindi text images, captured and crowdsourced from more than 700 urban and rural areas. Each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.

0 papers | 0 benchmarks | Images

Multi-Spectral Leaf Segmentation (Multi-Spectral Leaf Segmentation For Crop/Weed Identification)

This dataset was acquired with the Airphen (Hyphen, Avignon, France) six-band multi-spectral camera, configured with the 450/570/675/710/730/850 nm bands at 10 nm FWHM. The images were captured at the INRAe site in Montoldre (Allier, France, 46°20'30.3"N 3°26'03.6"E) within the framework of the "RoSE challenge" funded by the French National Research Agency (ANR). They contain bean crops together with various natural weeds (yarrow, amaranth, geranium, plantago, etc.) and sown ones (mustard, goosefoot, mayweed, and ryegrass), under very distinct illumination conditions (shadow, morning, evening, full sun, cloudy, rain, ...). The ground truth for each image consists of polygons around leaf boundaries, and each polygon is labeled as crop or weed. (2020-06-11)

0 papers | 0 benchmarks | Images

Simulacra Aesthetic Captions

Simulacra Aesthetic Captions is a dataset of over 238,000 synthetic images generated with AI models such as CompVis latent GLIDE and Stable Diffusion from over forty thousand user-submitted prompts. Users rated the images on their aesthetic value from 1 to 10, yielding caption, image, and rating triplets. In addition, each user agreed to release all of their work with the bot (prompts, outputs, and ratings) into the public domain under the CC0 1.0 Universal Public Domain Dedication. The result is a high-quality, royalty-free dataset with over 176,000 ratings suitable for a wide range of projects.

0 papers | 0 benchmarks | Images

AiTLAS: Benchmark Arena

AiTLAS: Benchmark Arena is an open-source benchmark framework for evaluating state-of-the-art deep learning approaches for image classification in Earth Observation (EO).

0 papers | 0 benchmarks | Images

KID-F (K-pop Idol Dataset - Female)

K-pop Idol Dataset - Female (KID-F) is the first dataset of high-quality K-pop idol face images. It consists of about 6,000 high-quality face images at 512x512 resolution, with identity labels for each image.

0 papers | 0 benchmarks | Images

Semeion (Semeion Handwritten Digit Data Set)

1,593 handwritten digits from around 80 people were scanned and stretched into a 16x16 rectangular box with 256 gray-scale values.

0 papers | 0 benchmarks | Images
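The 16x16 layout above makes Semeion easy to load by hand. A minimal sketch, assuming the common UCI distribution format (one digit per line: 256 whitespace-separated pixel values followed by a 10-element one-hot label vector for the classes 0-9) — check the file you downloaded before relying on this layout:

```python
import numpy as np

def load_semeion(path="semeion.data"):
    """Load Semeion digits, assuming the common UCI file layout:
    per line, 256 pixel values then a 10-element one-hot label."""
    raw = np.loadtxt(path, ndmin=2)            # ndmin=2 keeps shape for 1-line files
    pixels = raw[:, :256].reshape(-1, 16, 16)  # one 16x16 image per row
    labels = raw[:, 256:].argmax(axis=1)       # one-hot -> digit class
    return pixels, labels
```

The `ndmin=2` argument guards against `np.loadtxt` collapsing a single-row file into a 1-D array.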

Infinity Spills Basic Dataset

Infinity AI's Spills Basic Dataset is a synthetic, open-source dataset for safety applications. It features 150 videos of photorealistic liquid spills across 15 common settings. Spills take on in-context reflections, caustics, and depth based on the surrounding environment, lighting, and floor. Each video contains a spill of unique properties (size, color, profile, and more) and is accompanied by pixel-perfect labels and annotations. This dataset can be used to develop computer vision algorithms to detect the location and type of spill from the perspective of a fixed camera.

0 papers | 0 benchmarks | Images, RGB Video, Videos

Penguin dataset

The penguin dataset is a collection of images of penguin colonies in Antarctica from the larger Penguin Watch project, which was set up to monitor changes in their populations. The images are taken by fixed cameras at over 40 different locations, each capturing an image per hour for several years. Tracking the colony sizes requires counting the penguins in each image in the dataset. So far, the penguin counting has been done with the help of citizen scientists on the Penguin Watch site by Zooniverse, where interested users place dots on top of the penguins. Here we release part of this data to the vision community, so that models can learn from the crowd-sourced dot annotations to annotate these images automatically.

0 papers | 0 benchmarks | Images

TFW: Annotated Thermal Faces in the Wild Dataset

Face detection and subsequent localization of facial landmarks are the primary steps in many face applications. Numerous algorithms and benchmark datasets have been introduced to develop robust models for the visible domain. However, varying illumination conditions still pose challenging problems. Thermal cameras can address this problem because they operate at longer wavelengths. Even so, thermal face and facial landmark detection in the wild remains an open research problem, because most existing thermal datasets were collected in controlled environments, and many of them were not annotated with face bounding boxes and facial landmarks. In this work, we present a thermal face dataset with manually labeled bounding boxes and facial landmarks to address these problems. The dataset contains 9,982 images of 147 subjects collected under controlled and uncontrolled conditions. As a baseline, we trained the YOLOv5 object detection model and its adaptations.

0 papers | 0 benchmarks | Images

SF-TL54: A Thermal Facial Landmark Dataset with Visual Pairs

Facial landmark detection is a cornerstone of many facial analysis tasks such as face recognition, drowsiness detection, and facial expression recognition. Numerous methodologies have been introduced to achieve accurate and efficient facial landmark localization in visual images, but only a few works address facial landmark detection in thermal images; the main challenge is the limited number of annotated datasets. In this work, we present a thermal face dataset with annotated face bounding boxes and facial landmarks. The dataset contains 2,556 thermal images of 142 individuals, where each thermal image is paired with the corresponding visual image. To the best of our knowledge, our dataset is the largest in terms of the number of individuals. In addition, it can be employed for tasks such as thermal-to-visual image translation, thermal-visual face recognition, and others. We trained two models for the facial landmark detection task to show the efficacy of our dataset.

0 papers | 0 benchmarks | Images

RTI Rwanda Drone Crop Types (Drone Imagery Classification Training Dataset for Crop Types in Rwanda)

RTI International (RTI) generated 2,611 labeled point locations representing 19 different land cover types, clustered in 5 distinct agroecological zones within Rwanda. These land cover types were reduced to three crop types (Banana, Maize, and Legume), two additional non-crop land cover types (Forest and Structure), and a catch-all Other land cover type to provide training and evaluation data for a crop classification model. Each point is attributed with its latitude and longitude, the land cover type, and the degree of confidence the labeler had when classifying the point location. For each location there are also three corresponding image chips (4.5 m x 4.5 m in size) with the point id as part of the image name. Each image name contains a P1, P2, or P3 designation indicating the time period: P1 corresponds to December 2018, P2 to January 2019, and P3 to February 2019. These data were used in the development of research documented in greater detail in "Deep …".

0 papers | 0 benchmarks | Images
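The description fixes the P1/P2/P3-to-month mapping but does not spell out the full chip filename layout, so the pattern below is a hypothetical example (`<point_id>_<period>.tif`) to be adapted to the actual names in the download:

```python
import re

# Period codes taken from the dataset description.
PERIODS = {"P1": "2018-12", "P2": "2019-01", "P3": "2019-02"}

def parse_chip_name(filename):
    """Return (point_id, year_month) for a chip filename containing a
    P1/P2/P3 period designation. The '<point_id>_<period>' pattern is
    an assumption, not documented by the dataset."""
    m = re.search(r"(?P<pid>\w+?)_(?P<period>P[123])", filename)
    if m is None:
        raise ValueError(f"no period designation in {filename!r}")
    return m.group("pid"), PERIODS[m.group("period")]
```

For example, `parse_chip_name("0042_P2.tif")` would yield the point id `"0042"` and the month `"2019-01"`.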

CVGL

The CVGL Camera Calibration Dataset consists of 49 camera configurations, with Town 1 having 25 configurations and Town 2 having 24. The parameters modified to generate the configurations are fov, x, y, z, pitch, yaw, and roll, where fov is the field of view, (x, y, z) is the translation, and (pitch, yaw, roll) is the rotation between the cameras. The total number of image pairs is 79,320, of which 18,083 belong to Town 1 and 61,237 to Town 2; the difference in image counts is due to the lengths of the tracks.

0 papers | 0 benchmarks | Images

JigsawPlan

JigsawPlan contains room layouts and floorplans for 98,780 single-story houses/apartments from a production pipeline, designed for the Extreme Structure from Motion (E-SfM) problem.

0 papers | 0 benchmarks | Images

MapAI Dataset

MapAI: Precision in Building Segmentation. The dataset comprises 7,500 training images and 1,500 validation images from Denmark. The test dataset is split into two tasks: the first task (1,368 images) is to segment buildings using only aerial images, while the second task (978 images) also allows the use of LiDAR data. All data samples have a resolution of 500x500. The aerial images are RGB, while the LiDAR data are rasterized. The ground truth masks have two classes: building and background.

0 papers | 0 benchmarks | Images, LiDAR

GNMC (Gracenote Multi-Crop Dataset)

We present the Gracenote Multi-Crop (GNMC) dataset, to further research in algorithms for aesthetic image cropping. The dataset consists of a diverse collection of 10K images, each cropped in five different aspect ratios by experienced editors. GNMC is larger than existing datasets commonly used to benchmark image cropping approaches such as FCDB (1743 images) and FLMS (500 images). This dataset can enable aesthetic cropping algorithms as described in "An Experience-Based Direct Generation Approach to Automatic Image Cropping" by Christensen and Vartakavi.

0 papers | 0 benchmarks | Images

Compositional Visual Reasoning (CVR)

A fundamental component of human vision is our ability to parse complex visual scenes and judge the relations between their constituent objects. AI benchmarks for visual reasoning have driven rapid progress in recent years, with state-of-the-art systems now reaching human accuracy on some of these benchmarks. Yet there remains a major gap between humans and AI systems in the sample efficiency with which they learn new visual reasoning tasks. Humans' remarkable efficiency at learning has been at least partially attributed to their ability to harness compositionality, allowing them to efficiently take advantage of previously gained knowledge when learning new tasks. Here, we introduce a novel visual reasoning benchmark, Compositional Visual Relations (CVR), to drive progress towards the development of more data-efficient learning algorithms. We take inspiration from fluid intelligence and non-verbal reasoning tests and describe a novel method for creating compositions of abstract relations.

0 papers | 0 benchmarks | Graphs, Images

KAIST multi-spectral Day/Night 2018

We introduce the KAIST multi-spectral dataset, which covers a wide range of drivable regions, from urban to residential, for autonomous systems. Our dataset provides different perspectives of the world captured in coarse time slots (day and night) in addition to fine time slots (sunrise, morning, afternoon, sunset, night, and dawn). For all-day perception of autonomous systems, we propose the use of a different spectral sensor, namely a thermal imaging camera. Toward this goal, we developed a multi-sensor platform that supports a co-aligned RGB/thermal camera, RGB stereo, 3D LiDAR, and inertial sensors (GPS/IMU), along with a related calibration technique. We designed a wide range of visual perception tasks, including object detection, drivable region detection, localization, image enhancement, depth estimation, and colorization using single- and multi-spectral approaches. In this paper, we provide a description of our benchmark, including the recording platform, data format, and development toolkit.

0 papers | 0 benchmarks | Images, LiDAR, Stereo

NTLNP (wildlife image dataset)

This is an image dataset for object detection of wildlife in a mixed coniferous and broad-leaved forest.

0 papers | 0 benchmarks | Images

SESYD Dataset (Systems Evaluation SYnthetic Documents)

SESYD ("Systems Evaluation SYnthetic Documents") is a database of synthetic documents with ground truth. It targets two main research problems in the document image analysis field: (i) symbol recognition and spotting in line-drawing images (floorplans and electrical diagrams), and (ii) character segmentation and recognition in geographical maps. The database is composed of eleven collections for performance evaluation, containing 284k images, 190k symbols, and 284k characters. Published in 2010, SESYD is today a key database in the document image analysis field, cited by around one hundred research papers.

0 papers | 0 benchmarks | Images
Page 159 of 164