3,275 machine learning datasets
Schwerin contains handwritten texts in medieval German. The training set consists of 793 lines, the validation set of 68 lines, and the test set of 196 lines.
SketchHairSalon is a dataset for hair generation containing thousands of annotated hair sketch-image pairs and corresponding hair mattes.
FOD in Airports (FOD-A) is an image dataset of Foreign Object Debris (FOD) that consists of 31 object categories and over 30,000 annotation instances. The object categories were selected based on guidance from prior documentation and related research by the Federal Aviation Administration (FAA).
The Kazakh Offline Handwritten Text Dataset (KOHTD) contains 3,000 handwritten exam papers, more than 140,335 segmented images, and approximately 922,010 symbols. It can serve researchers working on handwriting recognition tasks with deep learning and machine learning methods.
DEIC is a benchmark for measuring the data efficiency of models in the context of image classification. It is composed of 6 datasets that contain a small number of training samples per class (i.e., 30 < x < 80). It covers multiple image domains (i.e., natural images, fine-grained recognition, medical images, remote sensing, handwriting recognition) and data types (i.e., RGB, grayscale, multi-spectral).
The 2021 SIGIR workshop on eCommerce is hosting the Coveo Data Challenge for "In-session prediction for purchase intent and recommendations". The challenge addresses the growing need for reliable predictions within the boundaries of a shopping session, as customer intentions can differ depending on the occasion. The need for efficient personalization procedures is even clearer if we consider the e-commerce landscape more broadly: outside of giant digital retailers, the constraints of the problem are stricter, due to smaller user bases and the realization that most users are not frequently returning customers. We release a new session-based dataset including more than 30M fine-grained browsing events (product detail, add, purchase), enriched by linguistic behavior (queries made by shoppers, with items clicked and items not clicked after the query) and catalog metadata (images, text, pricing information). On this dataset, we ask participants to showcase innovative solutions for these prediction and recommendation tasks.
We labeled 7,740 webpage screenshots spanning 408 domains (Amazon, Walmart, Target, etc.). Each of these webpages contains exactly one labeled price, title, and image. All other web elements are labeled as background. On average, there are 90 web elements in a webpage.
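The annotation scheme above (exactly one price, title, and image per page, everything else background) can be expressed as a small validity check. This is a hypothetical sketch, not the dataset's released tooling; the label strings and helper name are assumptions.

```python
from collections import Counter

def is_valid_page(labels):
    """Return True if a page's element labels match the scheme:
    exactly one 'price', one 'title', and one 'image' (rest background)."""
    counts = Counter(labels)
    return all(counts[k] == 1 for k in ("price", "title", "image"))

# A typical page with ~90 elements: 3 labeled targets, the rest background.
page = ["background"] * 87 + ["price", "title", "image"]
print(is_valid_page(page))  # → True
```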
FEAFA+ is a dataset for facial expression analysis and 3D facial animation. It includes 150 video sequences from FEAFA and DISFA, with a total of 230,184 frames manually annotated with floating-point intensity values of 24 redefined AUs using the Expression Quantitative Tool.
Subset of AdobeVFR. The dataset contains images depicting English text and consists of 1000 synthetic images for training and 100 for testing, for each of 2383 font classes. The training and test sets are called VFR_syn_train and VFR_syn_val, respectively.
Subset of AdobeVFR. The dataset contains "real-world text images".
Most publications that aim to optimize neural networks for CBIR train and test their models on domain-specific datasets. It is therefore unclear whether those networks can be used as general-purpose image feature extractors. After analyzing popular image retrieval test sets, we decided to manually curate GPR1200, an easy-to-use and accessible but challenging benchmark dataset with 1,200 categories and 10 examples per class. Classes and images were manually selected from six publicly available datasets of different image areas, ensuring high class diversity and clean class boundaries.
By releasing this dataset, we aim to provide a new testbed for computer vision techniques using deep learning. Its main peculiarity is the shift from the domain of "natural images", typical of common benchmark datasets, to biological imaging. We anticipate that the advantages of doing so could be two-fold: i) fostering research in biomedical-related fields, for which popular pre-trained models typically perform poorly, and ii) promoting methodological research in deep learning by addressing the peculiar requirements of these images. Possible applications include, but are not limited to, semantic segmentation, object detection, and object counting. The data consist of 283 high-resolution pictures (1600x1200 pixels) of mouse brain slices acquired through a fluorescence microscope. The final goal is to identify and count the neurons highlighted in the pictures by means of a marker, so as to assess the result of a biological experiment. The corresponding ground-truth labels were generated through a hybrid approach.
The Aircraft Context Dataset, a composition of two inter-compatible large-scale and versatile image datasets focusing on manned aircraft and UAVs, is intended for training and evaluating classification, detection and segmentation models in aerial domains. Additionally, a set of relevant meta-parameters can be used to quantify dataset variability as well as the impact of environmental conditions on model performance.
QST contains 1,167 video clips cut from 216 time-lapse 4K videos collected from YouTube. It can be used for a variety of tasks, such as (high-resolution) video generation, (high-resolution) video prediction, (high-resolution) image generation, texture generation, image inpainting, image/video super-resolution, image/video colorization, image/video animation, etc. Each clip contains between 58 and 1,200 frames (285,446 frames in total), and the resolution of each frame exceeds 1,024 x 1,024. Specifically, QST consists of a training set (1,000 clips, 244,930 frames), a validation set (100 clips, 23,200 frames), and a testing set (67 clips, 17,316 frames). The QST dataset is available for download (key: qst1).
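As a quick sanity check, the split sizes stated above add up to the dataset totals. A minimal sketch (the dictionary layout is an assumption, not the dataset's metadata format):

```python
# QST split sizes as stated in the description (clips, frames).
splits = {
    "train": {"clips": 1000, "frames": 244_930},
    "val":   {"clips": 100,  "frames": 23_200},
    "test":  {"clips": 67,   "frames": 17_316},
}

total_clips = sum(s["clips"] for s in splits.values())
total_frames = sum(s["frames"] for s in splits.values())
print(total_clips, total_frames)  # → 1167 285446
```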
HSPACE (Human-SPACE) is a large-scale photo-realistic dataset of animated humans placed in complex synthetic indoor and outdoor environments. For all frames the dataset provides 3d pose and shape ground truth, as well as other rich image annotations including human segmentation, body part localisation semantics, and temporal correspondences.
LOOK is a large-scale dataset for eye contact detection in the wild, focusing on diverse and unconstrained scenarios for real-world generalization. In particular, it targets real-world autonomous-vehicle settings with no control over the environment or the distance of pedestrians.
This is a high-quality large-scale Night Object Detection (NOD) dataset of outdoor images targeting low-light object detection. The dataset contains more than 7K images and 46K annotated objects (with bounding boxes) belonging to three classes: person, bicycle, and car. The photos were taken on streets in the evening, so all images exhibit low-light conditions of varying severity.
FFHQ-Text is a small-scale face image dataset with large-scale facial attributes, designed for text-to-face generation and manipulation, text-guided facial image manipulation, and other vision-related tasks. This dataset is an extension of the NVIDIA Flickr-Faces-HQ dataset (FFHQ), comprising 760 selected female FFHQ images that each contain exactly one complete human face.
A large-scale video portrait dataset containing 291 videos (14K frames) from 23 conference scenes. It covers various teleconferencing settings, diverse participant actions, interference from passers-by, and illumination changes.
The dataset has been generated using Town 1 and Town 2 of the CARLA Simulator. It consists of 50 camera configurations, 25 per town. The parameters modified for generating the configurations are fov, x, y, z, pitch, yaw, and roll, where fov is the field of view, (x, y, z) is the translation, and (pitch, yaw, roll) is the rotation between the cameras. The total number of image pairs is 123,017, of which 58,596 belong to Town 1 and 64,421 to Town 2; the difference in counts is due to the lengths of the tracks.
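One such camera configuration can be sketched as a small container holding the seven varied parameters. This is a hypothetical structure, not the dataset's released format; in CARLA itself the pose would be a carla.Transform (Location plus Rotation) and fov a camera blueprint attribute, but keeping them together is convenient for bookkeeping.

```python
from dataclasses import dataclass

@dataclass
class CameraConfig:
    """One of the 50 camera configurations: field of view plus a 6-DoF pose
    (translation and rotation between the two cameras). Values are illustrative."""
    fov: float                               # horizontal field of view, degrees
    x: float; y: float; z: float             # translation between cameras
    pitch: float; yaw: float; roll: float    # rotation between cameras, degrees

cfg = CameraConfig(fov=90.0, x=0.5, y=0.0, z=1.6, pitch=0.0, yaw=15.0, roll=0.0)
print(cfg.fov, cfg.yaw)  # → 90.0 15.0
```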