Datasets

3,275 machine learning datasets

M5Product

The M5Product dataset is a large-scale multi-modal pre-training dataset with coarse and fine-grained annotations for E-products.

4 papers | 0 benchmarks | Audio, Images, Tables, Texts, Videos

Bentham (Bentham project)

Bentham manuscripts refers to a large set of documents written by the renowned English philosopher and reformer Jeremy Bentham (1748-1832). Volunteers of the Transcribe Bentham initiative transcribed this collection. Currently, more than 6,000 documents (over 25,000 pages) have been transcribed using this public web platform. For our experiments, we used the BenthamR0 dataset, a part of the Bentham manuscripts.

4 papers | 2 benchmarks | Images, Texts

XTD10

XTD10 is a dataset for cross-lingual image retrieval and tagging consisting of the MSCOCO2014 caption test dataset annotated in 7 languages that were collected using a crowdsourcing platform.

4 papers | 0 benchmarks | Images, Texts

iShape

iShape is an irregular shape dataset for instance segmentation. It contains six sub-datasets, one real and five synthetic, each representing a scene with a typical irregular shape.

4 papers | 1 benchmark | Images

MovingFashion

MovingFashion is a dataset for video-to-shop, the task of retrieving clothes which are worn in social media videos. MovingFashion is composed of 14,855 social videos, each one of them associated with e-commerce "shop" images where the corresponding clothing items are clearly portrayed.

4 papers | 1 benchmark | Images, Videos

eBDtheque

The eBDtheque database is a selection of one hundred comic pages from America, Japan (manga) and Europe.

4 papers | 8 benchmarks | Images

DCM

The DCM dataset is composed of 772 annotated images from 27 golden age comic books. We collected them from the Digital Comics Museum, a free public-domain collection of digitized comic books. One album per available publisher was selected to get as many different styles as possible. We made ground-truth bounding boxes for all panels and all characters (bodies and faces), whether small or big, human-like or animal-like.

4 papers | 18 benchmarks | Images

SYSU-MM01-C

SYSU-MM01-C is an evaluation set that consists of algorithmically generated corruptions applied to the SYSU-MM01 test-set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.

4 papers | 12 benchmarks | Images
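
The corruption grid in the SYSU-MM01-C description can be checked with a short enumeration. The sketch below only lists the corruption types and severity levels named in the description and counts their combinations; it does not apply any corruption to images. The names `CORRUPTIONS`, `SEVERITIES` and `enumerate_corruptions` are illustrative, and numbering the severity levels 1 to 5 is an assumption.

```python
# Corruption families and types, copied from the dataset description.
CORRUPTIONS = {
    "Noise": ["Gaussian", "shot", "impulse", "speckle"],
    "Blur": ["defocus", "frosted glass", "motion", "zoom", "Gaussian"],
    "Weather": ["snow", "frost", "fog", "brightness", "spatter", "rain"],
    "Digital": ["contrast", "elastic", "pixel", "JPEG compression", "saturate"],
}

# Five severity levels per corruption; the 1..5 numbering is an assumption.
SEVERITIES = range(1, 6)


def enumerate_corruptions():
    """Return every (family, corruption type, severity) combination."""
    return [
        (family, name, severity)
        for family, names in CORRUPTIONS.items()
        for name in names
        for severity in SEVERITIES
    ]


variants = enumerate_corruptions()
num_types = sum(len(names) for names in CORRUPTIONS.values())
print(num_types, len(variants))
```

The four families hold 4 + 5 + 6 + 5 = 20 corruption types, and five severity levels each give the 100 distinct corruptions stated in the description.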

RegDB-C

RegDB-C is an evaluation set that consists of algorithmically generated corruptions applied to the RegDB test-set (color images). These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.

4 papers | 0 benchmarks | Images

AMA (Articulated Mesh Animation)

Articulated Mesh Animation (AMA) is a real-world dataset containing 10 mesh sequences depicting 3 different humans performing various actions.

4 papers | 0 benchmarks | 3D meshes, Images

PeopleSansPeople (PeopleSansPeople: A Synthetic Data Generator for Human-Centric Computer Vision)

In recent years, person detection and human pose estimation have made great strides, helped by large-scale labeled datasets. However, these datasets had no guarantees or analysis of human activities, poses, or context diversity. Additionally, privacy, legal, safety, and ethical concerns may limit the ability to collect more human data. An emerging alternative to real-world data that alleviates some of these issues is synthetic data. However, creation of synthetic data generators is incredibly challenging and prevents researchers from exploring their usefulness. Therefore, we release a human-centric synthetic data generator PeopleSansPeople which contains simulation-ready 3D human assets, a parameterized lighting and camera system, and generates 2D and 3D bounding box, instance and semantic segmentation, and COCO pose labels. Using PeopleSansPeople, we performed benchmark synthetic data training using a Detectron2 Keypoint R-CNN variant [1]. We found that pre-training a network using sy

4 papers | 0 benchmarks | Images

FACTIFY (a dataset on multi-modal fact verification)

FACTIFY is a dataset on multi-modal fact verification. It contains images, textual claims, and reference textual documents and images. The task is to classify the claims into support, not-enough-evidence and refute categories with the help of the supporting data. We aim to combat fake news in the social media era by providing this multi-modal dataset. FACTIFY contains 50,000 claims accompanied by 100,000 images, split into training, validation and test sets.

4 papers | 0 benchmarks | Images, Texts

N-Omniglot

N-Omniglot is a neuromorphic dataset for few-shot learning. It contains 1,623 categories of handwritten characters, with only 20 samples per class.

4 papers | 0 benchmarks | Images

Incidents1M

Incidents1M is a large-scale multi-label dataset for incident detection which contains 977,088 images, with 43 incident and 49 place categories. It is an evolution of the Incidents dataset that doubles the dataset size and includes more incident labels.

4 papers | 0 benchmarks | Images

CUB-GHA (CUB Gaze-based Human Attention)

CUB-GHA (Gaze-based Human Attention) is a dataset for fine-grained classification with human attention annotations. It was built by collecting human gaze data for the fine-grained classification dataset CUB.

4 papers | 0 benchmarks | Images

DABS (Domain-Agnostic Benchmark for Self-supervised learning)

DABS is a domain-agnostic benchmark for self-supervised learning, intended to encourage research and progress towards domain-agnostic methods.

4 papers | 6 benchmarks | Images

VizWiz-VQA-Grounding

The VizWiz-VQA-Grounding dataset visually grounds answers to visual questions asked by people with visual impairments.

4 papers | 0 benchmarks | Images, Texts

TimberSeg 1.0

The TimberSeg 1.0 dataset is composed of 220 images showing wood logs in various environments and conditions in Canada. The images are densely annotated with segmentation masks for each log instance, as well as the corresponding bounding box and class label. This dataset aims to enable autonomous forestry forwarders; it therefore contains nearly 2500 instances of wood logs from an operator's point of view. Images were taken in the forest, near the roadside, in lumberyards and above timber-filled trailers. The logs were annotated from a grasping perspective, meaning that only the accessible logs on top of the piles are segmented.

4 papers | 0 benchmarks | Images

Relative Human

Relative Human (RH) contains multi-person in-the-wild RGB images with rich human annotations, including:

4 papers | 22 benchmarks | Images

VISUELLE2.0

Visuelle 2.0 is a dataset containing real data for 5355 clothing products of the Italian fast-fashion retail company Nuna Lie. Specifically, Visuelle 2.0 provides data from 6 fashion seasons (partitioned into Autumn-Winter and Spring-Summer) from 2017-2019, right before the Covid-19 pandemic. Each product is accompanied by an HD image, textual tags and more. The time series data are disaggregated at the shop level, and include the sales, inventory stock, max-normalized prices (for the sake of confidentiality) and discounts. Exogenous time series data is also provided, in the form of Google Trends based on the textual tags and multivariate weather conditions of the stores’ locations. Finally, we also provide purchase data for 667K customers whose identity has been anonymized, to capture personal preferences. With these data, Visuelle 2.0 makes it possible to address several problems which characterize the activity of a fast fashion company: new product demand forecasting, short-observation new pr

4 papers | 4 benchmarks | Images, Texts, Time series
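
The "max-normalized prices" in the Visuelle 2.0 description can be illustrated with a small sketch. It assumes the usual meaning of max normalization, dividing every value by the largest one so that the result lies in [0, 1]; the description does not say whether the maximum is taken per product, per shop or over the whole dataset, so the scope used here (one list of prices) is an assumption, and the function name and sample values are hypothetical.

```python
def max_normalize(prices):
    """Scale prices by their maximum so the largest becomes 1.0.

    Assumes non-negative prices. The normalization scope (a single
    list) is an assumption, not something the dataset description states.
    """
    if not prices:
        return []
    peak = max(prices)
    if peak == 0:
        # Avoid division by zero when every price is zero.
        return [0.0 for _ in prices]
    return [p / peak for p in prices]


# Hypothetical prices, not taken from the dataset.
raw_prices = [10.0, 20.0, 40.0]
normalized = max_normalize(raw_prices)
print(normalized)
```

Normalizing this way keeps the ratios between prices while hiding their absolute values, which matches the confidentiality motive given in the description.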
