3,275 machine learning datasets
3,275 dataset results
The M5Product dataset is a large-scale multi-modal pre-training dataset with coarse and fine-grained annotations for E-products.
Bentham manuscripts refers to a large set of documents that were written by the renowned English philosopher and reformer Jeremy Bentham (1748-1832). Volunteers of the Transcribe Bentham initiative transcribed this collection. Currently, >6 000 documents or > 25 000 pages have been transcribed using this public web platform. For our experiments, we used the BenthamR0 dataset a part of the Bentham manuscripts.
XTD10 is a dataset for cross-lingual image retrieval and tagging consisting of the MSCOCO2014 caption test dataset annotated in 7 languages that were collected using a crowdsourcing platform.
iShape is an irregular shape dataset for instance segmentation. iShape contains six sub-datasets with one real and five synthetics, each represents a scene of a typical irregular shape.
MovingFashion is a dataset for video-to-shop, the task of retrieving clothes which are worn in social media videos. MovingFashion is composed of 14,855 social videos, each one of them associated with e-commerce "shop" images where the corresponding clothing items are clearly portrayed.
The eBDtheque database is a selection of one hundred comic pages from America, Japan (manga) and Europe.
The DCM dataset is composed of 772 annotated images from 27 golden age comic books. We freely collected them from the free public domain collection of digitized comic books Digital Comics Museum. One album per available publisher was selected to get as many different styles as possible. We made ground-truth bounding boxes of all panels, all characters (body + faces), small or big, human-like or animal-like.
SYSU-MM01-C is an evaluation set that consists of algorithmically generated corruptions applied to the SYSU-MM01 test-set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
RegDB-C is an evaluation set that consists of algorithmically generated corruptions applied to the RegDB test-set (color images). These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
Articulated Mesh Animation (AMA) is a real-world dataset containing 10 mesh sequences depicting 3 different humans performing various actions
In recent years, person detection and human pose estimation have made great strides, helped by large-scale labeled datasets. However, these datasets had no guarantees or analysis of human activities, poses, or context diversity. Additionally, privacy, legal, safety, and ethical concerns may limit the ability to collect more human data. An emerging alternative to real-world data that alleviates some of these issues is synthetic data. However, creation of synthetic data generators is incredibly challenging and prevents researchers from exploring their usefulness. Therefore, we release a human-centric synthetic data generator PeopleSansPeople which contains simulation-ready 3D human assets, a parameterized lighting and camera system, and generates 2D and 3D bounding box, instance and semantic segmentation, and COCO pose labels. Using PeopleSansPeople, we performed benchmark synthetic data training using a Detectron2 Keypoint R-CNN variant [1]. We found that pre-training a network using sy
FACTIFY is a dataset on multi-modal fact verification. It contains images, textual claim, reference textual documenta and image. The task is to classify the claims into support, not-enough-evidence and refute categories with the help of the supporting data. We aim to combat fake news in the social media era by providing this multi-modal dataset. Factify contains 50,000 claims accompanied with 100,000 images, split into training, validation and test sets.
N-Omniglot is a neuromorphic dataset for few-shot learning. It contains 1,623 categories of handwritten characters, with only 20 samples per class.
Incidents1M is a large-scale multi-label dataset for incident detection which contains 977,088 images, with 43 incident and 49 place categories. It is an evolution of the Incidents dataset that doubles the dataset size and includes more incident labels.
CUB-GHA is a dataset for fine-grained classification with human attention annotations. The dataset collects human gaze data for the fine-grained classification dataset CUB and builds a dataset named CUB-GHA (Gaze-based Human Attention).
DABS is a domain-agnostic benchmark for self-supervised learning to encourage research and progress towards domain-agnostic methods.
The VizWiz-VQA-Grounding dataset is a dataset that visually grounds answers to visual questions asked by people with visual impairments.
The TimberSeg 1.0 dataset is composed of 220 images showing wood logs in various environments and conditions in Canada. The images are densely annotated with segmentation masks for each log instance, as well as the corresponding bounding box and class label. This dataset aim towards enabling autonomous forestry forwarders, therefore it contains nearly 2500 instances of wood logs from an operators' point-of-view. Images were taken in the forest, near the roadside, in lumberyards and above timber-filled trailers. The logs were annotated considering a grasping perspective, meaning that only the logs above the piles and accessible are segmented.
Relative Human (RH) contains multi-person in-the-wild RGB images with rich human annotations, including:
Visuelle 2.0 is a dataset containing real data for 5355 clothing products of the retail fast-fashion Italian company, Nuna Lie. Specifically, Visuelle 2.0 provides data from 6 fashion seasons (partitioned in Autumn-Winter and Spring-Summer) from 2017-2019, right before the Covid-19 pandemic. Each product is accompanied by an HD image, textual tags and more. The time series data are disaggregated at the shop level, and include the sales, inventory stock, max-normalized prices (for the sake of confidentiality} and discounts. Exogenous time series data is also provided, in the form of Google Trends based on the textual tags and multivariate weather conditions of the stores’ locations. Finally, we also provide purchase data for 667K customers whose identity has been anonymized, to capture personal preferences. With these data, Visuelle 2.0 allows to cope with several problems which characterize the activity of a fast fashion company: new product demand forecasting, short-observation new pr