DELAUNAY is a dataset of abstract paintings and non-figurative art objects labelled by the artists' names. This dataset provides a middle ground between natural images and artificial patterns and can thus be used in a variety of contexts, for example to investigate the sample efficiency of humans and artificial neural networks.
This dataset consists of 2,120 sequences of binary masks of pedestrians, with sequence lengths varying between 2 and 710 frames. For details, we refer to our paper. It is based on the original KITTI Segmentation challenge, which can be found at https://www.vision.rwth-aachen.de/page/mots
The PETRAW dataset is composed of 150 sequences of peg-transfer training sessions. The objective of a peg-transfer session is to transfer 6 blocks from the left side of the board to the right and back. Each block must be extracted from a peg with one hand, transferred to the other hand, and inserted into a peg on the other side of the board. All cases were acquired by a non-medical expert at the LTSI Laboratory of the University of Rennes. The dataset is divided into a training set of 90 cases and a test set of 60 cases. Each case comprises kinematic data, a video, a semantic segmentation of each frame, and a workflow annotation.
Contains square blocks of 48×48 pixels with 13 Sentinel-2 bands each. Each 480-m block was mined from a large geographical area of interest (102 km × 42 km) located north of Munich, Germany.
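A minimal sketch of how one such block could be handled, assuming each sample is stored as a band-first 13×48×48 array; the file name, array layout, and band ordering below are assumptions, not the dataset's documented format:

```python
import numpy as np

# Hypothetical per-block file; the actual storage format of the dataset may differ.
block = np.load("block_0001.npy")
assert block.shape == (13, 48, 48)  # assumed band-first layout: 13 Sentinel-2 bands, 48x48 pixels

# Example use: compute NDVI from the red (B4) and near-infrared (B8) bands,
# assuming bands are stored in the natural Sentinel-2 order B1..B8, B8A, B9..B12.
red = block[3].astype(float)
nir = block[7].astype(float)
ndvi = (nir - red) / (nir + red + 1e-6)
print(ndvi.mean())
```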
The MCVQA dataset consists of 248,349 training questions and 121,512 validation questions for real images, in Hindi and in code-mixed form. For each Hindi question, we also provide its 10 corresponding answers in Hindi.
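To make the structure concrete, here is a hypothetical record layout for one training question with its 10 answers; the field names are illustrative assumptions, not the dataset's official schema:

```python
import json

# One illustrative MCVQA-style record: a Hindi question, its code-mixed version,
# and the 10 reference answers in Hindi (placeholders shown as "...").
record = {
    "image_id": 1,               # hypothetical identifier of the real image
    "question_hi": "...",        # question text in Hindi
    "question_cm": "...",        # code-mixed version of the same question
    "answers_hi": ["..."] * 10,  # the 10 corresponding answers in Hindi
}

with open("mcvqa_sample.json", "w", encoding="utf-8") as f:
    json.dump(record, f, ensure_ascii=False, indent=2)
```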
We have cleaned the noisy IMDB-WIKI dataset using a constrained clustering method, resulting in this new benchmark for in-the-wild age estimation. The annotations also allow this dataset to be used for other tasks, such as gender classification and face recognition/verification. For more details, please refer to our FPAge paper.
The largest, first-of-its-kind, in-the-wild, fine-grained workout/exercise posture analysis dataset, covering three different exercises: BackSquat, Barbell Row, and Overhead Press. Seven different types of exercise errors are annotated. Unlabeled data is also provided to facilitate self-supervised learning.
The NVIDIA HOPE datasets consist of RGBD images and video sequences with labeled 6-DoF poses for 28 toy grocery objects. The toy grocery objects are readily available for purchase and have ideal size and weight for robotic manipulation. 3D textured meshes for generating synthetic training data are provided.
PACS (Physical Audiovisual CommonSense) is the first audiovisual benchmark annotated for physical commonsense attributes. PACS contains a total of 13,400 question-answer pairs, involving 1,377 unique physical commonsense questions and 1,526 videos. The dataset provides new opportunities to advance the research field of physical reasoning by bringing audio as a core component of this multimodal problem.
NKL (short for NanKai Lines) is a dataset for semantic line detection. Semantic lines are meaningful line structures that outline the conceptual structure of natural images. The NKL dataset contains 5,000 images of various scenes. Each of these images is annotated by multiple skilled human annotators. The dataset is split into training and validation subsets. There are 4,000 images in the training set and 1,000 in the validation set.
For a detailed description, we refer to Section 3 in our research article.
GFP-GOWT1 mouse stem cells
This is the first image-based network intrusion detection dataset. This large-scale dataset includes network-traffic protocol-communication images from 15 different observation locations in different countries across Asia. The dataset is used to identify two different types of anomalies within benign network traffic. Each 48 × 48 image contains multi-protocol communications within a 128-second window. The SIDD dataset can be applied to a broad range of tasks such as machine-learning-based network intrusion detection, non-IID federated learning, and so forth.
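A minimal sketch of a classifier over such 48 × 48 traffic images. The single input channel and the three output classes (benign plus two anomaly types) are assumptions inferred from the description above, not the official baseline:

```python
import torch
import torch.nn as nn

# Small convolutional classifier for 48x48 single-channel traffic images (assumed format).
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 48 -> 24
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 24 -> 12
    nn.Flatten(),
    nn.Linear(32 * 12 * 12, 3),  # assumed classes: benign + two anomaly types
)

dummy = torch.randn(8, 1, 48, 48)  # a batch of eight 48x48 images
print(model(dummy).shape)          # torch.Size([8, 3])
```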
Detecting out-of-context media, such as "mis-captioned" images on Twitter, is a relevant problem, especially in domains of high public significance. Twitter-COMMs is a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles. This dataset can be used to develop methods to detect misinformation on social media platforms related to these three topics.
Hephaestus is the first large-scale InSAR dataset. Driven by volcanic unrest detection, it provides 19,919 unique satellite frames annotated with a diverse set of labels. Moreover, each sample is accompanied by a textual description of its contents. The goal of this dataset is to boost research on the exploitation of interferometric data, enabling the application of state-of-the-art computer vision and NLP methods. Furthermore, the annotated dataset is bundled with a large archive of unlabeled frames to enable large-scale self-supervised learning. The final size of the dataset amounts to 110,573 interferograms.
This dataset contains images of individual handwritten Bengali characters. Bengali characters (graphemes) are written by combining three components: a grapheme_root, a vowel_diacritic, and a consonant_diacritic. Your challenge is to classify the components of the grapheme in each image. There are roughly 10,000 possible graphemes, of which roughly 1,000 are represented in the training set. The test set includes some graphemes that do not exist in the training set but contains no new grapheme components. It takes many volunteers filling out handwriting sheets to generate a useful amount of real data; focusing the problem on the grapheme components rather than on recognizing whole graphemes should make it possible to assemble a Bengali OCR system without handwriting samples for all 10,000 graphemes.
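A sketch of a model matching the task structure described above: one shared backbone with three separate heads, one per grapheme component. The class counts and input size are illustrative assumptions, not values taken from the data:

```python
import torch
import torch.nn as nn

class GraphemeNet(nn.Module):
    """Three-head classifier: grapheme_root, vowel_diacritic, consonant_diacritic."""

    def __init__(self, n_roots=168, n_vowels=11, n_consonants=7):  # assumed class counts
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
        )
        feat = 32 * 8 * 8
        self.root_head = nn.Linear(feat, n_roots)
        self.vowel_head = nn.Linear(feat, n_vowels)
        self.consonant_head = nn.Linear(feat, n_consonants)

    def forward(self, x):
        h = self.backbone(x)
        return self.root_head(h), self.vowel_head(h), self.consonant_head(h)

model = GraphemeNet()
logits = model(torch.randn(4, 1, 137, 236))  # grayscale handwriting crops (size is illustrative)
print([t.shape for t in logits])
```

Training would apply one cross-entropy loss per head and sum them, so a single image contributes supervision for all three components at once.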
Modeling what makes an advertisement persuasive, i.e., eliciting the desired response from consumers, is critical to the study of propaganda, social psychology, and marketing. Despite its importance, computational modeling of persuasion in computer vision is still in its infancy, primarily due to the lack of benchmark datasets that can provide persuasion-strategy labels associated with ads. Motivated by the persuasion literature in social psychology and marketing, we introduce an extensive vocabulary of persuasion strategies and build the first ad-image corpus annotated with persuasion strategies. The dataset also provides image segmentation masks that label persuasion strategies in the corresponding ad images on the test split.
We present a new large-scale photorealistic panoramic dataset named FutureHouse, which has the following characteristics. 1) It contains over 70,000 high-quality models with high-resolution meshes and physical materials. All models are measured to real-world standards. 2) The selected scene layouts are carefully designed by over 100 skilled artists, and all of them are used in real-world displays. 3) It contains 28,579 high-quality panoramic views from 1,752 house-scale scenes. Therefore, it can be used for perspective image tasks as well as omnidirectional image tasks. 4) Richer physical material representation. Most materials are represented by a microfacet BRDF with metalness, and the rest are represented by special shading models, e.g., cloth and transmission materials. 5) High rendering quality. Benefiting from a commercial rendering engine, Unreal Engine 4, and deep learning super sampling (DLSS), our renderings contain less noise. Our SVBRDF rep
An autonomous driving dataset and benchmark for optical flow. This dataset was created by the Heidelberg Collaboratory for Image Processing in close cooperation with Robert Bosch GmbH.
Synthehicle is a massive CARLA-based synthetic multi-vehicle multi-camera tracking dataset. It includes ground truth for 2D detection and tracking, 3D detection and tracking, depth estimation, and semantic, instance, and panoptic segmentation.