The NINCO (No ImageNet Class Objects) dataset is introduced in the ICML 2023 paper In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation. The images in this dataset are free from objects that belong to any of the 1000 classes of ImageNet-1K (ILSVRC2012), which makes NINCO suitable for evaluating out-of-distribution detection on ImageNet-1K.
Kitsune Network Attack Dataset This is a collection of nine network attack datasets captured from either an IP-based commercial surveillance system or a network full of IoT devices. Each dataset contains millions of network packets and a different cyber attack within it.
CHOCOLATE is a benchmark for detecting and correcting factual inconsistency in generated chart captions. It consists of captions produced by six advanced models, which are categorized into three subsets:
For each dataset we provide a short description as well as several characterization metrics: the number of instances (m), number of attributes (d), number of labels (q), cardinality (Card), density (Dens), diversity (Div), average imbalance ratio per label (avgIR), ratio of unconditionally dependent label pairs by chi-square test (rDep), and complexity, defined as m × q × d as in [Read 2010]. Cardinality measures the average number of labels associated with each instance, and density is cardinality divided by the number of labels. Diversity is the percentage of labelsets present in the dataset divided by the number of possible labelsets. avgIR measures the average degree of imbalance across all labels; the greater the avgIR, the more imbalanced the dataset. Finally, rDep measures the proportion of label pairs that are dependent at 99% confidence. A broader description of all characterization metrics and the partition methods used is given in
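As a rough illustration (not code from the referenced work), the headline metrics can be computed from a binary label matrix Y of shape m × q; the function name is hypothetical, and rDep is omitted for brevity since it requires per-pair chi-square tests:

```python
# Sketch of the characterization metrics described above, computed from a
# binary label matrix Y: Y[i, j] = 1 if instance i has label j.
import numpy as np

def characterize(Y: np.ndarray, d: int) -> dict:
    m, q = Y.shape
    card = Y.sum(axis=1).mean()        # Card: average labels per instance
    dens = card / q                    # Dens: cardinality / number of labels
    # Div: distinct labelsets present / number of possible labelsets (2^q)
    labelsets = {tuple(row) for row in Y.astype(int)}
    div = len(labelsets) / (2 ** q)
    # avgIR: most frequent label count divided by each label's count, averaged
    # (assumes every label appears at least once)
    counts = Y.sum(axis=0)
    avg_ir = (counts.max() / counts).mean()
    complexity = m * q * d             # as in [Read 2010]
    return {"Card": card, "Dens": dens, "Div": div,
            "avgIR": avg_ir, "complexity": complexity}
```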
The Polaris dataset offers a large-scale, diverse benchmark for evaluating metrics for image captioning, surpassing existing datasets in terms of size, caption diversity, number of human judgments, and granularity of the evaluations. It includes 131,020 generated captions and 262,040 reference captions. The generated captions have a vocabulary of 3,154 unique words and the reference captions have a vocabulary of 22,275 unique words.
The LLM-Seg40K dataset contains 14K images in total. The dataset is divided into training, validation, and test sets, containing 11K, 1K, and 2K images respectively. For the training split, each image has 3.95 questions on average, and the average question length is 15.2 words. The training set contains 1458 different categories in total.
In recent years, visual question answering (VQA) has attracted attention from the research community because of its challenging nature and its potential applications, such as virtual assistants in intelligent cars, assistive devices for blind people, and information retrieval from document images using natural language queries. The VQA task requires methods that can fuse information from questions and images to produce appropriate answers. Neural visual question answering models have achieved tremendous growth on large-scale datasets, which are mostly in resource-rich languages such as English. However, available datasets narrow the VQA task to an answer-selection or answer-classification task. We argue that this form of VQA is far from human ability and removes the challenge of the answering aspect by merely selecting answers rather than generating them. In this paper, we introduce the OpenViVQA (Open-domain Vietnamese Visual Question Answering) dataset.
EVJVQA, the first multilingual Visual Question Answering dataset covering three languages (English, Vietnamese, and Japanese), is released for this task. UIT-EVJVQA includes question-answer pairs created by humans for a set of images taken in Vietnam, with the answers created from the input questions and the corresponding images. EVJVQA consists of 33,000+ question-answer pairs for evaluating mQA models.
To our knowledge, the dataset used in this project is the largest crack segmentation dataset so far. It contains around 11,200 images merged from 12 available crack segmentation datasets.
MMToM-QA is the first multimodal benchmark to evaluate machine Theory of Mind (ToM), the ability to understand people's minds. MMToM-QA consists of 600 questions. Each question is paired with a clip of the full activity in a video (as RGB-D frames), as well as a text description of the scene and the actions taken by the person in that clip. All questions have two choices. The questions are categorized into seven types, evaluating belief inference and goal inference in rich and diverse situations. Each belief inference type has 100 questions, totaling 300 belief questions; each goal inference type has 75 questions, totaling 300 goal questions. The questions are paired with 134 videos of a person looking for daily objects in household environments.
From PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects, Section 5.1 (Dataset):
Synthetic dataset. The synthetic 3D models we use for evaluation are from the PartNet-Mobility dataset [49, 27, 4], a large-scale dataset of articulated objects across 46 categories. We select instances across 10 categories to conduct our experiments. For each articulation state, we randomly sample 64-100 views covering the upper hemisphere of the object to simulate capturing in the real world. Then we render RGB images and acquire camera parameters and object masks using Blender [6] to create our training data.
Real-world dataset. The real data we use for experiments is from the MultiScan dataset [25], which scans real-world indoor scenes with articulated objects in multiple states. We use the reconstructed mesh of an object in two states as ground truth for evaluation, and the real RGB frames as training data.
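As a hedged illustration of the view-sampling step described above (not the authors' Blender pipeline; the function name, radius, and seed are placeholders), camera positions covering the upper hemisphere could be drawn like this:

```python
# Hypothetical sketch: sample camera positions uniformly over the upper
# hemisphere around an object centered at the origin.
import numpy as np

def sample_upper_hemisphere_views(n_views: int, radius: float = 2.0,
                                  seed: int = 0) -> np.ndarray:
    """Return (n_views, 3) camera positions looking toward the origin."""
    rng = np.random.default_rng(seed)
    # Uniform over the hemisphere surface: azimuth in [0, 2*pi),
    # z = cos(elevation) uniform in [0, 1] (z >= 0 keeps views above the object).
    azimuth = rng.uniform(0.0, 2.0 * np.pi, n_views)
    z = rng.uniform(0.0, 1.0, n_views)
    r_xy = np.sqrt(1.0 - z ** 2)
    dirs = np.stack([r_xy * np.cos(azimuth), r_xy * np.sin(azimuth), z], axis=1)
    return radius * dirs

views = sample_upper_hemisphere_views(n_views=64)
```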
We construct the first large-scale dataset, USIS10K, for the underwater salient instance segmentation task, which contains 10,632 images and pixel-level annotations of 7 categories. As far as we know, this is the largest salient instance segmentation dataset, and includes Class-Agnostic and Multi-Class labels simultaneously.
A Large Vision-Language Model Knowledge Editing Benchmark
As part of an ongoing worldwide effort to comprehend and monitor insect biodiversity, we present the BIOSCAN-5M Insect dataset to the machine learning community. BIOSCAN-5M is a comprehensive dataset containing multi-modal information for over 5 million insect specimens, and it significantly expands existing image-based biological datasets by including taxonomic labels, raw nucleotide barcode sequences, assigned barcode index numbers, geographical information, and specimen size.
MUSES offers 2500 multi-modal scenes, evenly distributed across various combinations of weather conditions (clear, fog, rain, and snow) and types of illumination (daytime, nighttime). Each image includes high-quality 2D pixel-level panoptic annotations and class-level and novel instance-level uncertainty annotations. Further, each adverse-condition image has a corresponding image of the same scene taken under clear-weather, daytime conditions. The annotation process for MUSES utilizes all available sensor data, allowing the annotators to also reliably label degraded image regions that are still discernible in other modalities. This results in better pixel coverage in the annotations and creates a more challenging evaluation setup.
LayoutBench-COCO is a diagnostic benchmark that examines layout-guided image generation models on arbitrary, unseen layouts. Unlike LayoutBench, LayoutBench-COCO consists of OOD layouts of real objects and supports zero-shot evaluation. LayoutBench-COCO measures 4 skills (Number, Position, Size, Combination), whose objects are from MS COCO. The new 'combination' split consists of layouts with two objects in different spatial relations, and the remaining three splits are similar to those of LayoutBench. Download dataset at: https://huggingface.co/datasets/j-min/layoutbench-coco
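One possible way to fetch the benchmark is with the Hugging Face `datasets` library; the repo id comes from the URL above, but config and split names are not specified here, so check the dataset card:

```python
# Hypothetical usage sketch for downloading LayoutBench-COCO from the Hub.
from datasets import load_dataset

layoutbench_coco = load_dataset("j-min/layoutbench-coco")
print(layoutbench_coco)
```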
This dataset is an OSN-transmitted (OSN = Online Social Network) version of the CASIA dataset. The dataset is available here: https://github.com/HighwayWu/ImageForensicsOSN - more specifically: https://drive.google.com/file/d/1uMNZdhX3bYAZNcVGlkCvrnj5lSLW1ld5/view?usp=sharing and was presented in: