The NINCO (No ImageNet Class Objects) dataset is introduced in the ICML 2023 paper In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation. The images in this dataset are free from objects that belong to any of the 1000 classes of ImageNet-1K (ILSVRC2012), which makes NINCO suitable for evaluating out-of-distribution detection on ImageNet-1K.
Kitsune Network Attack Dataset This is a collection of nine network attack datasets captured from either an IP-based commercial surveillance system or a network full of IoT devices. Each dataset contains millions of network packets and a different cyber attack within it.
CHOCOLATE is a benchmark for detecting and correcting factual inconsistency in generated chart captions. It consists of captions produced by six advanced models, which are categorized into three subsets:
For each dataset we provide a short description as well as several characterization metrics: the number of instances (m), number of attributes (d), number of labels (q), cardinality (Card), density (Dens), diversity (Div), average imbalance ratio per label (avgIR), ratio of unconditionally dependent label pairs by chi-square test (rDep), and complexity, defined as m × q × d as in [Read 2010]. Cardinality measures the average number of labels associated with each instance, and density is cardinality divided by the number of labels. Diversity is the percentage of labelsets present in the dataset divided by the number of possible labelsets. avgIR measures the average degree of imbalance across all labels; the greater the avgIR, the more imbalanced the dataset. Finally, rDep measures the proportion of label pairs that are dependent at 99% confidence. A broader description of all characterization metrics and the partition methods used is given in
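As a rough illustration (not code from the referenced work), the headline metrics can be computed from a binary label matrix Y of shape m × q; the function name is hypothetical, and rDep is omitted for brevity since it requires per-pair chi-square tests:

```python
# Sketch of the characterization metrics described above, computed from a
# binary label matrix Y: Y[i, j] = 1 if instance i has label j.
import numpy as np

def characterize(Y: np.ndarray, d: int) -> dict:
    m, q = Y.shape
    card = Y.sum(axis=1).mean()        # Card: average labels per instance
    dens = card / q                    # Dens: cardinality / number of labels
    # Div: distinct labelsets present / number of possible labelsets (2^q)
    labelsets = {tuple(row) for row in Y.astype(int)}
    div = len(labelsets) / (2 ** q)
    # avgIR: most frequent label count divided by each label's count, averaged
    # (assumes every label appears at least once)
    counts = Y.sum(axis=0)
    avg_ir = (counts.max() / counts).mean()
    complexity = m * q * d             # as in [Read 2010]
    return {"Card": card, "Dens": dens, "Div": div,
            "avgIR": avg_ir, "complexity": complexity}
```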
The Polaris dataset offers a large-scale, diverse benchmark for evaluating metrics for image captioning, surpassing existing datasets in terms of size, caption diversity, number of human judgments, and granularity of the evaluations. It includes 131,020 generated captions and 262,040 reference captions. The generated captions have a vocabulary of 3,154 unique words and the reference captions have a vocabulary of 22,275 unique words.
The LLM-Seg40K dataset contains 14K images in total. The dataset is divided into training, validation, and test sets, containing 11K, 1K, and 2K images respectively. For the training split, each image has 3.95 questions on average, and the average question length is 15.2 words. The training set contains 1458 different categories in total.
In recent years, visual question answering (VQA) has attracted attention from the research community because of its challenging nature and its potential applications, such as virtual assistants in intelligent cars, assistive devices for blind people, and information retrieval from document images using natural language queries. The VQA task requires methods that can fuse information from questions and images to produce appropriate answers. Neural visual question answering models have achieved tremendous growth on large-scale datasets, which are mostly in resource-rich languages such as English. However, available datasets narrow the VQA task to an answer-selection or answer-classification task. We argue that this form of VQA is far from human ability and removes the challenge of the answering aspect by merely selecting answers rather than generating them. In this paper, we introduce the OpenViVQA (Open-domain Vietnamese Visual Question Answering) dataset.
EVJVQA, the first multilingual Visual Question Answering dataset covering three languages (English, Vietnamese, and Japanese), is released for this task. UIT-EVJVQA includes question-answer pairs created by humans for a set of images taken in Vietnam, with the answers created from the input questions and the corresponding images. EVJVQA consists of 33,000+ question-answer pairs for evaluating mQA models.
To our knowledge, the dataset used in this project is the largest crack segmentation dataset so far. It contains around 11,200 images merged from 12 available crack segmentation datasets.
MMToM-QA is the first multimodal benchmark to evaluate machine Theory of Mind (ToM), the ability to understand people's minds. MMToM-QA consists of 600 questions. Each question is paired with a clip of the full activity in a video (as RGB-D frames), as well as a text description of the scene and the actions taken by the person in that clip. All questions have two choices. The questions are categorized into seven types, evaluating belief inference and goal inference in rich and diverse situations. Each belief inference type has 100 questions, totaling 300 belief questions; each goal inference type has 75 questions, totaling 300 goal questions. The questions are paired with 134 videos of a person looking for daily objects in household environments.
From PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects, Section 5.1 (Dataset):
Synthetic dataset. The synthetic 3D models we use for evaluation are from the PartNet-Mobility dataset [49, 27, 4], a large-scale dataset of articulated objects across 46 categories. We select instances across 10 categories to conduct our experiments. For each articulation state, we randomly sample 64-100 views covering the upper hemisphere of the object to simulate capturing in the real world. Then we render RGB images and acquire camera parameters and object masks using Blender [6] to create our training data.
Real-world dataset. The real data we use for experiments is from the MultiScan dataset [25], which scans real-world indoor scenes with articulated objects in multiple states. We use the reconstructed mesh of an object in two states as ground truth for evaluation, and the real RGB frames as training data.
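As a hedged illustration of the view-sampling step described above (not the authors' Blender pipeline; the function name, radius, and seed are placeholders), camera positions covering the upper hemisphere could be drawn like this:

```python
# Hypothetical sketch: sample camera positions uniformly over the upper
# hemisphere around an object centered at the origin.
import numpy as np

def sample_upper_hemisphere_views(n_views: int, radius: float = 2.0,
                                  seed: int = 0) -> np.ndarray:
    """Return (n_views, 3) camera positions looking toward the origin."""
    rng = np.random.default_rng(seed)
    # Uniform over the hemisphere surface: azimuth in [0, 2*pi),
    # z = cos(elevation) uniform in [0, 1] (z >= 0 keeps views above the object).
    azimuth = rng.uniform(0.0, 2.0 * np.pi, n_views)
    z = rng.uniform(0.0, 1.0, n_views)
    r_xy = np.sqrt(1.0 - z ** 2)
    dirs = np.stack([r_xy * np.cos(azimuth), r_xy * np.sin(azimuth), z], axis=1)
    return radius * dirs

views = sample_upper_hemisphere_views(n_views=64)
```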
We construct the first large-scale dataset, USIS10K, for the underwater salient instance segmentation task, which contains 10,632 images and pixel-level annotations of 7 categories. As far as we know, this is the largest salient instance segmentation dataset, and includes Class-Agnostic and Multi-Class labels simultaneously.
A Large Vision-Language Model Knowledge Editing Benchmark
As part of an ongoing worldwide effort to comprehend and monitor insect biodiversity, we present the BIOSCAN-5M Insect dataset to the machine learning community. BIOSCAN-5M is a comprehensive dataset containing multi-modal information for over 5 million insect specimens, and it significantly expands existing image-based biological datasets by including taxonomic labels, raw nucleotide barcode sequences, assigned barcode index numbers, geographical information, and specimen size.
MUSES offers 2500 multi-modal scenes, evenly distributed across various combinations of weather conditions (clear, fog, rain, and snow) and types of illumination (daytime, nighttime). Each image includes high-quality 2D pixel-level panoptic annotations and class-level and novel instance-level uncertainty annotations. Further, each adverse-condition image has a corresponding image of the same scene taken under clear-weather, daytime conditions. The annotation process for MUSES utilizes all available sensor data, allowing the annotators to also reliably label degraded image regions that are still discernible in other modalities. This results in better pixel coverage in the annotations and creates a more challenging evaluation setup.
LayoutBench-COCO is a diagnostic benchmark that examines layout-guided image generation models on arbitrary, unseen layouts. Unlike LayoutBench, LayoutBench-COCO consists of OOD layouts of real objects and supports zero-shot evaluation. LayoutBench-COCO measures 4 skills (Number, Position, Size, Combination), whose objects are from MS COCO. The new 'combination' split consists of layouts with two objects in different spatial relations, and the remaining three splits are similar to those of LayoutBench. Download dataset at: https://huggingface.co/datasets/j-min/layoutbench-coco
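One possible way to fetch the benchmark is with the Hugging Face `datasets` library; the repo id comes from the URL above, but config and split names are not specified here, so check the dataset card:

```python
# Hypothetical usage sketch for downloading LayoutBench-COCO from the Hub.
from datasets import load_dataset

layoutbench_coco = load_dataset("j-min/layoutbench-coco")
print(layoutbench_coco)
```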
This dataset is an OSN-transmitted (OSN = Online Social Network) version of the CASIA dataset. The dataset is available here: https://github.com/HighwayWu/ImageForensicsOSN - more specifically: https://drive.google.com/file/d/1uMNZdhX3bYAZNcVGlkCvrnj5lSLW1ld5/view?usp=sharing and was presented in: