Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

3,275 dataset results

ImageNet-1k vs SUN

A benchmark dataset for out-of-distribution detection. ImageNet-1k is in-distribution, while SUN is out-of-distribution.

19 papers · 2 benchmarks · Images
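
The protocol behind this benchmark is the usual OOD-detection recipe: score every test image with a confidence measure from an in-distribution classifier, then measure how well that score separates the two splits. Below is a minimal sketch of one common baseline, maximum softmax probability evaluated with AUROC; the logits are random placeholders standing in for a real ImageNet-1k classifier's outputs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def msp_scores(logits):
    """Maximum softmax probability: higher means 'more in-distribution'."""
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize the exponent
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.max(axis=1)

# Placeholder logits standing in for a real ImageNet-1k classifier's outputs.
rng = np.random.default_rng(0)
id_logits = rng.normal(size=(5000, 1000))   # ImageNet-1k val: in-distribution
ood_logits = rng.normal(size=(5000, 1000))  # SUN images: out-of-distribution

scores = np.concatenate([msp_scores(id_logits), msp_scores(ood_logits)])
labels = np.concatenate([np.ones(5000), np.zeros(5000)])  # 1 = in-distribution
print(f"AUROC: {roc_auc_score(labels, scores):.3f}")
```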

InfiMM-Eval (Complex Open-ended Reasoning Evaluation for Multi-Modal Language Models)

Multi-modal Large Language Models (MLLMs) are increasingly prominent in artificial intelligence. Although many benchmarks attempt to evaluate MLLMs holistically, they typically concentrate on basic reasoning tasks and yield only simple yes/no or multiple-choice responses, which makes it difficult to conclusively determine a model's reasoning capabilities. To mitigate this issue, the CORE-MM benchmark was manually curated specifically for MLLMs, with a focus on complex reasoning tasks. The benchmark covers three key reasoning categories: deductive, abductive, and analogical reasoning. Its queries are intentionally constructed to engage the reasoning capabilities of MLLMs during answer generation, and intermediate reasoning steps are incorporated into the evaluation criteria to allow fair comparison across models. In total, CORE-MM consists of 279 manually curated reasoning questions.

19 papers · 5 benchmarks · Images

CASIA-B

CASIA-B is a large multiview gait database created in January 2005. It contains 124 subjects, with gait data captured from 11 views. Three variations are considered separately: view angle, clothing, and carrying condition. In addition to the video files, human silhouettes extracted from the videos are provided. Detailed information about Dataset B and an evaluation framework can be found in the accompanying paper.

18 papers · 5 benchmarks · Images, Videos

Amazon Beauty (Amazon Beauty 5-core)

This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).

18 papers · 8 benchmarks · Images, Texts

Kennedy Space Center

Kennedy Space Center is a dataset for the classification of wetland vegetation at the Kennedy Space Center, Florida, using hyperspectral imagery. The hyperspectral data were acquired over KSC on March 23, 1996 using JPL's Airborne Visible/Infrared Imaging Spectrometer (AVIRIS).

18 papers · 18 benchmarks · Hyperspectral images, Images
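
Classification on scenes like this is typically done per pixel: the hyperspectral cube is flattened into a matrix of spectra and a classifier is fit on the labeled pixels. A rough sketch of that pipeline follows, using a small random placeholder cube rather than the actual KSC files (the real scene is commonly cited as 512 x 614 pixels with 176 usable bands and 13 labeled classes).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Small placeholder cube (height x width x bands); not the real KSC scene.
rng = np.random.default_rng(0)
cube = rng.random((64, 64, 176))
labels = rng.integers(0, 14, size=(64, 64))  # 0 = unlabeled background

# Flatten to per-pixel spectra and keep only the labeled pixels.
X = cube.reshape(-1, cube.shape[-1])
y = labels.reshape(-1)
X, y = X[y > 0], y[y > 0]

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.8, stratify=y, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print(f"per-pixel accuracy: {clf.score(X_te, y_te):.3f}")
```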

Florence3D

The dataset was collected at the University of Florence in 2012 and captured with a Kinect camera. It includes 9 activities: wave, drink from a bottle, answer phone, clap, tight lace, sit down, stand up, read watch, and bow. During acquisition, 10 subjects were asked to perform each action two or three times, resulting in a total of 215 activity samples.

18 papers · 0 benchmarks · 3D, Images

ICDAR 2017

ICDAR2017 is a dataset for scene text detection.

18 papers · 0 benchmarks · Images, Texts

Replay-Mobile

The Replay-Mobile database for face spoofing consists of 1,190 video clips of photo and video attack attempts against 40 clients, recorded under different lighting conditions. The videos were captured with consumer mobile devices: an iPad Mini 2 (running iOS) and an LG G4 smartphone (running Android). The database was produced at the Idiap Research Institute (Switzerland) in collaboration with the Galician Research and Development Center in Advanced Telecommunications (Gradiant, Spain).

18 papers · 0 benchmarks · Images, Videos

VAST (VAried Stance Topics)

VAST consists of a large range of topics covering broad themes, such as politics (e.g., ‘a Palestinian state’), education (e.g., ‘charter schools’), and public health (e.g., ‘childhood vaccination’). In addition, the data includes a wide range of similar expressions (e.g., ‘guns on campus’ versus ‘firearms on campus’). This variation captures how humans might realistically describe the same topic and contrasts with the lack of variation in existing datasets.

18 papers · 1 benchmark · Images, Texts

MMD (Multimodal Dialogs)

The MMD (MultiModal Dialogs) dataset is a dataset for multimodal, domain-aware conversations. It consists of over 150K conversation sessions between shoppers and sales agents, annotated by a group of in-house annotators using a semi-automated, manually intensive iterative process.

18 papers · 0 benchmarks · Images, Texts

Violin (VIdeO-and-Language INference)

Video-and-Language Inference is the task of joint multimodal understanding of video and text. Given a video clip with aligned subtitles as premise, paired with a natural language hypothesis based on the video content, a model needs to infer whether the hypothesis is entailed or contradicted by the given video clip. The Violin dataset is a dataset for this task which consists of 95,322 video-hypothesis pairs from 15,887 video clips, spanning over 582 hours of video. These video clips contain rich content with diverse temporal dynamics, event shifts, and people interactions, collected from two sources: (i) popular TV shows, and (ii) movie clips from YouTube channels.

18 papers · 0 benchmarks · Images, Texts
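
Each example in the task pairs a clip (with its aligned subtitles) against a hypothesis carrying a binary label, and evaluation reduces to accuracy over those pairs. A minimal sketch of the record layout and metric, with all field names assumed rather than taken from the official release:

```python
from dataclasses import dataclass

@dataclass
class ViolinExample:
    video_id: str    # one of the 15,887 source clips
    subtitles: str   # aligned subtitle text, part of the premise
    hypothesis: str  # natural-language statement about the clip
    entailed: bool   # True = entailed by the clip, False = contradicted

def accuracy(preds, examples):
    """Fraction of video-hypothesis pairs judged correctly."""
    return sum(p == e.entailed for p, e in zip(preds, examples)) / len(examples)

ex = ViolinExample("clip_0001", "...", "Someone answers the phone.", True)
print(accuracy([True], [ex]))  # 1.0
```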

CLEVR-Humans

CLEVR-Humans is a dataset of human-posed, free-form natural language questions about CLEVR images. Many of these questions contain out-of-vocabulary words and require reasoning skills that go beyond those exercised by the original synthetic CLEVR questions.

18 papers · 1 benchmark · Images, Texts

LIVE (Public-Domain Subjective Image Quality Database)

The LIVE Public-Domain Subjective Image Quality Database is a resource developed by the Laboratory for Image and Video Engineering at the University of Texas at Austin. It contains a set of images and videos whose quality has been rated by human subjects. The database is used in Quality Assessment (QA) research, which aims to make quality predictions that align with the subjective opinions of human observers.

18 papers · 0 benchmarks · Images
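
Models evaluated on databases like LIVE are conventionally scored by how well their predictions correlate with the human mean opinion scores, usually via Spearman (SROCC) and Pearson (PLCC) correlation. A minimal sketch with placeholder values:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical values: model predictions vs. human mean opinion scores.
predicted = np.array([72.1, 55.3, 88.0, 40.2, 63.7])
mos = np.array([70.0, 58.0, 85.0, 35.0, 60.0])

srocc, _ = spearmanr(predicted, mos)  # rank correlation (monotonicity)
plcc, _ = pearsonr(predicted, mos)    # linear correlation
print(f"SROCC: {srocc:.3f}, PLCC: {plcc:.3f}")
```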

PASTIS (Panoptic Segmentation of satellite image TImes Series)

PASTIS is a benchmark dataset for panoptic and semantic segmentation of agricultural parcels from satellite image time series. It is composed of 2,433 one-square-kilometer patches within the French metropolitan territory, for each of which a sequence of satellite observations is assembled into a four-dimensional spatio-temporal tensor. The dataset contains both semantic and instance annotations, assigning each pixel a semantic label and an instance id. An official 5-fold split is provided in the dataset's metadata.

18 papers · 15 benchmarks · Images
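
Each patch is naturally handled as a 4D array of shape (time, channels, height, width), with the official folds deciding which patches are used for training and testing. A shape-only sketch; the sizes and fold assignment below are placeholders, not values read from the actual release:

```python
import numpy as np

# One hypothetical patch: T observations x C bands x H x W pixels.
T, C, H, W = 40, 10, 128, 128
patch = np.zeros((T, C, H, W), dtype=np.float32)

# Per-pixel targets: a semantic label map plus an instance-id map.
semantic = np.zeros((H, W), dtype=np.int64)
instance = np.zeros((H, W), dtype=np.int64)

# Official 5-fold protocol: e.g. train on folds 1-3, validate on 4, test on 5.
fold_of = {pid: (pid % 5) + 1 for pid in range(2433)}  # placeholder assignment
train_ids = [pid for pid, fold in fold_of.items() if fold in (1, 2, 3)]
print(len(train_ids), "training patches")
```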

TUM-VIE (TUM Stereo Visual-Inertial Event Dataset)

TUM-VIE is an event camera dataset for developing 3D perception and navigation algorithms. It contains handheld and head-mounted sequences in indoor and outdoor environments, with rapid motion during sports and high dynamic range. TUM-VIE includes challenging sequences where state-of-the-art VIO fails or exhibits large drift, and can therefore help push the boundary of event-based visual-inertial algorithms.

18 papers · 0 benchmarks · 3D, Images

WTW (Wired Table in the Wild)

WTW (Wired Table in the Wild) is a large-scale dataset with well-annotated table structure parsing for tables of multiple styles across several scenes, such as photos, scanned files, and web pages.

18 papers · 1 benchmark · Images

LIVECell (Label-free In Vitro image Examples of Cells)

The LIVECell (Label-free In Vitro image Examples of Cells) dataset is a large-scale microscopic image dataset for instance segmentation of individual cells in 2D cell cultures.

18 papers · 10 benchmarks · Biology, Biomedical, Images

SCICAP

SCICAP is a large-scale image captioning dataset that contains real-world scientific figures and captions. SCICAP was constructed using more than two million images from over 290,000 papers collected and released by arXiv.

18 papers · 1 benchmark · Images, Texts

AI-TOD (Tiny Object Detection in Aerial Images)

AI-TOD contains 700,621 object instances across eight categories in 28,036 aerial images. Compared with existing object detection datasets for aerial imagery, the mean object size in AI-TOD is about 12.8 pixels, much smaller than in other datasets.

18 papers · 47 benchmarks · Images
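
The quoted mean size follows the usual convention of taking sqrt(width × height) over all instance boxes. A short sketch of how such a statistic would be computed from COCO-style annotations; the file name and field layout here are assumed, not taken from the official release:

```python
import json
import math

# Hypothetical COCO-style annotation file; bbox format is [x, y, w, h].
with open("aitod_annotations.json") as f:
    coco = json.load(f)

# "Size" of a box is conventionally sqrt(width * height), in pixels.
sizes = [math.sqrt(a["bbox"][2] * a["bbox"][3]) for a in coco["annotations"]]
print(f"mean object size: {sum(sizes) / len(sizes):.1f} px "
      f"over {len(sizes):,} instances")
```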

ZInd (Zillow Indoor Dataset)

The Zillow Indoor Dataset (ZInD) provides extensive visual data covering a real-world distribution of unfurnished residential homes. It consists of primary 360º panoramas with annotated room layouts, windows, doors, and openings (W/D/O), merged rooms, secondary localized panoramas, and final 2D floor plans. Beyond the raw captures, the representations include room layouts with W/D/O annotations, merged layouts, a 3D textured mesh, and the final 2D floor plan.

18 papers · 8 benchmarks · 3D, Images
Page 39 of 164