Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

3,275 dataset results

FFHQH (Flickr-Faces-HQ-Harmonization)

A new dataset for portrait harmonization based on the FFHQ. It contains real images, foreground masks, and synthesized composites.

1 paper · 2 benchmarks · Images

AlgoPuzzleVQA

We introduce the novel task of multimodal puzzle solving, framed within the context of visual question answering. We present a new dataset, AlgoPuzzleVQA, designed to challenge and evaluate the capabilities of multimodal language models in solving algorithmic puzzles that require visual understanding, language understanding, and complex algorithmic reasoning. The puzzles cover a diverse array of mathematical and algorithmic topics such as Boolean logic, combinatorics, graph theory, optimization, and search, aiming to evaluate the gap between visual data interpretation and algorithmic problem-solving skills. The dataset is generated automatically from code authored by humans, and every puzzle has an exact solution that can be found from the algorithm without tedious human calculation, so the dataset can be scaled up arbitrarily in both reasoning complexity and size.

1 paper · 1 benchmark · Images, Texts

HistGen WSI-Report Dataset

This dataset is composed of 7,753 pairs of whole slide images and their corresponding diagnostic reports, extracted from the TCGA platform and refined with large language models. It aims to advance the field of automated histopathology report generation by providing a new publicly available evaluation benchmark. See the HistGen paper (https://arxiv.org/pdf/2403.05396.pdf) for a more detailed description of the dataset.

1 paper · 1 benchmark · Images, Texts

Omnicount-191

Effectively evaluating OmniCount across open-vocabulary, supervised, and few-shot counting tasks requires a dataset spanning a broad spectrum of visual categories, with multiple instances and classes per image. Current datasets, designed primarily for counting singular object categories such as humans and vehicles, fall short for multi-label object counting. Even multi-class datasets like MS COCO are of limited use for counting because objects appear sparsely. Addressing this gap, we created a new dataset with 30,230 images spanning 191 diverse categories, including kitchen utensils, office supplies, vehicles, and animals. With per-image instance counts ranging from 1 to 160 and an average count of 10, the dataset bridges the existing void and establishes a benchmark for assessing counting models in varied scenarios.

1 paper · 1 benchmark · Images, Texts

ThermoScenes

Dataset of paired thermal and RGB images comprising ten diverse scenes (six indoor and four outdoor) for 3D scene reconstruction and novel view synthesis (e.g. with NeRF).

1 paper · 0 benchmarks · 3D, Hyperspectral images, Images

CFC-DAOD (Caltech Fish Counting – Domain Adaptive Object Detection)

CFC-DAOD is a domain adaptation extension to the Caltech Fish Counting domain generalization benchmark.

1 paper · 2 benchmarks · Images

idsprites (Infinite dSprites)

A generator for simple continual learning benchmarks, inspired by dSprites.

1 paper · 0 benchmarks · Images

BioDrone

BioDrone is the first bionic drone-based single object tracking (SOT) benchmark. It features videos captured from a flapping-wing UAV system, with major camera shake caused by its aerodynamics, and highlights the tracking of tiny targets that change drastically between consecutive frames, providing a new robust-vision benchmark for SOT. Key features:
1. A large-scale, high-quality benchmark with robust-vision challenges
2. Rich annotation of challenging factors
3. Videos from a bionic UAV
4. Tracking baselines with comprehensive experimental analyses

1 paper · 0 benchmarks · Images, Videos

SOTVerse

SOTVerse is a user-defined task space for single object tracking. It allows users to customize SOT tasks to their research purposes, which both makes research more targeted and can significantly improve research efficiency.

1 paper · 0 benchmarks · Images, Videos

DIGITal (Digitally Generated Numerals)

The Digitally Generated Numerals (DIGITal) dataset consists of 100,000 image pairs representing digits from 0 to 9. Each pair includes a low-quality and a high-quality version, at a resolution of 128x128 pixels.

1 paper · 0 benchmarks · Images
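Because each DIGITal pair couples a degraded and a clean 128x128 image, a natural sanity check for any enhancement model trained on it is peak signal-to-noise ratio (PSNR) against the high-quality target. A minimal pure-Python sketch (the function name and flat-pixel interface are illustrative assumptions, not part of the dataset release):

```python
import math

def psnr(reference, distorted, max_val=255.0):
    """PSNR in dB between two images given as equal-length flat pixel sequences."""
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# Example: pixels differing uniformly by 16 gray levels give roughly 24 dB.
score = psnr([0, 0, 0, 0], [16, 16, 16, 16])
```

Higher is better; identical images yield infinity, and for 8-bit images typical restoration results land in the 20-40 dB range.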

MatSeg (Dataset for Zero-Shot Material States Segmentation)

MatSeg is a dataset for zero-shot material state segmentation. It contains large-scale synthetic images for training and highly diverse real-world image benchmarks for testing, focused on zero-shot, class-agnostic segmentation of materials and their states: finding the regions of material states without pre-training on the specific material classes or states. The benchmark covers a wide range of real-world materials and states, for example wet regions of a surface, scattered dust, minerals in rocks, sediment in soils, rotten parts of fruits, degraded and corroded surface regions, and food and liquid states, among many others. The focus is on scattered and fragmented materials, as well as soft boundaries, partial transitions, and partial similarity between regions. It contains both hard segmentation maps and soft, partial-similarity annotations for similar but not identical materials.

1 paper · 0 benchmarks · Images

ConQA (Conceptual Query Answering)

ConQA is a dataset built on the intersection of Visual Genome and MS-COCO. Its goal is to provide a new benchmark for text-to-image retrieval using shorter, less descriptive queries than the commonly used MS-COCO or Flickr captions. ConQA consists of 80 queries divided into 50 conceptual and 30 descriptive queries. A descriptive query mentions some of the objects in the image, for instance "people chopping vegetables", while a conceptual query does not mention objects, or refers to them only in a general context, e.g. "working class life".

1 paper · 0 benchmarks · Images, Texts

HRPlanesV2 (HRPlanesv2 - High Resolution Satellite Imagery for Aircraft Detection)

The HRPlanesv2 dataset contains 2,120 very-high-resolution (VHR) Google Earth images. To further improve experimental results, images of airports from many different regions with various uses (civil/military/joint) were selected and labelled. A total of 14,335 aircraft have been labelled. Each image is stored as a 4800 x 2703 pixel ".jpg" file, and each label is stored in YOLO ".txt" format. The dataset has been split into three parts: 70% train, 20% validation, and the remainder test. The aircraft in the images in the train and validation sets have a percentage of 80 or more in size. Link: https://github.com/dilsadunsal/HRPlanesv2-Data-Set

1 paper · 0 benchmarks · Images
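Each YOLO-format label line stores a class index followed by a normalized box center and size. As a quick illustration of reading these labels (the function name and defaults are illustrative, not from the dataset release), one line can be converted to pixel corner coordinates for a 4800 x 2703 image like this:

```python
def yolo_to_pixel_box(line, img_w=4800, img_h=2703):
    """Convert one YOLO label line 'cls xc yc w h' (coords normalized to [0, 1])
    into (class_index, (x1, y1, x2, y2)) in pixel units."""
    cls, xc, yc, w, h = line.split()
    # Scale normalized center/size to pixels.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    # Center/size -> top-left and bottom-right corners.
    return int(cls), (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)

cls_id, box = yolo_to_pixel_box("0 0.5 0.5 0.1 0.1", img_w=1000, img_h=1000)
```

In the example above, a centered box covering 10% of each dimension of a 1000x1000 image maps to corners near (450, 450) and (550, 550).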

GDIT

The GDIT Aerial Airport dataset consists of aerial images containing instances of parked airplanes. All plane types have been grouped into a single class named "airplane".

1 paper · 0 benchmarks · Images

Bramble flower image dataset (Bramble Flower Detection and Classification Dataset for Precision Pollination)

This dataset contains both artificial and real images of bramble flowers. The real images were taken with a RealSense D435 camera inside the West Virginia University greenhouse. All flowers are annotated in YOLO format with a bounding box and class name. Trained weights are also provided and can be used with the included Python script to detect bramble flowers. The classifier can also determine whether a flower's center is visible or hidden, which is helpful for precision pollination projects. Images are augmented to make the task robust to various environmental conditions.

1 paper · 0 benchmarks · Images

CAESAR-Radi (CAESAR-Radi: SAR-Ship-Dataset)

This dataset, labelled by SAR experts, was created using 102 Chinese Gaofen-3 images and 108 Sentinel-1 images. It consists of 39,729 ship chips (after removing some repeated clips) of 256 pixels in both range and azimuth. The ships have distinct scales and backgrounds, so the dataset can be used to develop object detectors for multi-scale and small object detection. For details, see: Wang, Yuanyuan, Chao Wang, Hong Zhang, Yingbo Dong, and Sisi Wei. 2019. "A SAR Dataset of Ship Detection for Deep Learning under Complex Backgrounds." Remote Sensing 11 (7). doi: 10.3390/rs11070765.

1 paper · 0 benchmarks · Images

COCO-OOC

COCO-OOC goes beyond standard object detection to ask: which objects are out-of-context (OOC)? Given an image with a set of objects, the goal of COCO-OOC is to determine whether an object is inconsistent with its contextual relations, and to localize the OOC object with a bounding box.

1 paper · 1 benchmark · Images

V2AIX (A Multi-Modal Real-World Dataset of ETSI ITS V2X Messages in Public Road Traffic)

Connectivity is a main driver of the ongoing megatrend of automated mobility: future Cooperative Intelligent Transport Systems (C-ITS) will connect road vehicles, traffic signals, roadside infrastructure, and even vulnerable road users, sharing data and compute for safer, more efficient, and more comfortable mobility. In terms of communication technology for realizing such vehicle-to-everything (V2X) communication, the WLAN-based peer-to-peer approach (IEEE 802.11p, ITS-G5 in Europe) competes with C-V2X based on cellular technologies (4G and beyond). Irrespective of the underlying communication standard, common message interfaces are crucial for a common understanding between vehicles, especially those from different manufacturers. Targeting this issue, the European Telecommunications Standards Institute (ETSI) has been standardizing V2X message formats such as the Cooperative Awareness Message (CAM). In this work, we present V2AIX, a multi-modal real-world dataset of ETSI ITS messages gathered in public road traffic.

1 paper · 0 benchmarks · Images, LiDAR, Point cloud

HRI Simple Tasks

The dataset concerns toy tasks that a human teaches to a robot. The number of task repetitions is limited, since the human demonstrates each task to the robot only a few times.

1 paper · 0 benchmarks · Images, Tracking

No Background RGB Arabic Alphabets Sign Language Dataset

The AASL-Clear dataset is a collection of RGB images of Arabic Alphabet Sign Language gestures with the backgrounds removed. Each image showcases a clear, isolated hand gesture, allowing precise recognition and analysis of Arabic sign language alphabets. With transparent backgrounds, the dataset provides a clean, focused resource for training deep learning models for Arabic sign language recognition and classification.

1 paper · 1 benchmark · Images
Page 136 of 164