The GoogleEarth dataset is collected from Google Earth Studio and comprises 400 orbit trajectories over Manhattan and Brooklyn. Each trajectory consists of 60 images, with orbit radii ranging from 125 to 813 meters and altitudes varying from 112 to 884 meters. In addition to the images, Google Earth Studio provides camera intrinsic and extrinsic parameters, making it possible to create automated annotations for semantic and building instance segmentation.
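Because intrinsics and extrinsics are supplied per frame, automated annotation essentially reduces to projecting known 3D geometry into each image. Below is a minimal sketch of that projection with NumPy; the matrices and the point are illustrative placeholders, not values taken from the dataset.

```python
import numpy as np

# Minimal sketch: project a world point into an image using camera
# intrinsics K and extrinsics [R | t], the kind of parameters Google
# Earth Studio exports. All values are illustrative placeholders.
K = np.array([[1000.0,    0.0, 960.0],    # fx, skew, cx
              [   0.0, 1000.0, 540.0],    # fy, cy
              [   0.0,    0.0,   1.0]])
R = np.eye(3)                             # world-to-camera rotation
t = np.array([0.0, 0.0, 500.0])           # world-to-camera translation (meters)

X_world = np.array([120.0, -35.0, 80.0])  # a 3D point, e.g. a building corner

X_cam = R @ X_world + t                   # transform into the camera frame
uvw = K @ X_cam                           # apply the pinhole intrinsics
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]   # perspective divide -> pixel coordinates
print(f"pixel: ({u:.1f}, {v:.1f})")
```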
The dataset offers tag and mask annotations for image-text pairs from the CC3M validation set. Tag annotations denote words that aptly describe the relationship between the image and the corresponding text. These annotations provide valuable insights into the semantic connection between each pair's visual and textual elements.
MMFlood is a remote sensing dataset derived from Sentinel-1 (VV-VH), MapZen (DEM) and OpenStreetMap (hydrography). It provides a complete and well-rounded set of data specifically designed for flood events, focusing on three main features: worldwide distribution, manually validated annotations, and multiple modalities.
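As a rough illustration of how the three modalities could be combined into a single model input, here is a minimal sketch; the array shapes and channel order are assumptions for illustration, not the dataset's official format.

```python
import numpy as np

# Minimal sketch: stack the MMFlood modalities into one input tensor for a
# flood segmentation model. Shapes and channel order are assumptions.
H, W = 512, 512
sar_vv = np.zeros((H, W), dtype=np.float32)  # Sentinel-1 VV backscatter
sar_vh = np.zeros((H, W), dtype=np.float32)  # Sentinel-1 VH backscatter
dem    = np.zeros((H, W), dtype=np.float32)  # MapZen digital elevation model
hydro  = np.zeros((H, W), dtype=np.float32)  # OpenStreetMap hydrography mask

x = np.stack([sar_vv, sar_vh, dem, hydro], axis=0)  # (4, H, W) model input
```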
AART serves as an automated alternative to the current manual red-teaming efforts. The primary goal is to evaluate the safety of LLM generations in various application contexts.
ChronoMagic is a dataset of 2,265 metamorphic time-lapse videos, each accompanied by a detailed caption.
FAUST-partial is a 3D registration benchmark dataset created to provide a more informative evaluation of 3D registration methods. The dataset addresses two main limitations of current 3D registration benchmarks.
We introduce a new dataset for form structure understanding and key information extraction. This repository provides detailed baseline model descriptions and experimental setups to ensure our model and experiments are reproducible. We will also offer a Colab link showing how to download and use our dataset for the corresponding tasks.
Aria Synthetic Environments is a large-scale, fully simulated dataset created by Project Aria. It consists of procedurally generated interior layouts filled with 3D objects, simulated with the sensor characteristics of Aria glasses.
The BEHAVIOR-1K dataset is a comprehensive simulation benchmark for human-centered robotics. It is more grounded in actual human needs than its predecessor, BEHAVIOR-100. The 1,000 activities in the dataset come from the results of an extensive survey on "what do you want robots to do for you?"
The FineWeb dataset consists of more than 15T tokens of cleaned and deduplicated English web data from CommonCrawl. The data processing pipeline is optimized for LLM performance and runs on datatrove, our large-scale data processing library.
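At this scale, working with FineWeb usually means streaming rather than downloading. The following is a minimal sketch using the Hugging Face datasets library; the hub identifier HuggingFaceFW/fineweb, the sample-10BT configuration, and the text field are assumptions about how the release is hosted, so check the dataset card before use.

```python
from datasets import load_dataset

# Minimal sketch: stream a small slice of FineWeb instead of downloading
# 15T tokens at once. Hub id, config name, and field names are assumptions.
fw = load_dataset("HuggingFaceFW/fineweb",
                  name="sample-10BT",
                  split="train",
                  streaming=True)

for i, doc in enumerate(fw):
    print(doc["text"][:200])   # each record is assumed to carry cleaned web text
    if i == 2:
        break
```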
Large, multimodal biometric dataset: It contains still images and videos of over 1,000 people captured at various ranges (up to 1,000 meters) and elevations (up to 400 meters) using a diverse set of cameras (commercial, military-grade, specialized).
Manual crown delineation of individual trees in two countries: Denmark and Finland.
RTL-Repo is a benchmark for evaluating LLMs' effectiveness in generating Verilog code autocompletions within large, complex codebases. It assesses a model's ability to understand and remember the entire Verilog repository context and to generate new code that is correct, relevant, logically consistent, and adherent to coding conventions and guidelines, while being aware of all components and modules in the project. This provides a realistic evaluation of a model's performance in real-world RTL design scenarios. RTL-Repo comprises over 4,000 code samples from GitHub repositories, each containing the context of all Verilog code in the repository. It offers a valuable resource for the hardware design community to assess and train LLMs for Verilog code generation in complex, multi-file RTL projects.
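To make the per-sample structure concrete, here is a hedged sketch of what one evaluation sample could look like; the field names and contents are invented for illustration and are not the benchmark's actual schema.

```python
# Minimal sketch of an RTL-Repo-style evaluation sample: repository-wide
# Verilog context plus a target completion. All names are hypothetical.
sample = {
    "repo": "github.com/example/riscv-core",              # hypothetical repository
    "context_files": {                                     # all other Verilog in the repo
        "alu.v": "module alu(...); ... endmodule",
        "regfile.v": "module regfile(...); ... endmodule",
    },
    "target_file": "core.v",
    "prefix": "module core(...);\n  // instantiate ALU\n", # code before the cursor
    "ground_truth_next_line": "  alu u_alu(.a(a), .b(b), .op(op), .y(y));",
}
```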
SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model
OoDIS is a benchmark dataset for anomaly instance segmentation, crucial for autonomous vehicle safety. It extends existing anomaly segmentation benchmarks to focus on the segmentation of individual out-of-distribution (OOD) objects.
The SUGARCREPE++ dataset evaluates the sensitivity of vision-language models (VLMs) and unimodal language models (ULMs) to semantic and lexical alterations. Each sample in SUGARCREPE++ consists of an image and a corresponding triplet of captions: a pair of semantically equivalent but lexically different positive captions and one hard negative caption. This poses a 3-way semantic (in)equivalence problem to the language models. The original SUGARCREPE dataset provides (only) one positive and one hard negative caption for each image. Relative to the negative caption, a single positive caption can have either low or high lexical overlap, and SUGARCREPE only captures the high-overlap case. To evaluate the sensitivity of encoded semantics to lexical alteration, we require an additional positive caption with a different lexical composition. SUGARCREPE++ fills this gap by adding an additional positive caption, enabling a more thorough assessment of the sensitivity of encoded semantics to lexical alteration.
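A hedged sketch of the triplet structure and the check it implies follows; the sample values are invented for illustration, and score stands in for any image-text similarity model rather than an API shipped with the dataset.

```python
from typing import Callable

# Minimal sketch of the 3-way structure in a SUGARCREPE++-style sample.
# Values are illustrative, not real data from the benchmark.
sample = {
    "image": "cc3m_000123.jpg",                                   # hypothetical filename
    "positive_1": "A dog chases a ball across the lawn.",
    "positive_2": "Across the lawn, a ball is chased by a dog.",  # same meaning, different wording
    "hard_negative": "A ball chases a dog across the lawn.",      # lexically close, different meaning
}

def passes(sample: dict, score: Callable[[str, str], float]) -> bool:
    """A model handles this sample if both semantically equivalent captions
    outrank the hard negative for the same image."""
    img = sample["image"]
    neg = score(img, sample["hard_negative"])
    return score(img, sample["positive_1"]) > neg and score(img, sample["positive_2"]) > neg
```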
OlympicArena is a benchmark to evaluate advanced capabilities of language models across a broad spectrum of Olympic-level challenges.
This is the dataset used by the automatic sparse attention compression method MoA. It enhances the calibration dataset by integrating long-range dependencies and model alignment. MoA utilizes long-contextual datasets, which include question-answer pairs heavily dependent on long-range content.
NaturalCodeBench (NCB) is a comprehensive code benchmark designed to mirror the complexity and variety of scenarios in real coding tasks. It comprises 402 high-quality problems in Python and Java, meticulously selected from an online coding service, covering 6 different domains.
The WiGesture dataset contains data for gesture recognition and person identification in a meeting-room scenario. The dataset provides synchronised CSI (channel state information), RSSI (received signal strength indicator), and a timestamp for each sample. It can be used for research on WiFi-based human gesture recognition and person identification.
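As a rough illustration of what one synchronised sample might hold, here is a minimal sketch; the field names and array shapes are assumptions for illustration only.

```python
import numpy as np

# Minimal sketch of a single synchronised WiGesture-style record.
# Field names and shapes are assumptions, not the dataset's actual layout.
sample = {
    "timestamp": 1718000000.123,                    # capture time (seconds)
    "csi": np.zeros((3, 30), dtype=np.complex64),   # complex CSI, e.g. 3 antennas x 30 subcarriers
    "rssi": -42,                                    # received signal strength (dBm)
    "gesture_label": "push",                        # gesture class
    "person_id": 7,                                 # identity label
}
```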