Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

MATH-V

The Math-Vision (Math-V) dataset is a meticulously curated collection of 3,040 high-quality mathematical problems with visual contexts, sourced from real math competitions. Spanning 16 distinct mathematical disciplines and graded across 5 levels of difficulty, the dataset provides a comprehensive and diverse set of challenges for evaluating the mathematical reasoning abilities of large multimodal models (LMMs).

12 papers · 1 benchmark · Images, Texts

TYO-L

The TYO-L (Toyota Light) dataset is part of the Benchmark for 6D Object Pose Estimation (BOP).

12 papers · 0 benchmarks

EgoExoLearn

EgoExoLearn is a dataset designed to bridge the gap between egocentric and exocentric views of procedural activities.

12 papers · 8 benchmarks

MMNeedle (Multimodal Needle in a Haystack)

We introduce the MultiModal Needle-in-a-haystack (MMNeedle) benchmark, specifically designed to assess the long-context capabilities of MLLMs. Besides multi-image input, we employ image stitching to further increase the input context length, and develop a protocol to automatically generate labels for sub-image level retrieval. Essentially, MMNeedle evaluates MLLMs by stress-testing their capability to locate a target sub-image (needle) within a set of images (haystack) based on textual instructions and descriptions of image contents. This setup necessitates an advanced understanding of extensive visual contexts and effective information retrieval within long-context image inputs.

12 papers · 7 benchmarks · Images, Texts
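The image-stitching idea described above can be sketched in a few lines. This is a minimal illustration (not the authors' code) that tiles equally sized sub-images into one larger "haystack" image using NumPy:

```python
import numpy as np

def stitch_grid(images, rows, cols):
    """Stitch equally sized images (H, W, C arrays) into a rows x cols grid,
    mimicking the sub-image stitching used to lengthen the visual context."""
    assert len(images) == rows * cols
    # Concatenate each row of tiles along the width, then stack rows vertically.
    row_strips = [np.concatenate(images[r * cols:(r + 1) * cols], axis=1)
                  for r in range(rows)]
    return np.concatenate(row_strips, axis=0)

# Six 4x4 dummy tiles, each filled with its own index value.
tiles = [np.full((4, 4, 3), i, dtype=np.uint8) for i in range(6)]
stitched = stitch_grid(tiles, rows=2, cols=3)
print(stitched.shape)  # (8, 12, 3)
```

Locating a target sub-image then amounts to predicting its (row, column) cell in the stitched grid.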

AE-110k (AliExpress - 110k)

The dataset contains product information from the AliExpress Sports & Entertainment category. Each attribute value in "Item Specifics" is matched against the product title using exact string match to generate positive triples <title, attribute, value>. Negative triples <title, attribute, NULL> are randomly generated. Each triple is stored on its own line, with fields separated by \u0001.

12 papers · 3 benchmarks · Texts
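Reading triples in the stated format can be sketched as follows. This is a hypothetical helper for illustration, not part of the dataset release; the sample lines are made up:

```python
def parse_triples(lines):
    """Parse AE-110k-style lines: <title, attribute, value> separated by the
    \u0001 control character; NULL marks a negative (absent-value) triple."""
    triples = []
    for line in lines:
        parts = line.rstrip("\n").split("\u0001")
        if len(parts) != 3:
            continue  # skip malformed lines
        title, attribute, value = parts
        triples.append((title, attribute, None if value == "NULL" else value))
    return triples

sample = [
    "Outdoor Tent 2 Person\u0001Brand\u0001CampPro",
    "Outdoor Tent 2 Person\u0001Material\u0001NULL",
]
print(parse_triples(sample))
```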

Artaxor


12 papers · 6 benchmarks

UODD


12 papers · 6 benchmarks

LLaMEA (algorithms and experiments from the paper)

3,500+ evolutionary algorithms generated by the LLaMEA framework, with experiment files and Python code.

12 papers · 0 benchmarks · Texts

DBLP (Heterogeneous Node Classification)

A popular dataset for node classification on heterogeneous graphs.

12 papers · 3 benchmarks

ACM (Heterogeneous Node Classification)

A popular dataset for node classification on heterogeneous graphs.

12 papers · 4 benchmarks

Synthia-Seq

Synthia-Seq contains 8,000 photo-realistic frames with dense segmentation labels. Video frames from the RGB folder are typically used, with the 11 categories shared with Cityscapes-Seq selected for domain adaptation.

12 papers · 0 benchmarks

CityStreet

Datasets for multi-view crowd counting in wide-area scenes, including the CityStreet dataset as well as counting annotations and metadata for multi-view counting on PETS2009 and DukeMTMC. CityStreet is a real-world city-scene dataset collected at a crowded street intersection. The scene size is around 58m × 72m, and the ground plane map resolution is 320 × 384.

12 papers · 30 benchmarks
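From the stated numbers, each ground-plane cell covers roughly 0.18–0.19 m, assuming the 58 m side maps to the 320-cell axis and the 72 m side to the 384-cell axis (the pairing is an assumption here):

```python
# Approximate metres-per-cell of the CityStreet ground plane map,
# given the stated scene size (58 m x 72 m) and grid resolution (320 x 384).
scene_m = (58.0, 72.0)
grid_cells = (320, 384)
metres_per_cell = tuple(m / c for m, c in zip(scene_m, grid_cells))
print(metres_per_cell)  # (0.18125, 0.1875)
```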

WildScenes

WildScenes is a bi-modal benchmark dataset consisting of multiple large-scale, sequential traversals in natural environments, including semantic annotations in high-resolution 2D images and dense 3D LiDAR point clouds, and accurate 6-DoF pose information. The data is (1) trajectory-centric with accurate localization and globally aligned point clouds, (2) calibrated and synchronized to support bi-modal training and inference, and (3) containing different natural environments over 6 months to support research on domain adaptation. We introduce benchmarks on 2D and 3D semantic segmentation and evaluate a variety of recent deep-learning techniques to demonstrate the challenges in semantic segmentation in natural environments. We propose train-val-test splits for standard benchmarks as well as domain adaptation benchmarks and utilize an automated split generation technique to ensure the balance of class label distributions. The WildScenes benchmark webpage is https://csiro-robotics.github.i

12 papers · 12 benchmarks · Images, LiDAR, Point cloud

Neptune (Neptune Long Video Understanding Benchmark)

Neptune is a dataset consisting of challenging question-answer-decoy (QAD) sets for long videos (up to 15 minutes). The goal of this dataset is to test video-language models for a broad range of long video reasoning abilities, which are provided as "question type" labels for each question, for example "video summarization", "temporal ordering", "state changes" and "creator intent" amongst others.

12 papers · 0 benchmarks · Audio, Texts, Videos

LIBERO-10

10 tasks from the LIBERO-100 suite. Note that the data are split under the folder names LIBERO-90 and LIBERO-10; LIBERO-10 contains the selected tasks that require long-horizon task completion.

12 papers · 0 benchmarks · Actions, Images, Texts

PeerQA

We present PeerQA, a real-world, scientific, document-level Question Answering (QA) dataset. PeerQA questions have been sourced from peer reviews, which contain questions that reviewers raised while thoroughly examining the scientific article. Answers have been annotated by the original authors of each paper. The dataset contains 579 QA pairs from 208 academic articles, with a majority from ML and NLP, as well as a subset from other scientific communities such as Geoscience and Public Health. PeerQA supports three critical tasks for developing practical QA systems: evidence retrieval, unanswerable question classification, and answer generation. We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks. Our experiments and analyses reveal the need for decontextualization in document-level retrieval, where we find that even simple decontextualization approaches consistently improve retrieval performance across architectures.

12 papers · 6 benchmarks · Texts

SILK (Synth It Like KITTI)

Simulation is an important factor in advancing autonomous driving systems, yet progress on transferability between the virtual and real worlds remains limited. We revisit this problem for 3D object detection on LiDAR point clouds and propose a dataset generation pipeline based on the CARLA simulator. Utilizing domain randomization strategies and careful modeling, we are able to train an object detector on the synthetic data and demonstrate strong generalization to the KITTI dataset.

12 papers · 0 benchmarks · 3D, Images

MOW (3D dataset of humans Manipulating Objects in-the-Wild)

A 3D dataset of humans Manipulating Objects in-the-Wild (MOW). The dataset contains 512 in-the-wild images spanning 121 object categories, with annotations of instance category, 3D object models, 3D hand pose, and object pose.

12 papers · 3 benchmarks

GTOS-Mobile

81 videos of 31 classes of ground terrain such as grass, gravel, asphalt and sand.

12 papers · 0 benchmarks

Ohsumed

Ohsumed includes medical abstracts from the MeSH categories of the year 1991. Joachims (1997) used the first 20,000 documents, split into 10,000 for training and 10,000 for testing; the task was to categorize documents into the 23 cardiovascular disease categories. After selecting this category subset, the number of unique abstracts becomes 13,929 (6,286 for training and 7,643 for testing). Since current computers can easily manage larger collections, all 34,389 cardiovascular disease abstracts out of the 50,216 medical abstracts from 1991 are made available.

11 papers · 3 benchmarks · Texts
Page 142 of 1000