Datasets

19,997 machine learning datasets

Building3D

Building3D is an urban-scale dataset of more than 160 thousand buildings with corresponding point clouds, mesh models, and wireframe models, covering 16 cities in Estonia over about 998 km². The point clouds are real-world LiDAR scans.

8 papers · 0 benchmarks · 3D meshes, Point cloud

SciGraphQA

SciGraphQA is a large-scale, open-domain dataset focused on generating multi-turn conversational question-answering dialogues centered around understanding and describing scientific graphs and figures. It contains over 300,000 samples derived from academic research papers in computer science and machine learning domains.

8 papers · 0 benchmarks · Images, Texts

SI-HDR (Single-image high dynamic range dataset)

The dataset consists of 181 HDR images. Each image includes: 1) a RAW exposure stack, 2) an HDR image, 3) simulated camera images at two different exposures, and 4) results of 6 single-image HDR reconstruction methods: Endo et al. 2017, Eilertsen et al. 2017, Marnerides et al. 2018, Lee et al. 2018, Liu et al. 2020, and Santos et al. 2020.

8 papers · 0 benchmarks · Images

UDED (Unified Dataset for Edge Detection)

This dataset is a collection of 1, 2, or 3 images taken from each of BIPED, BSDS500, BSDS300, DIV2K, WIRE-FRAME, CID, CITYSCAPES, ADE20K, MDBD, NYUD, THANGKA, PASCAL-Context, SET14, URBAN10, and the camera-man image. The image selection process consists of computing the inter-quartile range (IQR) of intensity values over all the images; images larger than 720×720 pixels were not considered. In datasets with high-resolution (HR) images, the images were cropped. We thank all the dataset owners for making them public. This dataset is intended only for edge detection, not for contour or boundary detection tasks.

8 papers · 2 benchmarks

VIST-E (VIST-Ending)

VIST-E is a modified version of the VIST dataset, consisting of 49,913 training samples, 4,963 validation samples, and 5,030 test samples. Since every sample in VIST contains a story of five sentences, each sample in VIST-E contains the story ending, the ending-related image, and the first four sentences of the story as the story context. Additionally, each sentence is trimmed to a maximum of 40 words.

8 papers · 28 benchmarks
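
The VIST-E sample construction described above can be sketched as follows. This is a minimal illustration, not the dataset authors' code: the function name `make_viste_sample`, the dictionary keys, whitespace tokenization for the 40-word cap, and the assumption that the ending-related image is the one paired with the fifth sentence are all hypothetical choices made for the example.

```python
from typing import Dict, List

MAX_WORDS = 40  # per-sentence cap stated in the VIST-E description


def trim(sentence: str, max_words: int = MAX_WORDS) -> str:
    """Keep at most max_words whitespace-separated tokens (tokenization is an assumption)."""
    return " ".join(sentence.split()[:max_words])


def make_viste_sample(sentences: List[str], images: List[str]) -> Dict[str, object]:
    """Turn one five-sentence story into a context/ending sample.

    Assumes one image per sentence, with the last image treated as the
    ending-related image (an assumption, not confirmed by the description).
    """
    if len(sentences) != 5 or len(images) != 5:
        raise ValueError("expected a five-sentence story with five images")
    trimmed = [trim(s) for s in sentences]
    return {
        "context": trimmed[:4],     # first four sentences
        "ending": trimmed[4],       # fifth sentence is the story ending
        "ending_image": images[4],  # image paired with the ending
    }
```

A story whose last sentence has more than 40 words would come back with the ending cut to its first 40 words and the other four sentences as context.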

CS-Campus3D (CrossSource-Campus3D)

We present CS-Campus3D, the first 3D aerial-ground cross-source dataset consisting of point cloud data from both aerial and ground LiDAR scans. The point clouds in CS-Campus3D exhibit representation gaps, along with differences in views, point densities, and noise patterns.

8 papers · 4 benchmarks

T$^3$Bench

T$^3$Bench is the first comprehensive text-to-3D benchmark containing diverse text prompts of three increasing complexity levels that are specially designed for 3D generation (300 prompts in total). To assess both the subjective quality and the text alignment, we propose two automatic metrics based on multi-view images produced from the 3D content. The quality metric combines multi-view text-image scores and regional convolution to detect quality and view inconsistency. The alignment metric uses multi-view captioning and Large Language Model (LLM) evaluation to measure text-3D consistency.

8 papers · 3 benchmarks · 3D, Texts

CommercialAdsDataset

A large commercial ads dataset with 480K labeled query-ad pairs, each carrying structured information such as image, title, seller, and description.

8 papers · 3 benchmarks

ETTh1 (192) (Electricity Transformer Temperature)

The Electricity Transformer Temperature (ETT) is a crucial indicator in long-term electric power deployment. This dataset consists of 2 years of data from two separate counties in China. To explore granularity in the long sequence time-series forecasting (LSTF) problem, different subsets are created: {ETTh1, ETTh2} at the 1-hour level and ETTm1 at the 15-minute level. Each data point consists of the target value "oil temperature" and 6 power load features. The train/val/test split is 12/4/4 months.

8 papers · 3 benchmarks
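
To make the 12/4/4-month split above concrete, the sketch below computes chronological, non-overlapping row ranges for the hourly and 15-minute subsets. It assumes fixed 30-day months, which the description does not state; the function name `ett_split_bounds` and its parameter are hypothetical and not part of any ETT release.

```python
from typing import Dict

HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: fixed-length months, not stated in the description


def ett_split_bounds(points_per_hour: int = 1) -> Dict[str, range]:
    """Return row-index ranges for a chronological 12/4/4-month split.

    points_per_hour is 1 for the hourly subsets and 4 for the 15-minute subset.
    """
    month = DAYS_PER_MONTH * HOURS_PER_DAY * points_per_hour
    train_end = 12 * month
    val_end = train_end + 4 * month
    test_end = val_end + 4 * month
    return {
        "train": range(0, train_end),
        "val": range(train_end, val_end),
        "test": range(val_end, test_end),
    }


hourly = ett_split_bounds(points_per_hour=1)
print({name: len(rows) for name, rows in hourly.items()})
# {'train': 8640, 'val': 2880, 'test': 2880}
```

Calling `ett_split_bounds(points_per_hour=4)` gives the corresponding ranges at the 15-minute level, with four times as many rows per split.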

WANDS (Wayfair ANnotation Dataset)

The dataset contains:

8 papers · 0 benchmarks · Texts

X3D

X3D is a dataset containing 15 scenes and covering 4 applications for X-ray 3D reconstruction. More specifically, the X3D dataset includes the scenes of (1) medicine: jaw, leg, chest, foot, abdomen, aneurism, pelvis, pancreas, head; (2) biology: carp, bonsai; (3) security: box, backpack; and (4) industry: engine, teapot.

8 papers · 4 benchmarks

LongBench

8 papers · 1 benchmark

PosterLayout

PKU PosterLayout consists of 9,974 poster-layout pairs and 905 images, i.e., non-empty canvases. Each layout is represented by a set of elements labeled with class and bounding box. We collect data from multiple sources to guarantee diversity and variety in content, domain, and layout, making it a challenging benchmark that we expect to encourage further research. Besides the dataset, we propose and clearly define new metrics to accompany the existing ones, for a total of eight graphic and content-aware metrics. They evaluate the layouts in terms of utilization, non-occlusion, and aesthetics. Both quantitative and visualized results show that the proposed approach outperforms other approaches by generating proper layouts on diverse canvases.

8 papers · 0 benchmarks

GQA-REX

A GQA-based dataset with 1,040,830 multi-modal explanations of visual reasoning processes.

8 papers · 24 benchmarks · Images, Texts

GLUE-X

GLUE-X is a benchmark dataset used to evaluate the out-of-distribution (OOD) robustness of Natural Language Understanding (NLU) models. It was created to address the OOD generalization problem, which remains a challenge in many NLP tasks and limits the real-world deployment of these methods. The GLUE-X dataset consists of 14 publicly available datasets used as OOD test data. Evaluations are conducted on 8 classic NLP tasks with widely used models. The findings highlight the need for improved OOD accuracy in NLP tasks, as significant performance degradation was observed in all settings compared to in-distribution (ID) accuracy. The creators of GLUE-X hope that this dataset will help highlight the importance of OOD robustness and provide insights on how to measure and improve the robustness of a model.

8 papers · 0 benchmarks

PSC (Polish Summaries Corpus)

The Polish Summaries Corpus is a resource created to support the development and evaluation of tools for automated single-document summarization of Polish. The Corpus contains a large number of manual summaries of news articles, with many independently created summaries for a single text. This approach is designed to overcome the annotator bias, which is often described as a problem during the evaluation of summarization algorithms against a single gold standard.

8 papers · 0 benchmarks

NorNE

NorNE is a manually annotated corpus of named entities which extends the annotation of the existing Norwegian Dependency Treebank. Comprising both of the official standards of written Norwegian (Bokmål and Nynorsk), the corpus contains around 600,000 tokens and annotates a rich set of entity types including persons, organizations, locations, geo-political entities, products, and events, in addition to a class corresponding to nominals derived from names.

8 papers · 0 benchmarks

CHEAT (CHatGPT-writtEn AbsTracts)

8 papers · 0 benchmarks

RGBE-SEG

To perform universal event stream segmentation, we collected a large-scale RGB-Event dataset for event-centric segmentation, named RGBE-SEG, from currently available pixel-level aligned datasets (VisEvent, COESOT). RGBE-SEG includes 65,957 image-event pairs: 64,957 for training and 1,000 for testing. The test set contains 38,760 masks, and we manually divided it into easy, medium, and hard subsets based on the complexity of the scenarios. All ground-truth masks were generated from the images using a well-trained SAM.

8 papers · 1 benchmark

MVSEC-SEG

Based on the MVSEC dataset, we select image-event pairs to evaluate segmentation performance; the resulting set, MVSEC-SEG, serves only as a test set. MVSEC-SEG consists of 500 image-event pairs from each of the "indoor flying1" and "outdoor day2" sequences, containing 54,600 masks.

8 papers · 1 benchmark
