Datasets

19,997 machine learning datasets


KaggleDBQA (KaggleDBQA: Realistic Text-to-SQL dataset)

KaggleDBQA is a challenging, complex, cross-domain text-to-SQL evaluation dataset built from real web databases, with domain-specific data types, original formatting, and unrestricted questions.

28 papers · 2 benchmarks · Texts

KAIST (High-quality hyperspectral reconstruction using a spectral prior)

Hyperspectral image dataset introduced in the paper "High-quality hyperspectral reconstruction using a spectral prior".

28 papers · 4 benchmarks

SafetyBench

SafetyBench is a comprehensive benchmark that uses multiple-choice questions to evaluate the safety of large language models (LLMs). As LLMs become increasingly prevalent, concerns about their safety have grown, and SafetyBench addresses them by providing a reliable evaluation framework for researchers and developers.

28 papers · 0 benchmarks

ShareGPT4Video

The ShareGPT4Video dataset is a large-scale resource designed to improve video understanding and generation. It features 1.2 million highly descriptive captions for video clips, surpassing existing datasets in diversity and information content. The captions cover a wide range of aspects, including world knowledge, object properties, spatial relationships, and aesthetic evaluations.

28 papers · 0 benchmarks

THuman2.0 Dataset

THuman2.0 Dataset contains 500 high-quality human scans captured by a dense DSLR rig. For each scan, we provide the 3D model (.obj) and the corresponding texture map (.jpeg).

28 papers · 4 benchmarks · 3D, Images, RGB-D

XM 3600 (Crossmodal 3600)

Research in massively multilingual image captioning has been severely hampered by a lack of high-quality evaluation datasets. In this paper we present the Crossmodal-3600 dataset (XM3600 for short), a geographically diverse set of 3600 images annotated with human-generated reference captions in 36 languages. The images were selected from across the world, covering regions where the 36 languages are spoken, and annotated with captions that are stylistically consistent across all languages while avoiding the artifacts of direct translation. We apply this benchmark to model selection for massively multilingual image captioning models and show strong correlation with human evaluations when XM3600 is used as the gold reference for automatic metrics.

28 papers · 0 benchmarks · Images, Texts

MP20 (Metastable crystal structures from Materials Project)

MP20 (Xie et al., 2022) contains 45,231 metastable crystal structures from the Materials Project (Jain et al., 2013), each with up to 20 atoms and spanning 89 different element types.

28 papers · 2 benchmarks

AFAD (Asian Face Age Dataset)

The Asian Face Age Dataset (AFAD) is a dataset proposed for evaluating age estimation performance. It contains more than 160K facial images with corresponding age and gender labels. The dataset targets age estimation on Asian faces, so all of its images are of Asian faces. Its authors describe AFAD as the largest age estimation dataset to date and well suited to evaluating how deep learning methods can be applied to age estimation.

27 papers · 6 benchmarks

LDC2017T10 (Abstract Meaning Representation (AMR) Annotation Release 2.0)

Abstract Meaning Representation (AMR) Annotation Release 2.0 was developed by the Linguistic Data Consortium (LDC), SDL/Language Weaver, Inc., the University of Colorado's Computational Language and Educational Research group, and the Information Sciences Institute at the University of Southern California. It contains a sembank (semantic treebank) of 39,260 English natural language sentences from broadcast conversations, newswire, weblogs, and web discussion forums.

27 papers · 2 benchmarks · Graphs, Texts

20 Newsgroups

The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.

27 papers · 9 benchmarks · Texts

NoW Benchmark

The goal of this benchmark is to introduce a standard evaluation metric to measure the accuracy and robustness of 3D face reconstruction methods under variations in viewing angle, lighting, and common occlusions.

27 papers · 15 benchmarks · 3D meshes, Images

Silhouettes (CalTech 101 Silhouettes)

The Caltech 101 Silhouettes dataset consists of 4,100 training samples, 2,264 validation samples, and 2,307 test samples. The dataset is based on CalTech 101 image annotations: each image in the CalTech 101 data set includes a high-quality polygon outline of the primary object in the scene. To create the CalTech 101 Silhouettes data set, the authors center and scale each outline and render it on a DxD pixel image plane as a filled black polygon on a white background. Many object classes have silhouettes with distinctive class-specific features. A relatively small number of classes, such as soccer ball, pizza, stop sign, and yin-yang, are indistinguishable by shape alone but have been left in the data.

27 papers · 0 benchmarks · Images
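The centering, scaling, and rendering step described for Caltech 101 Silhouettes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each outline is a list of (x, y) vertices, that scaling preserves aspect ratio so the longer side of the bounding box fits inside D minus a hypothetical `margin`, and that a pixel is filled when its center lies inside the polygon (even-odd rule). Pixels are 1 for the white background and 0 for the black silhouette; `point_in_polygon` and `render_silhouette` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(px: float, py: float, pts: Sequence[Point]) -> bool:
    """Even-odd ray casting: count polygon edges crossed by a ray toward +x."""
    inside = False
    j = len(pts) - 1
    for i in range(len(pts)):
        xi, yi = pts[i]
        xj, yj = pts[j]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (yi > py) != (yj > py):
            x_cross = xi + (py - yi) * (xj - xi) / (yj - yi)
            if px < x_cross:
                inside = not inside
        j = i
    return inside


def render_silhouette(outline: Sequence[Point], D: int, margin: int = 1) -> List[List[int]]:
    """Center and scale an outline, then render it as a filled DxD image.

    Returns rows of pixels: 1 = white background, 0 = black silhouette.
    """
    xs = [x for x, _ in outline]
    ys = [y for _, y in outline]
    extent = max(max(xs) - min(xs), max(ys) - min(ys))
    if extent == 0:
        raise ValueError("outline has zero extent")
    scale = (D - 2 * margin) / extent  # fit the longer side, keep aspect ratio
    cx = (max(xs) + min(xs)) / 2
    cy = (max(ys) + min(ys)) / 2
    # Move the bounding-box center to the image center and apply the scale.
    pts = [((x - cx) * scale + D / 2, (y - cy) * scale + D / 2) for x, y in outline]
    image = [[1] * D for _ in range(D)]
    for row in range(D):
        for col in range(D):
            if point_in_polygon(col + 0.5, row + 0.5, pts):
                image[row][col] = 0
    return image
```

For example, a 10x10 square outline rendered with D = 8 and a 1-pixel margin becomes a 6x6 black block centered on a white 8x8 image.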

ISIC 2018 Task 1

The ISIC 2018 dataset was published by the International Skin Imaging Collaboration (ISIC) as a large-scale dataset of dermoscopy images. Task 1 of the challenge addresses lesion segmentation, and its dataset includes 2,594 images.

27 papers · 1 benchmark · Images, Medical

Vid4

The Vid4 dataset is commonly used for evaluating video super-resolution. It consists of four sequences: walk (720x480, 47 frames), foliage (720x480, 49 frames), city (704x576, 34 frames), and calendar (720x576, 41 frames).

27 papers · 0 benchmarks · Videos

CONAN (COunter NArratives through Nichesourcing)

COunter NArratives through Nichesourcing (CONAN) is a dataset of 4,078 hate speech/counter-narrative pairs in three languages (English, French, and Italian). Three types of metadata are also provided: expert demographics, hate speech sub-topic, and counter-narrative type. The dataset is augmented through translation (from Italian and French into English) and paraphrasing, which brings the total number of pairs to 14,988.

27 papers · 0 benchmarks · Texts

2D-3D Match Dataset

2D-3D Match Dataset is a dataset of 2D-3D correspondences built by leveraging several existing 3D datasets of RGB-D scans, specifically SceneNN and 3DMatch. The training set consists of 110 RGB-D scans: 56 scenes from SceneNN and 54 scenes from 3DMatch. The 2D-3D correspondences are generated as follows. A 3D point is randomly sampled from a 3D point cloud, and a set of 3D patches is extracted from different scanning views. To find 2D-3D correspondences, the 3D position of each patch is re-projected into every RGB-D frame whose camera frustum contains the point, taking occlusion into account, and the local 2D patches around the re-projected point are extracted. In total, around 1.4 million 2D-3D correspondences are collected.

27 papers · 0 benchmarks · Images
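The re-projection step described for the 2D-3D Match Dataset can be sketched as follows. This is a minimal illustration, not the authors' code, and several details are assumptions: a pinhole camera with intrinsics (fx, fy, cx, cy), a world-to-camera pose convention p_cam = R p_world + t, an occlusion test that compares the point's depth to the frame's depth map within a hypothetical tolerance `depth_tol`, and square patches of hypothetical half-size `half`. Images and depth maps are plain nested lists; all function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Grid = Sequence[Sequence[float]]


def project_point(p_world: Vec3, R: Grid, t: Vec3,
                  fx: float, fy: float, cx: float, cy: float,
                  width: int, height: int,
                  depth: Optional[Grid] = None,
                  depth_tol: float = 0.05) -> Optional[Tuple[int, int]]:
    """Project a world point into one frame; None if outside the frustum or occluded."""
    # World-to-camera transform (assumed convention): p_cam = R @ p_world + t
    x, y, z = (sum(R[i][j] * p_world[j] for j in range(3)) + t[i] for i in range(3))
    if z <= 0:  # behind the camera
        return None
    # Pinhole projection to integer pixel coordinates.
    u = int(round(fx * x / z + cx))
    v = int(round(fy * y / z + cy))
    if not (0 <= u < width and 0 <= v < height):  # outside the image bounds
        return None
    # Occlusion test: a valid depth reading closer than the point hides it.
    if depth is not None and depth[v][u] > 0 and z > depth[v][u] + depth_tol:
        return None
    return u, v


def extract_patch(image: Grid, center: Tuple[int, int],
                  half: int) -> Optional[List[List[float]]]:
    """Square (2*half+1)-pixel patch around center; None if it leaves the image."""
    u, v = center
    if u - half < 0 or v - half < 0 or u + half >= len(image[0]) or v + half >= len(image):
        return None
    return [list(row[u - half:u + half + 1]) for row in image[v - half:v + half + 1]]


def collect_patches(p_world: Vec3, frames: Sequence[Dict], fx: float, fy: float,
                    cx: float, cy: float, half: int = 1) -> List[List[List[float]]]:
    """Gather one 2D patch from every frame in which the 3D point is visible."""
    patches = []
    for frame in frames:
        image = frame["image"]
        uv = project_point(p_world, frame["R"], frame["t"], fx, fy, cx, cy,
                           len(image[0]), len(image), frame.get("depth"))
        if uv is not None:
            patch = extract_patch(image, uv, half)
            if patch is not None:
                patches.append(patch)
    return patches
```

Each patch returned by `collect_patches` would then be paired with the 3D patch around the same point to form one 2D-3D correspondence; frames where the point is behind the camera, outside the image, or hidden by closer geometry contribute nothing.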

ChaosNLI

ChaosNLI is a Natural Language Inference (NLI) dataset with 100 annotations per example (464,500 annotations in total) for 4,645 existing examples from the development sets of SNLI, MNLI, and Abductive NLI. Instead of taking the majority label as the gold standard, it provides labels that reflect the distribution of human annotator judgments.

27 papers · 0 benchmarks · Texts
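The difference between a majority-vote gold label and the label distribution that ChaosNLI provides can be illustrated with a small sketch. It assumes the three-way label set of the SNLI and MNLI portions (Abductive NLI uses a different label space, so the `labels` argument is configurable) and summarizes disagreement with Shannon entropy, which is one common choice rather than necessarily the measure the dataset authors use. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence

# Label set of the SNLI/MNLI portions (assumed); Abductive NLI uses different labels.
NLI_LABELS = ("entailment", "neutral", "contradiction")


def label_distribution(annotations: Sequence[str],
                       labels: Sequence[str] = NLI_LABELS) -> Dict[str, float]:
    """Fraction of annotators who chose each label."""
    counts = Counter(annotations)
    n = len(annotations)
    return {label: counts[label] / n for label in labels}


def majority_label(annotations: Sequence[str]) -> str:
    """The single label a majority-vote gold standard would keep."""
    return Counter(annotations).most_common(1)[0][0]


def entropy_bits(dist: Dict[str, float]) -> float:
    """Shannon entropy in bits; higher means more annotator disagreement."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


votes = ["entailment"] * 60 + ["neutral"] * 30 + ["contradiction"] * 10
print(majority_label(votes))
print(label_distribution(votes))
```

With 100 votes split 60/30/10, a majority-vote gold standard keeps only "entailment" and discards the fact that 40% of annotators chose otherwise, which is exactly the information the distribution retains.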

Multi-Modal CelebA-HQ

Multi-Modal-CelebA-HQ is a large-scale face image dataset of 30,000 high-resolution face images selected from the CelebA dataset following CelebA-HQ. Each image comes with a high-quality segmentation mask, a sketch, a descriptive text, and an image with a transparent background.

27 papers · 7 benchmarks · Images, Texts

DeeperForensics-1.0

DeeperForensics-1.0 is presented as the largest face forgery detection dataset by far, with 60,000 videos comprising a total of 17.6 million frames, 10 times larger than existing datasets of the same kind. The full dataset includes 48,475 source videos and 11,000 manipulated videos. The source videos were recorded from 100 paid, consenting actors from 26 countries, and the manipulated videos were generated by DF-VAE, a newly proposed many-to-many end-to-end face swapping method. Seven types of real-world perturbations at five intensity levels are applied to ensure a larger scale and higher diversity. Image source: https://github.com/EndlessSora/DeeperForensics-1.0

27 papers · 0 benchmarks · Images, Videos
