Datasets

19,997 machine learning datasets

EvalMuse-40K

Recently, Text-to-Image (T2I) generation models have achieved significant advancements. Correspondingly, many automated metrics have emerged to evaluate the image-text alignment capabilities of generative models. However, the performance comparison among these automated metrics is limited by the small size of existing datasets. Additionally, these datasets lack the capacity to assess the performance of automated metrics at a fine-grained level. In this study, we contribute the EvalMuse-40K benchmark, which gathers 40K image-text pairs with fine-grained human annotations for image-text alignment-related tasks. In the construction process, we employ various strategies such as balanced prompt sampling and data re-annotation to ensure the diversity and reliability of our benchmark.

2 papers | 0 benchmarks

Multi-behavior Taobao

Data from Taobao, a popular Chinese online shopping platform, covering user behaviors such as buy, add-to-cart, add-to-favorite, and pageview. Buying is treated as the target behavior.

2 papers | 1 benchmark

SynthEVox3D-Tiny (Synthetic Event Camera Voxel 3D Reconstruction Dataset)

Event cameras are sensors that are inspired by biological systems and specialize in capturing changes in brightness. These emerging cameras offer numerous advantages over conventional frame-based cameras, including high dynamic range, high frame rates, and extremely low power consumption. As a result, event cameras are increasingly being used in various fields, such as object detection and tracking, autonomous driving, 3D reconstruction, visual odometry, and SLAM.

2 papers | 3 benchmarks | Images, Videos

CoNLL-2020 (CoNLLpp)

A test dataset of articles from 2020, annotated following the CoNLL-2003 NER task.

2 papers | 1 benchmark | Texts

RoFT-chatgpt

RoFT-chatgpt is a variation of the RoFT dataset in which the same human prompts are continued with the gpt-3.5-turbo model. Each dataset sample consists of ten sentences, with the first part written by a human and the remainder completed by an LLM. Consequently, every sample has a boundary indicating the index of the sentence where authorship changes.

2 papers | 2 benchmarks
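The sample layout described for RoFT-chatgpt (ten sentences plus a boundary index) could be represented roughly as below. This is only a sketch: the class name `BoundarySample`, its field names, and the choice of a 0-based boundary pointing at the first machine-written sentence are assumptions made for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoundarySample:
    """Hypothetical container for one ten-sentence sample (not the real schema)."""

    sentences: List[str]  # sentences in reading order
    boundary: int         # assumed: 0-based index of the first machine-written sentence

    def human_part(self) -> List[str]:
        # Sentences before the boundary are treated as human-written.
        return self.sentences[: self.boundary]

    def machine_part(self) -> List[str]:
        # Sentences from the boundary onward are treated as LLM-written.
        return self.sentences[self.boundary :]


# Made-up example: four human sentences followed by six generated ones.
sample = BoundarySample(
    sentences=[f"Sentence {i}." for i in range(10)],
    boundary=4,
)
```

Under these assumptions, a boundary detector would be scored on how close its predicted index is to `sample.boundary`.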

SingleWordProductionDutch-iBIDS

2 papers | 0 benchmarks

Insider Threat Test Dataset

The Insider Threat Test Dataset is a collection of synthetic insider threat test datasets that provide both background and malicious-actor data.

2 papers | 1 benchmark | Tabular

Image-based Confounding Dataset

Replication Data for: Integrating Earth Observation Data into Causal Inference: Challenges and Opportunities

2 papers | 0 benchmarks | Images, Tabular

MaNGA (Mapping Nearby Galaxies at APO)

MaNGA is a component of the Fourth-Generation Sloan Digital Sky Survey whose goal is to map the detailed composition and kinematic structure of nearby galaxies. MaNGA uses integral field unit (IFU) spectroscopy to measure spectra for hundreds of points within each galaxy. MaNGA’s goal is to understand the “life history” of present-day galaxies from imprinted clues of their birth and assembly, through their ongoing growth via star formation and merging, to their death from quenching at late times.

2 papers | 0 benchmarks | Hyperspectral images

Waldo and Wenda

A benchmark for Human-Human Interaction (HHI) recognition as free text.

2 papers | 0 benchmarks

U-DIADS-Bib

U-DIADS-Bib is a proprietary dataset developed through the collaboration of computer scientists and humanists at the University of Udine. It is composed of 200 images, 50 for each of the 4 manuscripts it covers. These handwritten books were selected in collaboration with humanist partners, considering both the complexity of their layout and the presence of significant and semantically distinguishable elements. The images of the four manuscripts were collected from the digital library Gallica. All manuscripts are Latin and Syriac Bibles published between the 6th and 12th centuries A.D.

2 papers | 2 benchmarks | Images

ADP Dataset

The ADP dataset consists of over 200,000 experimental crystal structures curated from the Cambridge Structural Database (CSD). It focuses on Anisotropic Displacement Parameters (ADPs), which describe atomic thermal vibrations within crystal lattices. ADPs provide insights into material properties such as thermal motion, heat capacity, vibrational entropy, and thermal expansion.

2 papers | 0 benchmarks

Amazon MTPP (Marked Temporal Point Processes on Amazon data)

The dataset includes time-stamped user product review behavior from January 2008 to October 2018. Each user has a sequence of product review events. Each event contains the timestamp and category of the reviewed product, and each category corresponds to an event type.

2 papers | 4 benchmarks | Tabular, Time series

StackOverflow MTPP (Marked Temporal Point Processes on StackOverflow data)

The dataset covers two years of user awards on a question-answering website. Each user received a sequence of badges, and there are 22 different kinds of badges in total.

2 papers | 4 benchmarks | Tabular, Time series

AgeGroup Transactions MTPP (Marked Temporal Point Processes on financial transactions data)

The dataset contains historical financial transactions, including time, category, and cost fields. There are 50,000 clients, 205 categories, and 43.7M events. The original goal was to predict the age group of the client. In this variant of the dataset, the goal is to forecast multiple future events.

2 papers | 4 benchmarks | Tabular, Time series

CPU

The CPU dataset was first introduced by Rahimi and Recht (2007) and later used by Balog et al. (2016).

2 papers | 0 benchmarks

American Sign Language Dataset

A set of American Sign Language datasets offered by an AI data collection company and curated for use in AI projects.

2 papers | 0 benchmarks

Database of axial impact simulations of the crash box (Database for crashworthiness optimisation)

This repository contains a database of FEM simulations of axial impacts on various configurations of square crash boxes. The database records the effect of structural and crash-test parameters on several crashworthiness objectives.

2 papers | 0 benchmarks | Tables, Tabular

Planetarium

145k natural language and PDDL problem pairs from the Blocks World, Gripper, and Floor Tile domains.

2 papers | 0 benchmarks | Texts

Paragraph Expanded

To take advantage of the ever-increasing amount of structural data now available, we also trained Paragraph on a larger dataset. This new dataset was extracted from the Structural Antibody Database (SAbDab, Schneider et al., 2022) on March 31, 2022 and includes 1086 complexes which we divide into train, validation and test sets using a 60-20-20 split. Full details of both datasets are given in the Supplementary Information.

2 papers | 4 benchmarks | Biology, Biomedical
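To make the 60-20-20 split of the Paragraph Expanded entry concrete, the sketch below partitions 1086 items into train, validation, and test subsets. The function name `split_60_20_20`, the random shuffle, the fixed seed, and the rounding rule (remainder goes to the test set) are all assumptions for illustration. The description does not say how the authors actually assigned complexes to each subset.

```python
import random
from typing import List, Sequence, Tuple


def split_60_20_20(items: Sequence, seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle items and cut them into 60% / 20% / 20% parts (hypothetical helper)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    n_train = n * 60 // 100  # integer arithmetic avoids float rounding surprises
    n_val = n * 20 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train : n_train + n_val]
    test = shuffled[n_train + n_val :]  # any remainder lands in the test set
    return train, val, test


# Integer IDs stand in for the 1086 complexes.
train, val, test = split_60_20_20(range(1086))
print(len(train), len(val), len(test))
```

With this rounding rule, the subset sizes come out as 651, 217, and 218. A different rule could shift one or two items between subsets.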
