Datasets

19,997 machine learning datasets

Egohumans

Purchase100

ATC-SMILES is a benchmark built for ATC classification. It consists of 4545 compounds/drugs and their SMILES sequences. It has the highest coverage (81.34%) of the KEGG dataset, which contains all 5588 known drugs/compounds used for ATC analysis. Before this benchmark, the most widely adopted one was Chen-2012, which covers 3883 (69.49%) of the drugs in KEGG and is mainly used for generating inter-drug correlations (e.g. STITCH). The two benchmarks are compared in Table 1. ATC-SMILES is designed to include Chen-2012, but there is a 2.16% misalignment due to mismatched drug IDs, which we will explain soon. ATC-SMILES can be extended with new drugs much more easily than previous benchmarks, as long as their SMILES sequences are available. Trials/experiments are not required.
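The coverage figures quoted above can be checked with simple arithmetic. The following is a minimal sketch; the function and constant names are illustrative and not part of the benchmark, and the counts are taken directly from the description.

```python
def coverage_percent(covered: int, total: int) -> float:
    """Return the share of `total` that `covered` represents, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * covered / total


# Counts as stated in the dataset description.
KEGG_TOTAL = 5588        # known drugs/compounds used for ATC analysis
ATC_SMILES_COUNT = 4545  # compounds/drugs in ATC-SMILES
CHEN_2012_COUNT = 3883   # drugs covered by Chen-2012

atc_smiles_cov = coverage_percent(ATC_SMILES_COUNT, KEGG_TOTAL)
chen_2012_cov = coverage_percent(CHEN_2012_COUNT, KEGG_TOTAL)

print(f"ATC-SMILES coverage of KEGG: {atc_smiles_cov:.2f}%")
print(f"Chen-2012 coverage of KEGG:  {chen_2012_cov:.2f}%")
```

Both ratios agree with the quoted percentages to two decimal places.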

2 papers | 5 benchmarks

Prompted Textures Dataset

The Prompted Textures Dataset (PTD) is a synthetic texture image dataset consisting of 246,285 images across 56 different texture classes from the work On Synthetic Texture Datasets: Challenges, Creation, and Curation. PTD was created with the goal of better quantifying how models learn and respond to texture information when learning object classification or recognition tasks.

2 papers | 0 benchmarks

Digital twin-supported deep learning for fault diagnosis

This is a dataset used to test digital twin-supported deep learning for fault diagnosis. It contains:
- A digital twin model of a robot.
- Synthetic data generated from the digital twin to train a deep learning-based fault diagnosis model.
- A real dataset collected from the real robot to test sim-to-real performance.
Download the dataset from: https://nextcloud.centralesupelec.fr/s/7AR6aamBZNXcRM8/download

2 papers | 1 benchmark | Time series

Beacon3D

Dataset of the Beacon3D benchmark: Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis.

2 papers | 0 benchmarks | 3D, Texts

Benchmark Sets and Experimental Results for "Parallel Unconstrained Local Search for Partitioning Irregular Graphs"

Contains the graph benchmark sets (regular set and irregular set) and experimental results.

2 papers | 0 benchmarks

Web-Bench

We developed Web-Bench as a benchmark for evaluating the performance of LLMs on real-world web projects.

2 papers | 0 benchmarks

MegaScale

Results of a high-throughput biological assay measuring the stability of proteins (https://github.com/Rocklin-Lab/cdna-display-proteolysis-pipeline). From the paper: "Here we present cDNA display proteolysis, a method for measuring thermodynamic folding stability for up to 900,000 protein domains in a one-week experiment. From 1.8 million measurements in total, we curated a set of around 776,000 high-quality folding stabilities covering all single amino acid variants and selected double mutants of 331 natural and 148 de novo designed protein domains 40–72 amino acids in length." This assay has some particular limitations, such as the short length of the protein domains (40–72 amino acids) and the particular way stability is measured, which is expected to correlate with typical thermostability assays. Its advantage is that it contains orders of magnitude more mutants than existing thermostability datasets (typically hundreds to thousands of mutants for tens of different proteins).

2 papers | 0 benchmarks | Biology

TIME (TimE: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenario)

2 papers | 0 benchmarks | Texts

Daily-Omni

2 papers | 0 benchmarks

WildPPG (WildPPG: A Real-World PPG Dataset of Long Continuous Recordings)

WildPPG is a dataset of multi-modal signals from wearable devices at four sites on the body. Each device continuously recorded synchronized signals from a 3-channel reflective photoplethysmogram (red, green, infrared PPG), a 3-axis inertial sensor (accelerometer), a temperature sensor, and a barometric altitude sensor. For reference, the sternum device continuously recorded a Lead-I electrocardiogram (ECG) from body-mounted gel electrodes to provide ground-truth heart rate (HR) estimates.

2 papers | 5 benchmarks | Biomedical, Time series

ChinaTravel

We introduce ChinaTravel, the first open-ended benchmark grounded in authentic Chinese travel requirements collected from 1,154 human participants. We design a compositionally generalizable domain-specific language (DSL) for scalable evaluation, covering feasibility, constraint satisfaction, and preference comparison.

2 papers | 0 benchmarks | Texts

BarNER

Named entities in Bavarian text

2 papers | 0 benchmarks | Texts

QUITE (Quantifying Uncertainty in natural language Text)

QUITE (Quantifying Uncertainty in natural language Text) is an entirely new benchmark for assessing the capabilities of neural language model-based systems with respect to Bayesian reasoning on a large set of input texts that describe probabilistic relationships in natural language.

2 papers | 0 benchmarks | Texts

ASVspoof 5

ASVspoof 5 is the fifth edition in a series of challenges which promote the study of speech spoofing and deepfake attacks as well as the design of detection solutions. We introduce the ASVspoof 5 database which is generated in a crowdsourced fashion from data collected in diverse acoustic conditions (cf. studio-quality data for earlier ASVspoof databases) and from ~2,000 speakers (cf. ~100 earlier). The database contains attacks generated with 32 different algorithms, also crowdsourced, and optimised to varying degrees using new surrogate detection models. Among them are attacks generated with a mix of legacy and contemporary text-to-speech synthesis and voice conversion models, in addition to adversarial attacks which are incorporated for the first time. ASVspoof 5 protocols comprise seven speaker-disjoint partitions. They include two distinct partitions for the training of different sets of attack models, two more for the development and evaluation of surrogate detection models, and

2 papers | 0 benchmarks

AutoLand (An Autonomous UAV Navigation and Landing System for Urban Search and Rescue Missions)

To facilitate training of neural networks and evaluation of alternate approaches for landing, we provide a synthetic dataset of collapsed buildings. The dataset consists of 1,281,125 RGB images with corresponding ground truth for depth, surface normals, semantics, and camera pose. To obtain diverse viewing angles, we varied the tilt of the camera from 0° to 55° in steps of 5°, the pan of the camera from 0° to 360° in steps of 45°, and the height of the UAV during data collection from 10 m to 30 m in steps of 5 m. Annotations are provided for the following classes: sky, houses, road, rocks, flora, terrain, trees, cars, and others.
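The viewpoint sweep described above can be sketched as a grid of camera settings. This is only an illustration: the names are hypothetical, both endpoints of each range are assumed to be included (so pan values 0° and 360° both appear), and the description does not say whether every combination was rendered or at how many locations.

```python
from itertools import product

# Ranges taken from the description; including both endpoints is an assumption.
TILTS_DEG = range(0, 55 + 1, 5)    # camera tilt, 0 to 55 degrees in steps of 5
PANS_DEG = range(0, 360 + 1, 45)   # camera pan, 0 to 360 degrees in steps of 45
HEIGHTS_M = range(10, 30 + 1, 5)   # UAV height, 10 to 30 metres in steps of 5


def camera_configurations():
    """Return every (tilt, pan, height) combination of the assumed grid."""
    return list(product(TILTS_DEG, PANS_DEG, HEIGHTS_M))


configs = camera_configurations()
print(f"{len(TILTS_DEG)} tilts x {len(PANS_DEG)} pans x "
      f"{len(HEIGHTS_M)} heights = {len(configs)} viewpoints per location")
```

If 0° and 360° pan are treated as the same direction, dropping one of them reduces the number of distinct pan values by one.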

2 papers | 0 benchmarks

RED-FM

A human-revised dataset for seven languages that allows for the evaluation of multilingual RE systems.

2 papers | 0 benchmarks

City-Networks

City-Networks is a transductive learning dataset for testing long-range dependencies in Graph Neural Networks (GNNs). The dataset contains four large-scale city maps (Paris, Shanghai, L.A., and London), where nodes represent intersections and edges represent road segments.

2 papers | 0 benchmarks

FFSC (Face Forgery in the Semantic Context)

2 papers | 0 benchmarks
