Datasets

271 machine learning datasets

Heteroatom Doped Graphene Supercapacitor

Feature data on heteroatom-doped graphene supercapacitors was compiled from the published literature for use in machine learning tasks. The main motivation is to optimize supercapacitors and to gain insight into models for electrochemistry tasks.

1 paper · 0 benchmarks · Tabular

EyeInfo

The EyeInfo Dataset is an open-source eye-tracking dataset created by Fabricio Batista Narcizo, a research scientist at the IT University of Copenhagen (ITU) and GN Audio A/S (Jabra), Denmark. It was introduced in the paper "High-Accuracy Gaze Estimation for Interpolation-Based Eye-Tracking Methods" (DOI: 10.3390/vision5030041). The dataset contains high-speed monocular eye-tracking data from an off-the-shelf remote eye tracker using active illumination. Each user's data includes a text file annotating eye features, the environment, viewed targets, and facial features. The dataset follows the principles of the General Data Protection Regulation (GDPR).

1 paper · 0 benchmarks · Tabular, Texts, Tracking, Videos

Concerns and Value Judgments of Stakeholders in the Non-Fungible Tokens (NFTs) Market (Replication Data for: "Centralized or Decentralized?")

1 paper · 0 benchmarks · Tabular, Texts, Time series

Uniswap (Replication Data for: Uniswap Daily Transaction Indices by Network)

1 paper · 0 benchmarks · Tabular, Time series

Kepler Exoplanet Search Results

The Kepler Space Observatory is a NASA-built satellite that was launched in 2009. The telescope is dedicated to searching for exoplanets in star systems other than our own, with the ultimate goal of possibly finding other habitable planets. The original mission ended in 2013 due to mechanical failures, but the telescope nevertheless continued operating from 2014 on a "K2" extended mission.

1 paper · 1 benchmark · Tabular

Austin Budget Survey Data FY2021 and FY2022

Data were collected from two budget surveys (FY2021, conducted in 2020, and FY2022, conducted in 2021) in collaboration with the City of Austin budget department. The data contain each respondent's preferences and the day they participated.

1 paper · 0 benchmarks · Tabular

Participatory Budgeting Preferences Data Set

The data set includes information about 120+ elections (configuration settings and descriptive statistics), their projects, and 125k+ anonymized voters with their budget preferences. Preferences were solicited with different elicitation methods (K-approval, knapsack, K-ranking, and K-token). For some elections, voters also provided preferences under a secondary elicitation method, resulting in vote pairs from the same voter on the same budgeting question but with a different elicitation method.

1 paper · 0 benchmarks · Tabular
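To make two of the elicitation methods concrete, here is a minimal sketch of ballot validation. It assumes the usual reading from the participatory budgeting literature: a knapsack vote is a set of projects whose total cost fits within the budget, and a K-approval vote approves at most K projects. The function names, project names, costs, and budget are hypothetical and are not taken from the data set.

```python
from typing import Mapping, Set


def is_valid_knapsack(ballot: Set[str], costs: Mapping[str, int], budget: int) -> bool:
    """Assumed rule: every chosen project exists and total cost <= budget."""
    if not ballot.issubset(costs.keys()):
        return False
    return sum(costs[p] for p in ballot) <= budget


def is_valid_k_approval(ballot: Set[str], projects: Set[str], k: int) -> bool:
    """Assumed rule: approve at most k distinct, existing projects."""
    return ballot.issubset(projects) and len(ballot) <= k


# Hypothetical projects and costs, for illustration only.
costs = {"park": 300, "library": 500, "bike_lane": 250}
budget = 800

print(is_valid_knapsack({"park", "library"}, costs, budget))               # 800 <= 800: True
print(is_valid_knapsack({"park", "library", "bike_lane"}, costs, budget))  # 1050 > 800: False
print(is_valid_k_approval({"park", "bike_lane"}, set(costs), k=2))         # True
```

The same voter can produce ballots of both kinds on one budgeting question, which is what the paired votes described above capture.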

BaitBuster-Bangla: A Comprehensive Dataset for Clickbait Detection in Bangla with Multi-Feature and Multi-Modal Analysis

The dataset contains 253,070 records with 18 features, grouped into four types: Metadata, Primary Data, Engagement Stats, and Label. The Metadata category contains basic information about the channel and video, such as their unique identifiers, date and time of publication, and thumbnail URLs. The Primary Data category contains the title and description of each video; the "Processed" columns hold this text after denoising, deduplication, and debiasing for further analysis. The Engagement Stats category contains user engagement metrics for each video. The Label category contains predefined auto labels, human-annotated labels, and AI-generated pseudo labels. Auto labels are derived automatically from a review of each channel's titles, descriptions, and thumbnails over time: channels with consistently misleading, exaggerated, or sensationalized content were labeled as clickbait. Those focusing on …

1 paper · 0 benchmarks · Tabular, Texts

Genre2Movies (Compositional queries for Movie recommendation)

Genre annotations for movies: the file genre2movies.csv contains genre-movie tuples based on Wikidata annotations (https://www.wikidata.org/).

1 paper · 0 benchmarks · Graphs, Ranking, Tabular
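One plausible way to use genre-movie tuples for compositional queries, assuming a query is a conjunction of genres such as "comedy and horror", is to build an inverted index from genre to movies and intersect the per-genre sets. The sketch below assumes each CSV row holds the genre first and the movie second with no header row; that layout, the helper names, and the toy rows are all hypothetical, so check the real genre2movies.csv before relying on it.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_index(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each genre to the set of movies annotated with it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for genre, movie in rows:
        index[genre].add(movie)
    return index


def query_all(index: Dict[str, Set[str]], genres: List[str]) -> Set[str]:
    """Movies that carry every requested genre (set intersection)."""
    sets = [index.get(g, set()) for g in genres]
    return set.intersection(*sets) if sets else set()


# Hypothetical rows standing in for genre2movies.csv (assumed genre,movie layout).
sample = io.StringIO("comedy,Movie A\nhorror,Movie A\ncomedy,Movie B\nhorror,Movie C\n")
index = build_index((g, m) for g, m in csv.reader(sample))

print(sorted(query_all(index, ["comedy", "horror"])))  # ['Movie A']
```

Using `index.get` rather than `index[g]` keeps unknown genres from being inserted into the `defaultdict` as empty entries.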

GRD-TRT-BUF-4I Technical Validation Data

This is the static test data from the study "Global Geolocated Realtime Data of Interfleet Urban Transit Bus Idling", collected by GRD-TRT-BUF-4I. test-data-a.csv was collected from December 31, 2023 00:01:30 UTC to January 1, 2024 00:01:30 UTC. test-data-b.csv was collected from January 4, 2024 01:30:30 UTC to January 5, 2024 01:30:30 UTC. test-data-c.csv was collected from January 10, 2024 16:05:30 UTC to January 11, 2024 16:05:30 UTC.

1 paper · 0 benchmarks · Tabular
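Each of the three test files covers exactly 24 hours. The sketch below encodes the stated UTC windows and checks their lengths; it also includes a helper for testing whether a record's timestamp falls inside a file's window. The timestamp format inside the CSV files is not given here, so `in_window` takes an already-parsed, timezone-aware `datetime`; the helper names are hypothetical.

```python
from datetime import datetime, timezone

# Collection windows exactly as stated in the dataset description (UTC).
WINDOWS = {
    "test-data-a.csv": ("2023-12-31 00:01:30", "2024-01-01 00:01:30"),
    "test-data-b.csv": ("2024-01-04 01:30:30", "2024-01-05 01:30:30"),
    "test-data-c.csv": ("2024-01-10 16:05:30", "2024-01-11 16:05:30"),
}


def parse_utc(stamp: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS' as a UTC-aware datetime."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


def window_hours(start: str, end: str) -> float:
    """Length of a collection window in hours."""
    return (parse_utc(end) - parse_utc(start)).total_seconds() / 3600


def in_window(ts: datetime, name: str) -> bool:
    """True if an aware timestamp lies within the named file's window (inclusive)."""
    start, end = WINDOWS[name]
    return parse_utc(start) <= ts <= parse_utc(end)


for name, (start, end) in WINDOWS.items():
    print(name, window_hours(start, end))  # 24.0 for each file
```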

Dataset: Privacy-Preserving Gaze Data Streaming in Immersive Interactive Virtual Reality: Robustness and User Experience.

Data collected from two distinct experiments in immersive, interactive VR in which participants performed dynamic tasks while their eye, head, and hand movements were recorded. In the second experiment, a range of privacy mechanisms was applied to eye gaze in real time.

1 paper · 0 benchmarks · Tabular, Tracking

fake (Real / Fake Job Posting Prediction)

[Real or Fake]: Fake Job Description Prediction. This dataset contains 18K job descriptions, of which about 800 are fake. The data consist of both textual information and meta-information about the jobs. The dataset can be used to build classification models that learn to identify fraudulent job descriptions.

1 paper · 1 benchmark · Tabular, Texts
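With the approximate counts given above (18K postings, about 800 fake), the fake class is rare, so plain accuracy is a misleading metric. The arithmetic below uses only those rounded figures, not exact counts from the files, to show that a trivial model labelling every posting as real already scores about 95.6% accuracy while catching no fake postings.

```python
# Rounded figures from the dataset description, not exact counts.
TOTAL = 18_000  # "18K job descriptions"
FAKE = 800      # "about 800 are fake"

fake_rate = FAKE / TOTAL
# A constant "always real" classifier is right on every real posting.
majority_accuracy = 1 - fake_rate
# ...but it flags no fake postings, so its recall on the fake class is zero.
fake_recall_of_constant_model = 0 / FAKE

print(f"fake rate: {fake_rate:.1%}")                            # fake rate: 4.4%
print(f"always-real accuracy: {majority_accuracy:.1%}")         # always-real accuracy: 95.6%
print(f"always-real fake recall: {fake_recall_of_constant_model:.0%}")  # always-real fake recall: 0%
```

For this reason, precision, recall, or F1 on the fake class are more informative than accuracy when comparing models on this data.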

Trust Dynamics and Market Behavior in Cryptocurrency (Trust Dynamics and Market Behavior in Cryptocurrency: A Comparative Study of Centralized and Decentralized Exchanges)

1 paper · 0 benchmarks · Tabular, Time series

YADL (Yet Another Data LAke)

Files that make up the YADL data lake, released for the paper "Retrieve, Merge, Predict: Augmenting Tables with Data Lakes (Experiment, Analysis & Benchmark Paper)".

1 paper · 0 benchmarks · Tabular

Vid2RealHRI online video and results dataset (Community embedded robotics: Vid2RealHRI online video and perceived social intelligence in human-robot encounters dataset)

This dataset was gathered during the Vid2RealHRI study of humans' perception of robots' intelligence in the context of an incidental human-robot encounter. The dataset contains participants' questionnaire responses to four video study conditions, namely Baseline, Verbal, Body language, and Body language + Verbal. The videos depict a scenario in which a pedestrian incidentally encounters a quadruped robot trying to enter a building. Depending on the study condition, the robot uses verbal commands or body language to ask the pedestrian for help. The conditions were varied by manipulating the robot's verbal and expressive movement functionalities.

1 paper · 0 benchmarks · Images, Tabular, Texts, Videos

Healthcare Provider Fraud Detection Analysis

Inpatient claims, outpatient claims, and beneficiary details for each provider.

1 paper · 4 benchmarks · Graphs, Tabular

World Wide Dishes

We present the World Wide Dishes dataset, which seeks to assess disparities in representations of food. It was built through a decentralised data collection effort that gathered perspectives directly from people with a wide variety of backgrounds from around the globe, with the aim of creating a dataset of their insights into their own experiences of foods relevant to their cultural, regional, national, or ethnic lives.

1 paper · 0 benchmarks · Images, Tabular, Texts

Coastal Inundation Maps with Floodwater Depth Values (Simulated Flood Inundation Maps of Abu Dhabi's Coast Under Different Shoreline Protection Scenarios)

This dataset provides simulated flood inundation maps of Abu Dhabi's coast under 174 different shoreline protection scenarios. The maps were produced with a high-fidelity physics-based hydrodynamic simulator under a 0.5-meter sea level rise projection. The details of the hydrodynamic model are reported in [1].

1 paper · 2 benchmarks · Images, Tabular

MedPromptX-VQA

A new in-context visual question answering dataset encompassing interleaved image and EHR data derived from MIMIC-IV and MIMIC-CXR-JPG databases.

1 paper · 0 benchmarks · Images, Medical, Tabular

LoRA-WiSE (LoRA Weight Size Evaluation)

The LoRA Weight Size Evaluation (LoRA-WiSE) is a comprehensive benchmark specifically designed to evaluate LoRA dataset size recovery methods for generative models. LoRA-WiSE spans various dataset sizes, backbones, ranks, and personalization sets, as presented in the paper "Dataset Size Recovery from LoRA Weights".

1 paper · 0 benchmarks · Tabular
