271 machine learning datasets
Heteroatom-doped graphene supercapacitor feature data, gathered from the literature for use in machine learning tasks. The main motivation is to optimize supercapacitors and to gain insight into models for electrochemistry tasks.
The EyeInfo Dataset is an open-source eye-tracking dataset created by Fabricio Batista Narcizo, a research scientist at the IT University of Copenhagen (ITU) and GN Audio A/S (Jabra), Denmark. This dataset was introduced in the paper "High-Accuracy Gaze Estimation for Interpolation-Based Eye-Tracking Methods" (DOI: 10.3390/vision5030041). The dataset contains high-speed monocular eye-tracking data from an off-the-shelf remote eye tracker using active illumination. The data from each user has a text file with data annotations of eye features, environment, viewed targets, and facial features. This dataset follows the principles of the General Data Protection Regulation (GDPR).
Context: The Kepler Space Observatory is a NASA-built satellite that was launched in 2009. The telescope is dedicated to searching for exoplanets in star systems besides our own, with the ultimate goal of possibly finding other habitable planets. The original mission ended in 2013 due to mechanical failures, but the telescope has nevertheless been functional since 2014 on a "K2" extended mission.
Data collected from two budget surveys (FY2021 in 2020 and FY2022 in 2021) in collaboration with the City of Austin budget department. Data contains preferences for each respondent and the day of their participation.
The data set includes information about 120+ elections (configuration settings and descriptive statistics), projects, and 125k+ anonymized voters and their budget preferences. Preferences were solicited with different elicitation methods (K-approval, knapsack, K-ranking, and K-token). For some elections, voters also provided preferences under a secondary elicitation method, resulting in vote pairs from the same voter on the same budgeting question but with a different elicitation method.
The dataset contains a total of 253,070 records, with 18 features. The features are categorized into four different types: Metadata, Primary Data, Engagement Stats, and Label. The Metadata category contains basic information about the channel and video, such as their unique identifiers, date and time of publication, and thumbnail URLs. The Primary Data category contains information about the title and description of the video. The "Processed" columns refer to the cleaned data after denoising, deduplication, and debiasing for further analysis. The Engagement Stats category contains data on user engagement metrics for each video. The Label category contains predefined auto labels, human-annotated labels, and AI-generated pseudo labels. Auto labels are labels that are automatically derived based on a review of their titles, descriptions, and thumbnails over time. Channels with consistently misleading, exaggerated, or sensationalized content were labeled as clickbait. Those focusing on
Genre annotations for movies. The file genre2movies.csv contains genre-movie tuples based on Wikidata annotations (https://www.wikidata.org/).
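As a rough illustration of working with genre-movie tuples, the sketch below groups movies under each genre. The column names (`genre`, `movie`), the presence of a header row, and the sample rows are all assumptions made for the example; the actual layout of genre2movies.csv is not described here and may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for genre2movies.csv. The header and
# column order are assumptions, not taken from the real file.
SAMPLE = "genre,movie\ndrama,Movie A\ncomedy,Movie B\ndrama,Movie C\n"


def group_movies_by_genre(handle):
    """Collect movie names under each genre from (genre, movie) rows."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        grouped[row["genre"]].append(row["movie"])
    return dict(grouped)


genre_to_movies = group_movies_by_genre(io.StringIO(SAMPLE))
print(genre_to_movies)
```

To run this against the real file, the `io.StringIO(SAMPLE)` argument would be replaced with an opened file handle, after checking the actual column names.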
This is the static test data from the study "Global Geolocated Realtime Data of Interfleet Urban Transit Bus Idling" collected by GRD-TRT-BUF-4I. test-data-a.csv was collected from December 31, 2023 00:01:30 UTC to January 1, 2024 00:01:30 UTC. test-data-b.csv was collected from January 4, 2024 01:30:30 UTC to January 5, 2024 01:30:30 UTC. test-data-c.csv was collected from January 10, 2024 16:05:30 UTC to January 11, 2024 16:05:30 UTC.
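The three collection windows listed above can be checked with a few lines of Python. This is only a sketch: the timestamps are copied from the description, while the dictionary layout and the helper name `window_hours` are illustrative choices. It suggests that each file covers a 24-hour span.

```python
from datetime import datetime, timezone

# Timestamp layout used in the description, e.g. "January 4, 2024 01:30:30".
FMT = "%B %d, %Y %H:%M:%S"

# Start and end of each collection window, copied from the description (UTC).
WINDOWS = {
    "test-data-a.csv": ("December 31, 2023 00:01:30", "January 1, 2024 00:01:30"),
    "test-data-b.csv": ("January 4, 2024 01:30:30", "January 5, 2024 01:30:30"),
    "test-data-c.csv": ("January 10, 2024 16:05:30", "January 11, 2024 16:05:30"),
}


def window_hours(start, end):
    """Return the length of a collection window in hours."""
    start_dt = datetime.strptime(start, FMT).replace(tzinfo=timezone.utc)
    end_dt = datetime.strptime(end, FMT).replace(tzinfo=timezone.utc)
    return (end_dt - start_dt).total_seconds() / 3600


durations = {name: window_hours(*span) for name, span in WINDOWS.items()}
for name, hours in durations.items():
    print(f"{name}: {hours:.0f} h")
```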
Data collected from two distinct experiments in immersive, interactive VR, in which participants performed dynamic tasks while their eye, head, and hand movements were recorded. In the second experiment, a range of privacy mechanisms was applied to eye gaze in real time.
[Real or Fake]: Fake Job Description Prediction. This dataset contains 18K job descriptions, of which about 800 are fake. The data consists of both textual information and meta-information about the jobs. The dataset can be used to create classification models that learn to identify fraudulent job descriptions.
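The class balance implied by these figures is worth spelling out before training a classifier. The sketch below uses the rounded counts from the description (18K total, about 800 fake), so the results are approximate. It shows that a model labeling every posting as real would already reach roughly 95.6% accuracy while detecting no fraud at all, which is why accuracy alone is a weak metric here.

```python
# Rounded counts taken from the description; exact values may differ.
TOTAL_POSTINGS = 18_000
FAKE_POSTINGS = 800

# Share of fraudulent postings in the dataset.
fake_share = FAKE_POSTINGS / TOTAL_POSTINGS

# Accuracy of a trivial baseline that predicts "real" for every posting.
majority_baseline_accuracy = 1 - fake_share

# That baseline never flags a fake posting, so its recall on fakes is zero.
majority_baseline_fake_recall = 0 / FAKE_POSTINGS

print(f"fake share: {fake_share:.1%}")
print(f"always-real accuracy: {majority_baseline_accuracy:.1%}")
print(f"always-real recall on fakes: {majority_baseline_fake_recall:.0%}")
```

Metrics such as precision, recall, or F1 on the fake class are usually more informative than accuracy at this level of imbalance.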
Files composing the YADL data lake, for the paper "Retrieve, Merge, Predict: Augmenting Tables with Data Lakes (Experiment, Analysis & Benchmark Paper)"
Introduction: This dataset was gathered during the Vid2RealHRI study of humans’ perception of robots’ intelligence in the context of an incidental Human-Robot encounter. The dataset contains participants’ questionnaire responses to four video study conditions, namely Baseline, Verbal, Body language, and Body language + Verbal. The videos depict a scenario where a pedestrian incidentally encounters a quadruped robot trying to enter a building. The robot uses verbal commands or body language to try to ask for help from the pedestrian in different study conditions. The differences in the conditions were manipulated using the robot’s verbal and expressive movement functionalities.
Inpatient claims, Outpatient claims and Beneficiary details of each provider.
We present the World Wide Dishes dataset, which seeks to assess disparities in representations of food. It is built through a decentralised data collection effort that gathers perspectives directly from people with a wide variety of backgrounds from around the globe. The aim is a dataset consisting of their insights into their own experiences of foods relevant to their cultural, regional, national, or ethnic lives.
This dataset provides simulated flood inundation maps of Abu Dhabi's coast under 174 different shoreline protection scenarios. The maps were produced with a high-fidelity physics-based hydrodynamic simulator under a 0.5-meter sea level rise projection. The details of the hydrodynamic model are reported in [1].
A new in-context visual question answering dataset encompassing interleaved image and EHR data derived from MIMIC-IV and MIMIC-CXR-JPG databases.
The LoRA Weight Size Evaluation (LoRA-WiSE) is a comprehensive benchmark specifically designed to evaluate LoRA dataset size recovery methods for generative models. LoRA-WiSE spans various dataset sizes, backbones, ranks, and personalization sets, as presented in the paper "Dataset Size Recovery from LoRA Weights".