Datasets

19,997 machine learning datasets


Bone Age (The RSNA Pediatric Bone Age Machine Learning Challenge)

At RSNA 2017, a contest challenged participants to correctly identify a child's age from an X-ray of their hand.

MPHOI-72 (Multi-person Human-object Interaction Dataset 72)

MPHOI-72 is a multi-person human-object interaction dataset that can be used for a wide variety of HOI/activity recognition and pose estimation/object tracking tasks. The dataset is challenging due to frequent body occlusions among the humans and objects. It consists of 72 videos captured from 3 different angles at 30 fps, with a total of 26,383 frames and an average length of 12 seconds. It involves 5 humans performing in pairs, 6 object types, 3 activities, and 13 sub-activities. The dataset includes color video, depth video, human skeletons, and human and object bounding boxes.

2 papers, 0 benchmarks

Adobe Panorama Dataset (https://sourceforge.net/adobe/adobedatasets/panoramas/home/Home/)


2 papers, 0 benchmarks

MIMIC PERform Testing Dataset

The MIMIC PERform Testing dataset contains physiological signals recorded from 200 critically ill patients during routine clinical care.

2 papers, 10 benchmarks (Biomedical, Medical, Time series)

HuTics (Human Deictic Gestures Dataset)

HuTics contains 2,040 images showing how humans use deictic gestures to interact with various everyday objects. The images are annotated with segmentation masks of the object(s) of interest. The data were originally collected for gesture-aware, object-agnostic segmentation tasks.

2 papers, 0 benchmarks (Images)
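Because HuTics provides per-image masks of the object(s) of interest, segmentation predictions on it can be scored by mask overlap. The following is a minimal sketch of intersection-over-union (IoU) for one pair of binary masks. Using IoU as the metric is an assumption (the description above does not name an evaluation protocol), and the function name `mask_iou` and the nested-list mask representation are hypothetical choices made to keep the example self-contained.

```python
def mask_iou(pred, target):
    """IoU of two equal-sized binary masks given as lists of rows of 0/1 values."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_on, t_on = bool(p), bool(t)
            intersection += p_on and t_on  # pixel is foreground in both masks
            union += p_on or t_on          # pixel is foreground in at least one mask
    # Convention (assumption): two empty masks count as a perfect match.
    return intersection / union if union else 1.0


pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
print(mask_iou(pred, target))  # 2 overlapping pixels / 4 pixels in the union
```

A mean IoU over a test split would simply average this value across images.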

An Extension of XNLI

https://github.com/salesforce/xnli_extension

2 papers, 0 benchmarks

EBB!

This dataset contains around 5K pairs of aligned images captured with a Canon 70D DSLR camera at low and high apertures, representing normal photos and photos with a bokeh (blur) effect. Every image is 1024 pixels high; the width varies from image to image.

2 papers, 0 benchmarks

Aurora-2

The Aurora-2 data are based on a version of the original TIDigits corpus (as available from LDC), downsampled to 8 kHz. Different noise signals have been artificially added to the clean speech data. The software tool for filtering and noise addition is available in the download area; it can create distorted data at sampling rates of 8 or 16 kHz. The recognition experiments for Aurora-2 use the HTK recognizer, available from Cambridge University. Scripts and configuration files are included on the Aurora-2 CDs distributed by ELRA/ELDA. A published paper describes the data creation and the recognition experiments in more detail.

2 papers, 0 benchmarks
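The idea of artificially adding noise to clean speech can be illustrated with a simplified sketch that scales a noise signal so the mixture reaches a chosen signal-to-noise ratio (SNR), using SNR = 10·log10(P_clean / P_noise). This is not the Aurora-2 tool: it computes power over the whole signal, applies no filtering, and the helper names `power` and `add_noise_at_snr` are hypothetical.

```python
import math
import random


def power(signal):
    """Mean power (average squared amplitude) of a list of samples."""
    return sum(s * s for s in signal) / len(signal)


def add_noise_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the resulting SNR equals snr_db.

    Simplification (assumption): power is measured over the whole signal,
    with no filtering or speech-activity detection.
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_clean, p_noise = power(clean), power(noise)
    if p_noise == 0:
        raise ValueError("noise signal has zero power")
    # Solve 10*log10(p_clean / (scale**2 * p_noise)) = snr_db for scale.
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]


# Usage: a 1-second 440 Hz tone at 8 kHz mixed with Gaussian noise at 10 dB SNR.
fs = 8000
clean = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
noisy = add_noise_at_snr(clean, noise, 10.0)
```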

Basque TimeBank

A set of Basque documents annotated with EusTimeML, a markup language for temporal information in Basque.

2 papers, 3 benchmarks (Texts)

Catalan TimeBank 1.0

Catalan TimeBank 1.0 was developed by researchers at Barcelona Media and consists of Catalan texts in the AnCora corpus annotated with temporal and event information according to the TimeML specification language.

2 papers, 3 benchmarks (Texts)
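TimeML marks events (EVENT), time expressions (TIMEX3), event instances (MAKEINSTANCE), and temporal relations (TLINK) inline in the text. As a rough illustration of how such annotations can be read, the sketch below extracts event-to-time links with Python's standard XML parser. The English snippet is hypothetical and was not taken from Catalan TimeBank; the actual file layout of the corpus is not verified here, and the sketch deliberately ignores event-event and time-time links.

```python
import xml.etree.ElementTree as ET

# Hypothetical TimeML-style snippet (English for readability, not from the corpus).
SAMPLE = """<TimeML>The company <EVENT eid="e1" class="OCCURRENCE">announced</EVENT> the deal on <TIMEX3 tid="t1" type="DATE" value="2008-05-12">Monday</TIMEX3>.
<MAKEINSTANCE eiid="ei1" eventID="e1" tense="PAST" aspect="NONE" polarity="POS" pos="VERB"/>
<TLINK lid="l1" relType="IS_INCLUDED" eventInstanceID="ei1" relatedToTime="t1"/>
</TimeML>"""


def extract_temporal_links(timeml):
    """Return (events, timexes, event-to-time links) from a TimeML string."""
    root = ET.fromstring(timeml)
    events = {e.get("eid"): e.text for e in root.iter("EVENT")}
    timexes = {t.get("tid"): (t.text, t.get("value")) for t in root.iter("TIMEX3")}
    # TLINKs refer to event *instances*, which MAKEINSTANCE maps back to events.
    instance_to_event = {m.get("eiid"): m.get("eventID") for m in root.iter("MAKEINSTANCE")}
    links = []
    for link in root.iter("TLINK"):
        instance_id = link.get("eventInstanceID")
        time_id = link.get("relatedToTime")
        if instance_id is None or time_id is None:
            continue  # this sketch only handles event-to-time links
        event_text = events[instance_to_event[instance_id]]
        time_text, time_value = timexes[time_id]
        links.append((event_text, link.get("relType"), time_text, time_value))
    return events, timexes, links


events, timexes, links = extract_temporal_links(SAMPLE)
print(links)
```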

MentSum (Mental Health Summarization Dataset)

Mental health remains a significant public health challenge worldwide. With the increasing popularity of online platforms, many people use them to share their mental health conditions, express their feelings, and seek help from the community and counselors. Because posts vary in length, a short but informative summary helps counselors process them quickly. To facilitate research on summarizing mental health posts, we introduce the Mental Health Summarization dataset (MentSum), which contains over 24k carefully selected English user posts from 43 mental health subreddits on Reddit, each paired with its short user-written summary (called a TLDR).

2 papers, 3 benchmarks (Texts)
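MentSum pairs each post with the author's own TLDR. One way such (post, summary) pairs could be derived from raw post text is to split at a "TL;DR" marker, as in the sketch below. This is an assumption for illustration only; the MentSum authors' actual selection and cleaning procedure is not described here, and the regex and the name `split_tldr` are hypothetical.

```python
import re

# Hypothetical marker pattern: "TL;DR", "TLDR", "tl;dr:", "TL; DR -" and similar.
TLDR_PATTERN = re.compile(r"\btl;?\s?dr\b[:\s-]*", re.IGNORECASE)


def split_tldr(post):
    """Split a post into (body, tldr) at its last TL;DR marker.

    Returns None when there is no marker or either part would be empty.
    """
    matches = list(TLDR_PATTERN.finditer(post))
    if not matches:
        return None
    last = matches[-1]
    body = post[:last.start()].strip()
    tldr = post[last.end():].strip()
    if not body or not tldr:
        return None
    return body, tldr


print(split_tldr("I have been feeling anxious lately.\nTL;DR: anxious about work"))
```

Splitting at the last marker keeps a mid-post mention of "tl;dr" inside the body rather than treating it as the summary.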

Line Coverage Dataset

The dataset contains the road networks of the 50 most populous cities in the world, obtained from OpenStreetMap. These road networks are used to benchmark routing algorithms on graphs.

2 papers, 0 benchmarks
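A road network like these is naturally stored as a weighted graph, with intersections as nodes and road segments as edges weighted by length. As an illustration of a basic routing primitive on such a graph, the sketch below runs Dijkstra's shortest-path algorithm on a tiny hypothetical edge list. It assumes two-way streets and made-up segment lengths; it is not the line-coverage algorithm benchmarked in the associated papers.

```python
import heapq
from collections import defaultdict


def build_graph(edges):
    """Adjacency list from (u, v, length) tuples; assumes two-way streets."""
    graph = defaultdict(list)
    for u, v, length in edges:
        graph[u].append((v, length))
        graph[v].append((u, length))
    return graph


def dijkstra(graph, source):
    """Shortest distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph[u]:
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Hypothetical toy network: segment lengths in meters.
edges = [("A", "B", 120.0), ("B", "C", 80.0), ("A", "C", 250.0), ("C", "D", 60.0)]
print(dijkstra(build_graph(edges), "A"))
```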

Scholars on Twitter

This dataset pairs OpenAlex author IDs (author_id; see https://docs.openalex.org/about-the-data/author) with Twitter user IDs (tweeter_id).

2 papers, 0 benchmarks

Rhythmic Gymnastic

The Rhythmic Gymnastics dataset contains videos of four different types of gymnastics routines: ball, clubs, hoop and ribbon. Each type of routine has 250 associated videos, and the length of each video is approximately 1 min 35 s. We chose high-standard international competition videos, including videos from the 36th and 37th International Artistic Gymnastics Competitions, to construct the dataset. We have edited out the irrelevant parts of the original videos (such as replay shots and athlete warmups). We have annotated each video with three scores (a difficulty score, an execution score and a total score), which were given by the referee in accordance with the official scoring system.

2 papers, 1 benchmark (Videos)
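Since each video carries referee scores, a model's predicted scores can be compared with the ground truth by how well they rank the routines. Spearman's rank correlation is a common choice in action quality assessment, though it is an assumption here that it is the metric used for this benchmark. The sketch below computes it as the Pearson correlation of average ranks (which handles ties); the function names and example scores are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(predicted, truth):
    """Spearman's rho: Pearson correlation between the two rank vectors."""
    rp, rt = average_ranks(predicted), average_ranks(truth)
    n = len(rp)
    mean_p, mean_t = sum(rp) / n, sum(rt) / n
    cov = sum((a - mean_p) * (b - mean_t) for a, b in zip(rp, rt))
    sd_p = sum((a - mean_p) ** 2 for a in rp) ** 0.5
    sd_t = sum((b - mean_t) ** 2 for b in rt) ** 0.5
    if sd_p == 0 or sd_t == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / (sd_p * sd_t)


# Hypothetical predicted vs. referee total scores for four routines.
print(spearman([14.2, 16.0, 12.5, 15.1], [14.0, 17.1, 12.9, 15.3]))
```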

NMC Li-ion Battery Cathode Energies and Charge Densities

This dataset contains charge densities for NMC (Ni, Mn, and Co) 2x2x1 supercells (12 transition-metal atoms and 12 Li/vacancy sites) with varying Li content. For each structure, we first randomly sample the numbers of Mn, Ni, and Co atoms, subject to the total number of transition-metal atoms being 12, and then randomly assign them to the transition-metal positions of the lattice. Similarly, the number of vacancies is sampled uniformly between 0 and 12, and the vacancies are assigned to Li sites. The generated configurations are then relaxed in two steps: first, the atom positions are relaxed with fixed cell parameters; then both positions and cell parameters are allowed to relax. Only the electron density (CHGCAR) file from the final cell-relaxation step is kept. The atoms are relaxed until the force on each atom is below 0.01 eV/Å.

2 papers, 0 benchmarks (3D)
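The combinatorial part of the generation procedure (choosing a composition, placing transition metals, and placing Li vacancies) can be sketched as follows. The description does not say how the Mn/Ni/Co counts are drawn, so this sketch assumes a uniform choice over all 91 compositions summing to 12, and it treats "between 0 and 12" as inclusive. Site indices are abstract list positions, not real lattice coordinates, and the DFT relaxation steps are not shown. The name `sample_configuration` is hypothetical.

```python
import random

N_TM_SITES = 12  # transition-metal positions in the 2x2x1 supercell
N_LI_SITES = 12  # Li/vacancy positions

# All (n_Mn, n_Ni, n_Co) with n_Mn + n_Ni + n_Co == 12.
COMPOSITIONS = [
    (mn, ni, N_TM_SITES - mn - ni)
    for mn in range(N_TM_SITES + 1)
    for ni in range(N_TM_SITES + 1 - mn)
]


def sample_configuration(rng):
    """Return (tm_occupancy, li_occupancy) lists for one random structure."""
    # Assumption: composition drawn uniformly from all valid count triples.
    n_mn, n_ni, n_co = rng.choice(COMPOSITIONS)
    tm = ["Mn"] * n_mn + ["Ni"] * n_ni + ["Co"] * n_co
    rng.shuffle(tm)  # random assignment to the transition-metal positions
    n_vac = rng.randint(0, N_LI_SITES)  # uniform, both endpoints inclusive
    li = ["Li"] * (N_LI_SITES - n_vac) + ["Vac"] * n_vac
    rng.shuffle(li)  # random assignment of vacancies to Li sites
    return tm, li


tm, li = sample_configuration(random.Random(0))
print(tm)
print(li)
```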

Unified SSL Benchmark (USB)

The Unified SSL Benchmark (USB) consists of 15 diverse, challenging, and comprehensive tasks from computer vision (CV), natural language processing (NLP), and audio processing (Audio) for evaluating semi-supervised learning (SSL) methods. A modular and extensible codebase has been open-sourced for fair evaluation of these SSL methods.

2 papers, 0 benchmarks
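For context on what the benchmarked methods do, one ingredient shared by many semi-supervised classifiers (for example FixMatch) is confidence-thresholded pseudo-labeling: unlabeled examples whose predicted class probability exceeds a threshold are treated as if labeled with that class. The sketch below shows only this selection step on raw logits. It is not USB's API, the function names are hypothetical, and the 0.95 threshold is just an illustrative default.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_labels(batch_logits, threshold=0.95):
    """Return (example_index, predicted_class) for confident unlabeled examples."""
    selected = []
    for i, logits in enumerate(batch_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((i, probs.index(confidence)))
    return selected


# The first example is confident (p ~ 0.987); the second is uniform and is skipped.
print(pseudo_labels([[5.0, 0.0, 0.0], [1.0, 1.0, 1.0]]))
```

In a full method, the selected pseudo-labels would then supervise predictions on augmented views of the same unlabeled inputs.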

Norwegian Endurance Athlete ECG Database

The Norwegian Endurance Athlete ECG Database contains 12-lead ECG recordings from 28 elite athletes from various sports in Norway. All recordings are 10-second resting ECGs recorded with a General Electric (GE) MAC VUE 360 electrocardiograph. All ECGs were interpreted both by the GE Marquette SL12 algorithm (version 23 (v243)) and by a cardiologist trained in interpreting athletes' ECGs. The data were collected at the University of Oslo in February and March 2020.

2 papers, 0 benchmarks (Biomedical, Medical, Time series)

SC_burst (Smartphone burst Dataset)

Contains 16 burst images captured with smartphones for burst/video denoising, restoration, and enhancement tasks. The raw data of SC_burst are stored in a unified format as .MAT files, which hold both the raw data and the metadata.

2 papers, 0 benchmarks

Psychometric NLP

Psychometric NLP is a corpus for psychometric natural language processing (NLP) covering dimensions such as trust, anxiety, numeracy, and literacy in the health domain. The dataset aligns each respondent's text with their survey responses and includes survey-based psychometric measures, the accompanying user-generated text, and self-reported demographic information (race, sex, age, income, and education) from 8,502 respondents.

2 papers, 0 benchmarks

Eth-Exchange (Exchange in Ethereum)

Sampled 2-hop subgraphs centered on Exchange accounts in the Ethereum interaction graph.

2 papers, 0 benchmarks
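A 2-hop subgraph centered on an account contains every account reachable within two interactions, plus the edges among them. The sketch below extracts such a neighborhood with breadth-first search. It treats the interaction graph as undirected (real Ethereum transactions are directed) and returns the full 2-hop neighborhood, whereas the dataset's subgraphs are sampled by a procedure not described here. The adjacency dictionary and account names are hypothetical.

```python
from collections import deque


def k_hop_subgraph(adj, center, k=2):
    """Nodes within k hops of center, and the undirected edges among them."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = set(dist)
    # Normalize each undirected edge as a sorted tuple to avoid duplicates.
    edges = {tuple(sorted((u, v))) for u in nodes for v in adj.get(u, ()) if v in nodes}
    return nodes, edges


# Hypothetical interaction graph around an exchange account "ex".
adj = {
    "ex": ["a", "b"],
    "a": ["ex", "c"],
    "b": ["ex"],
    "c": ["a", "d"],
    "d": ["c"],
}
nodes, edges = k_hop_subgraph(adj, "ex")
print(sorted(nodes), sorted(edges))
```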
