Datasets

19,997 machine learning datasets

CVL-DataBase

The CVL Database is a public database for writer retrieval, writer identification, and word spotting. It consists of 7 different handwritten texts (1 German and 6 English). In total, 310 writers participated: 27 wrote 7 texts and 283 wrote 5 texts. For each text, an RGB color image (300 dpi) comprising the handwritten text and the printed text sample is available, as well as a cropped version (handwritten text only). A unique ID identifies the writer, and the bounding boxes for each word are stored in an XML file.
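As a quick sanity check on the figures in the CVL description, the writer counts can be turned into an expected page count. This is only a sketch: it assumes every writer produced exactly one page per assigned text, which the description does not state, and the variable names are illustrative.

```python
# Writer groups from the description: (number of writers, texts each wrote).
writer_groups = [(27, 7), (283, 5)]

# The two groups should add up to the stated 310 participants.
total_writers = sum(n for n, _ in writer_groups)

# Assumption: one page per writer per assigned text.
expected_pages = sum(n * texts for n, texts in writer_groups)

print(total_writers)   # 310
print(expected_pages)  # 1604
```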

2 papers | 0 benchmarks

SSD_PHONE (Sub-Slot Dialogue dataset phone domain)

SSD (Sub-slot Dialog) dataset: This is the dataset for the ACL 2022 paper "A Slot Is Not Built in One Utterance: Spoken Language Dialogs with Sub-Slots".

2 papers | 0 benchmarks | Texts

TUSC (Tweets from US and Canada)

Tweets from US and Canada (TUSC) is a large dataset of more than 45 million geo-located tweets posted from the US and Canada between 2015 and 2021, specially curated for natural language analysis.

2 papers | 0 benchmarks | Texts

BDD100K-Subsets

Subsets of the BDD100K dataset used in "Object Detection Under Rainy Conditions for Autonomous Vehicles: A Review of State-of-the-Art and Emerging Techniques".

2 papers | 1 benchmark

PhC-C2DH-U373

Glioblastoma-astrocytoma U373 cells on a polyacrylamide substrate

2 papers | 4 benchmarks

CB-ToF (Cornell-Box Time-of-Flight Dataset)

Cornell-Box Dataset Download: The CornellBox Dataset can be downloaded from this URL

2 papers | 0 benchmarks

MTHS

The MTHS dataset contains 30 Hz PPG signals obtained from 62 patients (35 men and 27 women). The ground truth includes heart rate (HR) and oxygen saturation (SpO2) levels sampled at 1 Hz. The HR and SpO2 measurements were obtained using a pulse oximeter (M70). An iPhone 5s was used to record the PPG signals at 30 fps.
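Because the MTHS signal is sampled at 30 Hz and the labels at 1 Hz, each label corresponds to 30 signal samples. The sketch below shows one way to pair them. It assumes that label i covers the i-th second of the recording (the description does not specify the alignment), the function name `window_ppg` is hypothetical, and the values are synthetic placeholders rather than data from the dataset.

```python
PPG_RATE_HZ = 30    # PPG sampling rate stated in the description
LABEL_RATE_HZ = 1   # HR / SpO2 sampling rate stated in the description


def window_ppg(ppg, labels, ppg_rate=PPG_RATE_HZ, label_rate=LABEL_RATE_HZ):
    """Split a PPG signal into windows, one per ground-truth label.

    Assumes label i describes the i-th window counted from the start
    of the recording. Trailing samples or labels without a partner
    are dropped.
    """
    samples_per_label = ppg_rate // label_rate
    n_windows = min(len(ppg) // samples_per_label, len(labels))
    return [
        (ppg[i * samples_per_label:(i + 1) * samples_per_label], labels[i])
        for i in range(n_windows)
    ]


# Synthetic placeholders: 3 seconds of "signal" and 3 (HR, SpO2) labels.
ppg = list(range(90))
labels = [(70, 98), (71, 98), (72, 97)]

pairs = window_ppg(ppg, labels)
print(len(pairs))        # 3
print(len(pairs[0][0]))  # 30
```

Keeping the rates as parameters makes it easy to reuse the helper if a recording turns out to use a different frame rate.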

2 papers | 6 benchmarks | Biomedical, Time series

SMC Text Corpus

Contents (As on March 4, 2019)

The text corpus contains running text from various free licensed sources.

- The whole content of Malayalam Wikipedia extracted on January 1, 2019
- News/Article from various sources, source mentioned in respective files:

- 251 Mb
- 8,60,159 lines
- 98,15,533 words
- 10,11,11,885 characters

2 papers | 0 benchmarks | Texts

SILVR (A Synthetic Immersive Large-Volume Plenoptic Dataset)

We present SILVR, a dataset of light field images for six-degrees-of-freedom navigation in large fully-immersive volumes. SILVR is short for "Synthetic Immersive Large-Volume Ray".

2 papers | 0 benchmarks | Images

MICCAI 2015 Head and Neck Challenge

This database is provided and maintained by Dr. Gregory C Sharp (Harvard Medical School – MGH, Boston) and his group.

2 papers | 1 benchmark | Images

Basketball Ballistic raw sequences

Ballistic trajectories

2 papers | 0 benchmarks

Replication Data for: "Empirical Analysis of EIP-1559: Transaction Fees, Waiting Time, and Consensus Security"

Transaction fee mechanism (TFM) is an essential component of a blockchain protocol. However, a systematic evaluation of the real-world impact of TFMs is still absent. Using rich data from the Ethereum blockchain, mempool, and exchanges, we study the effect of EIP-1559, one of the first deployed TFMs that depart from the traditional first-price auction paradigm. We conduct a rigorous and comprehensive empirical study to examine its causal effect on blockchain transaction fee dynamics, transaction waiting time and security. Our results show that EIP-1559 improves the user experience by making fee estimation easier, mitigating intra-block difference of gas price paid, and reducing users' waiting times. However, EIP-1559 has only a small effect on gas fee levels and consensus security. In addition, we found that when Ether's price is more volatile, the waiting time is significantly higher. We also verify that a larger block size increases the presence of siblings. These findings suggest ne

2 papers | 0 benchmarks | Tabular

Cyclone Data (global cyclone data from 1841 to 2021)

Archive of Global Tropical Cyclone Tracks: tracks from 1980 to May 2019.

2 papers | 0 benchmarks | Environment

Ocean Drifters (Madagascar Ocean Drifters)

From Schaub, Michael T., et al. "Random Walks on Simplicial Complexes and the Normalized Hodge 1-Laplacian." SIAM Review 62.2 (2020): 353-391.

2 papers | 0 benchmarks | Graphs

SDF Shader Dataset (A Dataset and Explorer for 3D Signed Distance Functions)

This dataset contains 63 signed distance function shaders collected mostly from Shadertoy.

2 papers | 0 benchmarks

Abstractive Text Summarization from Il Post

IlPost dataset, containing news articles taken from IlPost.

2 papers | 8 benchmarks

Abstractive Text Summarization from Fanpage

Fanpage dataset, containing news articles taken from Fanpage.

2 papers | 10 benchmarks

RETOUCH (RETOUCH -The Retinal OCT Fluid Detection and Segmentation Benchmark and Challenge)

The goal of the challenge is to compare automated algorithms that are able to detect and segment various types of fluids on a common dataset of optical coherence tomography (OCT) volumes representing different retinal diseases, acquired with devices from different manufacturers. We made available a dataset of OCT volumes containing a wide variety of retinal fluid lesions with accompanying reference annotations. We invite the medical imaging community to participate by developing and testing existing and novel automated retinal OCT segmentation methods.

2 papers | 0 benchmarks | 3D, Medical

TorWIC (The Toronto Warehouse Incremental Change Dataset)

TorWIC is the dataset discussed in "POCD: Probabilistic Object-Level Change Detection and Volumetric Mapping in Semi-Static Scenes". Its purpose is to evaluate map maintenance capabilities in a warehouse environment undergoing incremental changes. The dataset was collected in a Clearpath Robotics facility.

2 papers | 0 benchmarks | LiDAR, RGB-D

Multivariate-Mobility-Paris

The original dataset was provided by Orange telecom in France and contains anonymized and aggregated human mobility data. The Multivariate-Mobility-Paris dataset covers 2020-08-24 to 2020-11-04 (72 days during the COVID-19 pandemic), with a time granularity of 30 minutes and a spatial granularity of 6 coarse regions in Paris, France. In other words, it is a multivariate time series dataset.
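The stated date range and granularities imply the overall shape of the Multivariate-Mobility-Paris time series. The sketch below derives it under one assumption: the end date 2020-11-04 is treated as exclusive, which is the reading that matches the stated 72 days (counting both endpoints would give 73). The names `time_index` and `N_REGIONS` are illustrative and not part of the dataset.

```python
from datetime import datetime, timedelta

START = datetime(2020, 8, 24)
END = datetime(2020, 11, 4)    # assumption: exclusive upper bound
STEP = timedelta(minutes=30)   # time granularity from the description
N_REGIONS = 6                  # spatial granularity from the description


def time_index(start, end, step):
    """Return all timestamps in [start, end) spaced by `step`."""
    n_steps = (end - start) // step
    return [start + i * step for i in range(n_steps)]


index = time_index(START, END, STEP)
shape = (len(index), N_REGIONS)

print((END - START).days)  # 72
print(shape)               # (3456, 6)
```

Each row would then hold one 30-minute observation and each column one of the 6 regions.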

2 papers | 0 benchmarks | Tables, Tabular
