19,997 machine learning datasets
The CVL Database is a public database for writer retrieval, writer identification, and word spotting. It consists of 7 different handwritten texts (1 German and 6 English). In total, 310 writers participated: 27 wrote all 7 texts and 283 wrote 5 texts. For each text, an RGB color image (300 dpi) comprising the handwritten text and the printed text sample is available, as well as a cropped version (handwritten text only). A unique ID identifies the writer, and the bounding boxes for each word are stored in an XML file.
SSD (Sub-slot Dialog) dataset: This is the dataset for the ACL 2022 paper "A Slot Is Not Built in One Utterance: Spoken Language Dialogs with Sub-Slots".
Tweets from US and Canada (TUSC) is a large dataset of more than 45 million geo-located tweets posted between 2015 and 2021 from the US and Canada, curated especially for natural language analysis.
Subsets of the BDD100K dataset used in "Object Detection Under Rainy Conditions for Autonomous Vehicles: A Review of State-of-the-Art and Emerging Techniques".
Glioblastoma-astrocytoma U373 cells on a polyacrylamide substrate
Cornell-Box Dataset. Download: the CornellBox Dataset can be downloaded from this URL
The MTHS dataset contains 30 Hz PPG signals obtained from 62 patients (35 men and 27 women). The ground-truth data includes heart rate (HR) and oxygen saturation (SpO2) levels sampled at 1 Hz. The HR and SpO2 measurements were obtained using a pulse oximeter (M70). An iPhone 5s was used to obtain the PPG recordings at 30 fps.
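Because the PPG signal is sampled at 30 Hz and the ground truth at 1 Hz, each label corresponds to 30 signal samples. The following is a minimal sketch of that pairing, not the dataset's own loader: the function name `pair_windows_with_labels`, the in-memory layout (a flat list of samples and a list of `(HR, SpO2)` tuples), and the assumption that label `i` covers second `i` of the recording are all hypothetical. The data shown is synthetic.

```python
from typing import List, Sequence, Tuple

PPG_RATE_HZ = 30    # PPG sampling rate stated in the dataset description
LABEL_RATE_HZ = 1   # ground-truth (HR, SpO2) rate stated in the description


def pair_windows_with_labels(
    ppg: Sequence[float],
    labels: Sequence[Tuple[float, float]],
) -> List[Tuple[List[float], Tuple[float, float]]]:
    """Split a PPG trace into one-second windows and pair each with a label.

    Assumes label i describes second i of the recording (an assumption,
    not something the dataset description states). Trailing samples or
    labels without a complete counterpart are dropped.
    """
    samples_per_label = PPG_RATE_HZ // LABEL_RATE_HZ  # 30 samples per label
    n_windows = min(len(ppg) // samples_per_label, len(labels))
    pairs = []
    for i in range(n_windows):
        start = i * samples_per_label
        window = list(ppg[start:start + samples_per_label])
        pairs.append((window, labels[i]))
    return pairs


# Synthetic stand-in data: 95 PPG samples (just over 3 s) and 3 labels.
fake_ppg = [float(i) for i in range(95)]
fake_labels = [(72.0, 98.0), (73.0, 97.0), (71.0, 98.0)]
pairs = pair_windows_with_labels(fake_ppg, fake_labels)
```

Here the last 5 samples are discarded because they do not fill a complete one-second window; a real pipeline might instead pad or use overlapping windows.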
Contents (As on March 4, 2019)
The text corpus contains running text from various free licensed sources.
- The whole content of Malayalam Wikipedia extracted on January 1, 2019
- News/Article from various sources, source mentioned in respective files:
- 251 Mb
- 8,60,159 lines
- 98,15,533 words
- 10,11,11,885 characters
We present SILVR, a dataset of light field images for six-degrees-of-freedom navigation in large fully-immersive volumes. The SILVR dataset is short for "Synthetic Immersive Large-Volume Ray" dataset.
This database is provided and maintained by Dr. Gregory C Sharp (Harvard Medical School – MGH, Boston) and his group.
Ballistic trajectories
Transaction fee mechanism (TFM) is an essential component of a blockchain protocol. However, a systematic evaluation of the real-world impact of TFMs is still absent. Using rich data from the Ethereum blockchain, mempool, and exchanges, we study the effect of EIP-1559, one of the first deployed TFMs that depart from the traditional first-price auction paradigm. We conduct a rigorous and comprehensive empirical study to examine its causal effect on blockchain transaction fee dynamics, transaction waiting time and security. Our results show that EIP-1559 improves the user experience by making fee estimation easier, mitigating intra-block difference of gas price paid, and reducing users' waiting times. However, EIP-1559 has only a small effect on gas fee levels and consensus security. In addition, we found that when Ether's price is more volatile, the waiting time is significantly higher. We also verify that a larger block size increases the presence of siblings. These findings suggest ne
Archive of Global Tropical Cyclone Tracks. Tracks from 1980 to May 2019.
From Schaub, Michael T., et al. "Random walks on simplicial complexes and the normalized Hodge 1-Laplacian." SIAM Review 62.2 (2020): 353-391.
This dataset contains 63 signed distance function shaders collected mostly from Shadertoy.
IlPost dataset, containing news articles taken from IlPost.
Fanpage dataset, containing news articles taken from Fanpage.
The goal of the challenge is to compare automated algorithms that are able to detect and segment various types of fluids on a common dataset of optical coherence tomography (OCT) volumes representing different retinal diseases, acquired with devices from different manufacturers. We made available a dataset of OCT volumes containing a wide variety of retinal fluid lesions with accompanying reference annotations. We invite the medical imaging community to participate by developing and testing existing and novel automated retinal OCT segmentation methods.
TorWIC is the dataset discussed in "POCD: Probabilistic Object-Level Change Detection and Volumetric Mapping in Semi-Static Scenes". Its purpose is to evaluate map maintenance capabilities in a warehouse environment undergoing incremental changes. The dataset was collected in a Clearpath Robotics facility.
The original dataset, provided by Orange telecom in France, contains anonymized and aggregated human mobility data. The Multivariate-Mobility-Paris dataset comprises information from 2020-08-24 to 2020-11-04 (72 days during the COVID-19 pandemic), with a time granularity of 30 minutes and a spatial granularity of 6 coarse regions in Paris, France. In other words, it is a multivariate time series dataset.
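The stated date range and granularities imply a size for the time series, which can be checked with a short calculation. This is only a sketch under assumptions: the end date is treated as exclusive (which is what yields the stated 72 days), every 30-minute slot is assumed to be present, and the helper name `expected_shape` is hypothetical. The actual files may be laid out differently.

```python
from datetime import datetime, timedelta

START = datetime(2020, 8, 24)   # first day stated in the description
END = datetime(2020, 11, 4)     # assumed exclusive, matching "72 days"
STEP = timedelta(minutes=30)    # stated time granularity
N_REGIONS = 6                   # stated spatial granularity


def expected_shape(start: datetime, end: datetime,
                   step: timedelta, n_regions: int) -> tuple:
    """Return (time steps, regions) for a gap-free series on [start, end)."""
    n_steps = (end - start) // step  # timedelta // timedelta gives an int
    return n_steps, n_regions


n_days = (END - START).days
shape = expected_shape(START, END, STEP, N_REGIONS)
print(n_days, shape)
```

Under these assumptions the series has 48 steps per day over 72 days, one column per region.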