19,997 machine learning datasets
Multi-modal situated reasoning in 3D scenes
This dataset was collected during a LoRaWAN measurement campaign in a multi-room indoor office environment at the University of Siegen, Germany. It contains over 1.7 million time-stamped records from 6 LoRaWAN nodes, each transmitting once per minute to a single gateway. Each record includes environmental parameters, namely temperature, relative humidity, barometric pressure, particulate matter (PM2.5), and carbon dioxide (CO₂), as well as device metadata such as RSSI, SNR, and spreading factor (SF). For LoRaWAN propagation analysis, the dataset also includes the effective signal power (ESP) and the noise power (NP). It is designed to support research on indoor wireless propagation, distance estimation, and environment-aware modeling, among other IoT use cases and applications in line with 6G flagship demands.
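ESP and NP are related to RSSI and SNR. The sketch below shows one commonly used way to derive them. It assumes, without confirmation from the dataset description, that RSSI is the total received power (signal plus noise) in dBm and SNR is the signal-to-noise ratio in dB, giving ESP = RSSI + SNR − 10·log10(1 + 10^(SNR/10)). The function name `esp_and_noise` is hypothetical, and the dataset's own definitions may differ.

```python
import math


def esp_and_noise(rssi_dbm: float, snr_db: float) -> tuple[float, float]:
    """Split total received power (RSSI) into signal (ESP) and noise (NP).

    Assumption: RSSI is total power (signal + noise) in dBm and SNR is in dB.
    This is an illustrative derivation, not the dataset's documented method.
    """
    # Total = N * (1 + S/N), so N[dB] = RSSI - 10*log10(1 + SNR_linear).
    penalty_db = 10.0 * math.log10(1.0 + 10.0 ** (snr_db / 10.0))
    noise_dbm = rssi_dbm - penalty_db  # NP
    esp_dbm = noise_dbm + snr_db       # ESP = NP + SNR
    return esp_dbm, noise_dbm


if __name__ == "__main__":
    esp, noise = esp_and_noise(-100.0, 0.0)
    # At SNR = 0 dB, signal and noise each carry half of the total power.
    print(f"ESP = {esp:.2f} dBm, NP = {noise:.2f} dBm")  # ESP = -103.01 dBm, NP = -103.01 dBm
```

Working in the linear power domain and converting back to dB keeps ESP − NP equal to the reported SNR, which offers a quick sanity check on the recorded columns.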
Dataset Card for SVBench: this card provides an overview of the SVBench dataset, including its purpose, structure, and sources. For details, see our project page, paper, and GitHub repository.
This dataset contains 971 identities from two disjoint camera views. Each identity has two samples per camera view. It is used for person re-identification.
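The structure above (971 identities × 2 views × 2 samples) can be indexed as follows. This is a minimal sketch. The `(identity, view, sample)` tuple layout is hypothetical, and the cross-view probe/gallery split is a common re-identification protocol assumed here rather than one prescribed by the dataset.

```python
from itertools import product

NUM_IDS = 971
VIEWS = (0, 1)          # two disjoint camera views
SAMPLES_PER_VIEW = 2

# Hypothetical index: one (identity, view, sample) tuple per image.
samples = list(product(range(NUM_IDS), VIEWS, range(SAMPLES_PER_VIEW)))

# Assumed cross-view protocol: query with view-0 images and
# search among view-1 images of the same identities.
probe = [s for s in samples if s[1] == 0]
gallery = [s for s in samples if s[1] == 1]

print(len(samples), len(probe), len(gallery))  # 3884 1942 1942
```

Because the views are disjoint, every probe identity has exactly two true matches in the gallery, both captured by the other camera.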
3D hand pose dataset created using a stereo camera.
Multi-Ego is a new multi-view egocentric dataset, recorded simultaneously by three cameras and covering a wide variety of real-life scenarios. The footage is annotated by multiple individuals under various summarization configurations, with a consensus analysis ensuring a reliable ground truth.
Paper Field is built from the Microsoft Academic Graph and maps paper titles to one of 7 fields of study. Each field of study (geography, politics, economics, business, sociology, medicine, and psychology) has approximately 12K training examples.
More details on this dataset are available at https://alex-xun-xu.github.io/ProjectPage/CVPR_18/index.html
The Helsinki Prosody Corpus is a dataset for predicting prosodic prominence from written text. It provides automatically generated, high-quality prosodic annotations for the 'clean' subsets of the LibriTTS corpus (Zen et al., 2019), comprising 262.5 hours of read speech from 1,230 speakers. The transcribed sentences were aligned and then prosodically annotated with word-level acoustic prominence labels.
The Hotel Arabic-Reviews Dataset (HARD) contains 93,700 hotel reviews in Arabic. The reviews were collected from the Booking.com website during June/July 2016 and are written in Modern Standard Arabic as well as dialectal Arabic.
An artificial corpus built from grammatical dependency rules to address the lack of resources for sign language.
This collection compiles anonymized radiographs arbitrarily selected from routine practice at the Department of Diagnostic Radiology, Aachen University of Technology (RWTH), Aachen, Germany. The images cover different ages, genders, view positions, and pathologies, so image quality varies significantly. All images were downscaled to fit into a 512 × 512 bounding box while maintaining the original aspect ratio. All images were classified according to the IRMA code, from which 193 categories were defined. Category labels are provided for 12,677 images; the remaining 1,733 images without codes serve as test data for the ImageCLEFmed 2009 competition.
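Downscaling "to fit into a 512 × 512 bounding box while maintaining the original aspect ratio" means applying a single scale factor to both sides, chosen so the longer side fits. The minimal sketch below computes the resulting size. The function name `fit_within`, round-to-nearest, and never upscaling smaller images are assumptions, not details taken from the collection's documentation.

```python
def fit_within(width: int, height: int, box: int = 512) -> tuple[int, int]:
    """Size of an image downscaled to fit a box x box square.

    One scale factor is applied to both sides, so the aspect ratio is kept.
    Images that already fit are left unchanged (assumed: no upscaling).
    """
    scale = min(1.0, box / width, box / height)
    # Round to the nearest pixel (assumption) and clamp to [1, box].
    new_w = max(1, min(box, round(width * scale)))
    new_h = max(1, min(box, round(height * scale)))
    return new_w, new_h


print(fit_within(1024, 768))   # (512, 384)
print(fit_within(2000, 3000))  # (341, 512)
print(fit_within(300, 200))    # (300, 200)
```

In practice, Pillow's `Image.thumbnail((512, 512))` performs a comparable aspect-preserving, non-enlarging resize in place, though its rounding may differ by a pixel.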
Logical rules are a popular knowledge representation language in many domains. Recently, neural networks have been proposed to support the complex rule induction process. However, we argue that existing datasets and evaluation approaches are lacking in various dimensions; for example, different kinds of rules or dependencies between rules are neglected. Moreover, for the development of neural approaches, we need large amounts of data to learn from and adequate, approximate evaluation measures. In this paper, we provide a tool for generating diverse datasets and for evaluating neural rule learning systems, including novel performance metrics.
From DroneDeploy:
A fact-based text editing dataset built on the WebNLG dataset.
A fact-based text editing dataset built on the RotoWire dataset.
The Pan+ChiPhoto dataset is a Chinese character dataset. It combines two datasets: ChiPhoto and the Pan_Chinese_Character dataset. The images were mainly captured outdoors in Beijing and Shanghai, China, and cover various scenes such as signs, boards, advertisements, banners, and objects with text printed on their surfaces.
The Florentine dataset is a dataset of facial gestures containing facial clips from 160 subjects (both male and female), in which gestures were either posed on request or genuinely elicited by a shown stimulus. It comprises 1032 clips of posed expressions and 1745 clips of induced expressions, for a total of 2777 video clips. Genuine facial expressions were induced using visual stimuli, namely videos selected randomly from a bank of YouTube videos to elicit a specific emotion.