19,997 machine learning datasets
This dataset contains images of individual handwritten Bengali characters. Bengali characters (graphemes) are written by combining three components: a grapheme_root, vowel_diacritic, and consonant_diacritic. Your challenge is to classify the components of the grapheme in each image. There are roughly 10,000 possible graphemes, of which roughly 1,000 are represented in the training set. The test set includes some graphemes that do not appear in the training set but contains no new grapheme components. It takes a lot of volunteers filling out handwriting sheets to generate a useful amount of real data; focusing the problem on the grapheme components rather than on recognizing whole graphemes should make it possible to assemble a Bengali OCR system without handwriting samples for all 10,000 graphemes.
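The reasoning above (unseen graphemes, but no unseen components) can be illustrated with a minimal sketch. The component labels below (`r1`, `v1`, `c0`, and so on) and the helper names are hypothetical toy values, not taken from the dataset:

```python
from typing import NamedTuple


class Grapheme(NamedTuple):
    """A grapheme described by its three components (toy labels)."""
    grapheme_root: str
    vowel_diacritic: str
    consonant_diacritic: str


# Hypothetical toy splits; the real dataset has far more classes.
train = [
    Grapheme("r1", "v1", "c0"),
    Grapheme("r2", "v2", "c0"),
    Grapheme("r1", "v2", "c1"),
]
test = [Grapheme("r2", "v1", "c1")]


def component_vocab(graphemes):
    """Collect the set of labels seen for each component."""
    return {
        field: {getattr(g, field) for g in graphemes}
        for field in Grapheme._fields
    }


def is_covered(grapheme, vocab):
    """True if every component of the grapheme was seen in training."""
    return all(getattr(grapheme, f) in vocab[f] for f in Grapheme._fields)


vocab = component_vocab(train)
seen = set(train)
unseen = [g for g in test if g not in seen]
covered = all(is_covered(g, vocab) for g in unseen)

print(f"unseen graphemes in test: {len(unseen)}")
print(f"all of their components seen in training: {covered}")
```

A classifier with one output per component can, in principle, label the unseen test grapheme here, whereas a whole-grapheme classifier would have no class for it.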
The SmartSpeaker benchmark tests the performance of reacting to music player commands in English as well as in French. Its difficulty lies in the many artists or music tracks with uncommon names in the commands, like “play music by [a boogie wit da hoodie]” or “I’d like to listen to [Kinokoteikoku]”.
BRIND, short for BSDS-RIND, is the first public benchmark dedicated to simultaneously studying four edge types, namely Reflectance Edge (RE), Illumination Edge (IE), Normal Edge (NE) and Depth Edge (DE).
The Diagnostic Evaluation of Video Inpainting on Landscapes (DEVIL) benchmark is composed of a curated video/occlusion mask dataset and a comprehensive evaluation scheme.
Large-scale and open-access LiDAR dataset intended for the evaluation of real-time semantic segmentation algorithms. In contrast to other large-scale datasets, HelixNet includes fine-grained data about the sensor's rotation and position, as well as the points' release time.
Urban is one of the most widely used hyperspectral datasets in hyperspectral unmixing studies. The image has 307 × 307 pixels, each of which corresponds to a 2 × 2 m² area. It has 210 wavelengths ranging from 400 nm to 2500 nm, resulting in a spectral resolution of 10 nm. After channels 1-4, 76, 87, 101-111, 136-153 and 198-210 are removed (due to dense water vapor and atmospheric effects), 162 channels remain (this is a common preprocessing step for hyperspectral unmixing analyses). There are three versions of the ground truth, which contain 4, 5 and 6 endmembers respectively.
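The channel arithmetic above can be checked with a short sketch. It assumes the channel numbers in the description are 1-based and inclusive; the function and variable names are illustrative only:

```python
# Channel numbers are assumed to be 1-based and inclusive, as written above.
TOTAL_BANDS = 210
REMOVED_RANGES = [(1, 4), (76, 76), (87, 87), (101, 111), (136, 153), (198, 210)]


def kept_bands(total, removed_ranges):
    """Return the 1-based channel numbers that survive the removal."""
    removed = set()
    for start, end in removed_ranges:
        removed.update(range(start, end + 1))
    return [band for band in range(1, total + 1) if band not in removed]


kept = kept_bands(TOTAL_BANDS, REMOVED_RANGES)
print(len(kept))  # 162

# Nominal sampling interval implied by the stated range and channel count.
step_nm = (2500 - 400) / TOTAL_BANDS
print(step_nm)  # 10.0
```

If the image were loaded as an array of shape (307, 307, 210), the same list converted to 0-based indices (`[b - 1 for b in kept]`) could be used to select the retained channels along the last axis.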
According to the WHO World Report on Vision 2019, an estimated 2.2 billion people worldwide are visually impaired, of whom at least 1 billion have a vision impairment that could have been prevented or is yet to be addressed. The world faces considerable challenges in terms of eye care, including inequalities in the coverage and quality of prevention, treatment, and rehabilitation services. Early detection and diagnosis of ocular pathologies would help forestall visual impairment. One challenge that limits ophthalmologists' adoption of computer-aided diagnosis tools is that sight-threatening rare pathologies, such as central retinal artery occlusion or anterior ischemic optic neuropathy, are usually ignored. In the past two decades, many publicly available datasets of color fundus images have been collected with a primary focus on diabetic retinopathy, glaucoma, age-related macular degeneration, and a few other frequent pathologies. The challe
OSAI introduces OpenTTGames, an open dataset aimed at the evaluation of different computer vision tasks in table tennis: ball detection, semantic segmentation of humans, table and scoreboard, and fast in-game event spotting.
The AQI dataset was collected from 12 observing stations around Beijing from 2013 to 2017. The data is accessible at the University of California, Irvine (UCI) Machine Learning Repository.
Spanish TimeBank 1.0 was developed by researchers at Barcelona Media and consists of Spanish texts in the AnCora corpus annotated with temporal and event information according to the TimeML specification language.
Modeling what makes an advertisement persuasive, i.e., eliciting the desired response from the consumer, is critical to the study of propaganda, social psychology, and marketing. Despite its importance, computational modeling of persuasion in computer vision is still in its infancy, primarily due to the lack of benchmark datasets that can provide persuasion-strategy labels associated with ads. Motivated by persuasion literature in social psychology and marketing, we introduce an extensive vocabulary of persuasion strategies and build the first ad image corpus annotated with persuasion strategies. The dataset also provides image segmentation masks, which label persuasion strategies in the corresponding ad images on the test split.
UzWordnet is a lexical-semantic database, or a “word-net”, for the (Northern) Uzbek language (native: O’zbek tili) compatible with Princeton Wordnet. By releasing it as open source (see License), we aim to motivate, support, and increase the application of database and knowledge graph principles and techniques to the study of computational aspects of the (Northern) Uzbek language and, more generally, the usability of Uzbek within IT applications and the Internet.
This article describes the first emotional corpus applicable to the Italian language, named EMOVO. It is a database built from the voices of up to 6 actors who performed 14 sentences simulating 6 emotional states (disgust, fear, anger, joy, surprise, sadness) plus the neutral state. These emotions are the well-known Big Six found in most of the literature related to emotional speech. The recordings were made with professional equipment in the Fondazione Ugo Bordoni laboratories. The paper also describes a subjective validation test of the corpus, based on emotion discrimination of two sentences carried out by two different groups of 24 listeners. The test was successful because it yielded an overall recognition accuracy of 80%. The emotions hardest to recognize are joy and disgust, whereas the easiest to detect are anger, sadness and the neutral state.
MDIA is a large-scale multilingual benchmark for dialogue generation. It covers real-life conversations in 46 languages across 19 language families.
Two single-cell datasets for 3D shape reconstruction from 2D microscopy images, used in our three previous publications, together with the respective model predictions.
We present a new large-scale photorealistic panoramic dataset named FutureHouse, which has the following characteristics. 1) It contains over 70,000 high-quality models with high-resolution meshes and physical materials. All models are measured to real-world standards. 2) Selected scene layouts are carefully designed by over 100 excellent artists. All selected layouts are used in real-world displays. 3) It contains 28,579 good panoramic views from 1,752 house-scale scenes. Therefore, it can be used for perspective image tasks as well as omnidirectional image tasks. 4) More physical material representation. Most materials are represented by microfacet BRDF modeling metalness, and the rest are represented by special shading models, e.g., cloth material and transmission material. 5) High rendering quality. Benefiting from a commercial rendering engine, Unreal Engine 4, and powerful deep learning super sampling (DLSS), our renderings have less noise. Our SVBRDF rep
It is a freely available resource for research on handling negation and uncertainty in biomedical texts. The corpus consists of three parts, namely medical free texts, biological full papers and biological scientific abstracts. The dataset contains annotations at the token level for negative and speculative keywords and at the sentence level for their linguistic scope. The annotation process was carried out by two independent linguist annotators and a chief annotator (also responsible for setting up the annotation guidelines) who resolved cases where the annotators disagreed.
An autonomous driving dataset and benchmark for optical flow. This dataset was created by the Heidelberg Collaboratory for Image Processing in close cooperation with Robert Bosch GmbH.
Synthehicle is a massive CARLA-based synthetic multi-vehicle multi-camera tracking dataset and includes ground truth for 2D detection and tracking, 3D detection and tracking, depth estimation, and semantic, instance and panoptic segmentation.
"Our dataset consists of 70 melanoma and 100 naevus images from the digital image archive of the Department of Dermatology of the University Medical Center Groningen (UMCG) used for the development and testing of the MED-NODE system for skin cancer detection from macroscopic images. The file - complete_mednode_dataset.zip 24KB - contains 170 images (70 melanoma and 100 nevi cases)."