19,997 machine learning datasets
19,997 dataset results
This dataset arises from the READ project (Horizon 2020).
The scarcity of high-quality and multi-task singing datasets significantly hinders the development of diverse controllable and personalized singing tasks, as existing singing datasets suffer from low quality, limited diversity of languages and singers, absence of multi-technique information and realistic music scores, and poor task suitability. To tackle these problems, we present GTSinger, a large Global, multi-Technique, free-to-use, high-quality singing corpus with realistic music scores, designed for all singing tasks, along with its benchmarks. Particularly, (1) we collect 80.59 hours of high-quality songs, forming the largest recorded singing dataset; (2) 20 professional singers across nine languages offer diverse timbres and styles; (3) we provide controlled comparison and phoneme-level annotations of six singing techniques, helping technique modeling and control; (4) GTSinger offers realistic music scores, assisting real-world musical composition; (5) singing voices are accompa
Floods are among the most common and devastating natural hazards, imposing immense costs on our society and economy due to their disastrous consequences. Recent progress in weather prediction and spaceborne flood mapping demonstrated the feasibility of anticipating extreme events and reliably detecting their catas- trophic effects afterwards. However, these efforts are rarely linked to one another and there is a critical lack of datasets and benchmarks to enable the direct forecast- ing of flood extent. To resolve this issue, we curate a novel dataset enabling a timely prediction of flood extent. Furthermore, we provide a representative evaluation of state-of-the-art methods, structured into two benchmark tracks for forecasting flood inundation maps i) in general and ii) focused on coastal regions. Altogether, our dataset and benchmark provide a comprehensive platform for evaluating flood forecasts, enabling future solutions for this critical challenge. Data, code & models are shared a
ImageNet-100 is a subset of ImageNet-1k Dataset from ImageNet Large Scale Visual Recognition Challenge 2012. It contains random 100 classes as specified in Labels.json file. Link: https://www.kaggle.com/datasets/ambityga/imagenet100
The ShipSG dataset is the first public dataset of its kind for ship segmentation and georeferencing. The dataset has been made available for the development and evaluation of instance segmentation and georeferencing methods using computer vision and deep learning, thus advancing the research field of ship recognition for maritime situational awareness. ShipSG consists of 3505 images of a port location using two different cameras with static and oblique view. In total, 11625 ship masks grouped in seven ship classes were manually annotated. Moreover, each image contains one geographic annotation of one of the ship masks (latitude and longitude) present in the image.
Median house prices for California districts derived from the 1990 census.
Detecting diffusion-generated images has recently grown into an emerging research area. Existing diffusion-based datasets predominantly focus on general image generation. However, facial forgeries, which pose a more severe social risk, have remained less explored thus far. To address this gap, this paper introduces DiFF, a comprehensive dataset dedicated to face-focused diffusion-generated images. DiFF comprises over 500,000 images that are synthesized using thirteen distinct generation methods under four conditions. In particular, this dataset leverages 30,000 carefully collected textual and visual prompts, ensuring the synthesis of images with both high fidelity and semantic consistency. We conduct extensive experiments on the DiFF dataset via a human test and several representative forgery detection methods. The results demonstrate that the binary detection accuracy of both human observers and automated detectors often falls below 30%, shedding light on the challenges in detecting d
The recent rapid development of Large Vision-Language Models (LVLMs) has indicated their potential for embodied tasks. However, the critical skill of spatial understanding in embodied environments has not been thoroughly evaluated, leaving the gap between current LVLMs and qualified embodied intelligence unknown. Therefore, we construct EmbSpatial-Bench, a benchmark for evaluating embodied spatial understanding of LVLMs. The benchmark is automatically derived from embodied scenes and covers 6 spatial relationships from an egocentric perspective. Below are a few examples.
The unique Spatial Dynamic Wind Power Forecasting dataset: SDWPF, which includes the spatial distribution of wind turbines, as well as the dynamic context factors. Whereas, most of the existing datasets have only a small number of wind turbines without knowing the locations and context information of wind turbines at a fine-grained time scale. By contrast, SDWPF provides the wind power data of 134 wind turbines from a wind farm over half a year with their relative positions and internal statuses.
MuChoMusic is a benchmark designed to evaluate music understanding in multimodal language models focused on audio. It includes 1,187 multiple-choice questions validated by human annotators, based on 644 music tracks from two publicly available music datasets. These questions cover a wide variety of genres and assess knowledge and reasoning across several musical concepts and their cultural and functional contexts. The benchmark provides a holistic evaluation of five open-source models, revealing challenges such as over-reliance on the language modality and highlighting the need for better multimodal integration.
Ultra-high definition benchmark (UHDBench) includes 2293 images at 2k resolution sourced from the ground-truth test sets of HRSOD, LIU4k, UAVid, UHDM, and UHRSD. Images smaller than 2560 × 1440 are excluded.
This dataset is a new benchmark, grounded in real-world usages is developed to support more authentic and comprehensive evaluation of image editing models.
This dataset gathers 428,748 person and 12,236 animal infobox with descriptions based on Wikipedia dump (2018/04/01) and Wikidata (2018/04/12).
A large, realistic multimodal dataset consisting of real personal photos and crowd-sourced questions/answers.
Species-800 is a corpus for species entities, which is based on manually annotated abstracts. It comprises 800 PubMed abstracts that contain identified organism mentions. To increase the corpus taxonomic mention diversity the 800 abstracts were collected by selecting 100 abstracts from the following 8 categories: bacteriology, botany, entomology, medicine, mycology, protistology, virology and zoology. 800 has been annotated with a focus at the species level; however, higher taxa mentions (such as genera, families and orders) have also been considered.
LINNAEUS is a general-purpose dictionary matching software, capable of processing multiple types of document formats in the biomedical domain (MEDLINE, PMC, BMC, OTMI, text, etc.). It can produce multiple types of output (XML, HTML, tab-separated-value file, or save to a database). It also contains methods for acting as a server (including load balancing across several servers), allowing clients to request matching over a network. A package with files for recognizing and identifying species names is available for LINNAEUS, showing 94% recall and 97% precision compared to LINNAEUS-species-corpus.
Click to add a brief description of the dataset (Markdown and LaTeX enabled).
6.5 million anonymous ratings of jokes by users of the Jester Joke Recommender System.
Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. 569 samples, 30 features.
Thyroid is a dataset for detection of thyroid diseases, in which patients diagnosed with hypothyroid or subnormal are anomalies against normal patients. It contains 2800 training data instance and 972 test instances, with 29 or so attributes.