19,997 machine learning datasets
SYNS-Patches is a subset of SYNS. The original SYNS is composed of aligned image and LiDAR panoramas from 92 different scenes spanning a wide variety of environments, such as Agriculture, Natural (e.g. forests and fields), Residential, Industrial and Indoor. SYNS-Patches contains the patches from each scene extracted at eye level at 20-degree intervals of a full horizontal rotation. This yields 18 images per scene and a total dataset size of 1,656 images.
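The counts above follow directly from the capture setup; a quick sanity check (the numbers are taken from the description, not from the dataset itself):

```python
# Sanity check of the SYNS-Patches counts stated above.
scenes = 92
interval_deg = 20
images_per_scene = 360 // interval_deg  # one full horizontal rotation
total_images = scenes * images_per_scene
print(images_per_scene, total_images)  # 18 1656
```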
The Russian Corpus of Linguistic Acceptability (RuCoLA) is built from the ground up under the well-established binary LA approach. RuCoLA consists of 9.8k in-domain sentences from linguistic publications and 3.6k out-of-domain sentences produced by generative models.
ComFact is a benchmark for commonsense fact linking, where models are given contexts and trained to identify situationally relevant commonsense knowledge from KGs. The benchmark contains ∼293k in-context relevance annotations for commonsense triplets across four stylistically diverse dialogue and storytelling datasets.
The evaluation of object detection models is usually performed by optimizing a single metric, e.g. mAP, on a fixed set of datasets, e.g. Microsoft COCO and Pascal VOC. Due to image retrieval and annotation costs, these datasets consist largely of images found on the web and do not represent many real-life domains that are being modelled in practice, e.g. satellite, microscopic and gaming imagery, making it difficult to assess the degree of generalization learned by the model.
FFHQ-UV is a large-scale facial UV-texture dataset that contains over 50,000 high-quality texture UV-maps with even illumination, neutral expressions, and cleaned facial regions, which are desired characteristics for rendering realistic 3D face models under different lighting conditions. The dataset is derived from FFHQ and preserves most of the variation in FFHQ.
HOD is a dataset for 3D object reconstruction which contains 35 objects, divided into two subsets named Sculptures and Daily Objects. The Sculptures subset has five human sculptures with complex geometries and pure white textures. The Daily Objects subset consists of 30 daily objects with various shapes and appearances. All of the Sculptures and nine of the Daily Objects are paired with high-fidelity scanned meshes as ground-truth geometries for evaluation.
KiloGram is a resource for studying abstract visual reasoning in humans and machines. It contains a richly annotated dataset with >1k distinct stimuli.
The SWINSEG dataset contains 115 nighttime images of sky/cloud patches along with their corresponding binary ground truth maps. The ground truth annotation was done in consultation with experts from Singapore Meteorological Services. All images were captured in Singapore using WAHRSIS, a calibrated ground-based whole sky imager, over a period of 12 months from January to December 2016. All image patches are 500x500 pixels in size, and were selected considering several factors such as time of the image capture, cloud coverage, and seasonal variations.
BRACE is a dataset for audio-conditioned dance motion synthesis that challenges common assumptions for this task.
ADVETA (ADVErsarial Table perturbAtion) is a robustness evaluation benchmark featuring natural and realistic adversarial table perturbations (ATPs). It is based on three mainstream Text-to-SQL datasets: Spider, WikiSQL and WTQ.
The Distress Analysis Interview Corpus/Wizard-of-Oz (DAIC-WOZ) dataset [24, 25] comprises voice and text samples from interviews of 189 participants, together with their PHQ-8 depression questionnaire responses. This dataset is commonly used in research on text-based detection, voice-based detection, and multi-modal architectures.
ISLES is a medical image segmentation challenge held at the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2017. On the SMIR platform, you can register for the challenge, download the test data, and submit your results. For more information, visit the official ISLES homepage at www.isles-challenge.org.
A dataset for fine-grained location name extraction from disaster-related tweets.
HaDes (HAllucination DEtection dataSet) is a token-level, reference-free hallucination detection dataset. To create it, a large number of text segments extracted from English-language Wikipedia were perturbed, and the perturbations were then verified with crowd-sourced annotations.
AstroVision is a large-scale dataset comprising 115,970 densely annotated, real images of 16 different small bodies captured by both legacy and ongoing deep space missions, built to facilitate the study of deep learning for autonomous navigation in the vicinity of a small body.
In the BB-norm modality of this task, participant systems had to normalize textual entity mentions according to the OntoBiotope ontology for habitats. See BB-dataset for more information.
In the BB-norm modality of this task, participant systems had to normalize textual entity mentions according to the OntoBiotope ontology for phenotypes. See BB-dataset for more information.
A dataset for forecasting sales with ARIMA and SARIMA models.
Please refer to the following paper, which includes a description of the dataset along with links to the data and the code: