Bongard-OpenWorld is a new benchmark for evaluating real-world few-shot reasoning for machine vision. We hope it can help us better understand the limitations of current visual intelligence and facilitate future research on visual agents with stronger few-shot visual reasoning capabilities.
Context: A radio signal consists of two channels, channel I (for 'In phase') and channel Q (for 'Quadrature'), and can be treated as a stream of complex numbers. It may convey information by encoding it as a sequence of symbols sampled from a finite set of complex numbers called a "modulation". Several standard modulations exist, including (non-exhaustive list): BPSK, QAM, QPSK of order N, PSK of order N…
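As a minimal sketch of the encoding described above, the snippet below maps bit pairs to complex QPSK symbols whose real and imaginary parts correspond to the I and Q channels. The Gray-coded constellation used here is one common convention, not a mapping taken from any specific dataset.

```python
import numpy as np

def qpsk_modulate(bits):
    """Map pairs of bits to unit-energy QPSK symbols (I + jQ)."""
    assert len(bits) % 2 == 0, "QPSK consumes bits two at a time"
    # Gray mapping: 00 -> (+1+1j), 01 -> (+1-1j), 10 -> (-1+1j), 11 -> (-1-1j)
    b = np.asarray(bits).reshape(-1, 2)
    i = 1 - 2 * b[:, 0]   # first bit selects the sign of the in-phase component
    q = 1 - 2 * b[:, 1]   # second bit selects the sign of the quadrature component
    return (i + 1j * q) / np.sqrt(2)   # normalize to unit symbol energy

symbols = qpsk_modulate([0, 0, 0, 1, 1, 0, 1, 1])
# Each symbol is one complex sample: channel I is its real part, channel Q its imaginary part.
```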
CheXphoto is a competition for x-ray interpretation based on a new dataset of naturally and synthetically perturbed chest x-rays hosted by Stanford and VinBrain.
Automatic anomaly detection is critical in today's world, where the sheer volume of data makes it impossible to tag outliers manually. The goal of this dataset is to benchmark your anomaly detection algorithm. The dataset consists of real and synthetic time series with tagged anomaly points, and tests detection accuracy across various anomaly types, including outliers and change-points. The synthetic portion consists of time series with varying trend, noise, and seasonality; the real portion consists of time series representing the metrics of various Yahoo services.
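In the spirit of the synthetic half of the benchmark, the sketch below generates a time series from trend, seasonality, and noise components and injects tagged point outliers. All parameter values are illustrative assumptions, not values taken from the actual dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
t = np.arange(n)

trend = 0.05 * t                                  # linear upward trend
seasonality = 3.0 * np.sin(2 * np.pi * t / 50)    # seasonal component, period 50
noise = rng.normal(0.0, 0.5, n)                   # Gaussian observation noise
series = trend + seasonality + noise

# Inject point anomalies at known positions and record ground-truth tags.
anomaly_idx = rng.choice(n, size=5, replace=False)
series[anomaly_idx] += rng.choice([-1.0, 1.0], size=5) * 10.0
labels = np.zeros(n, dtype=int)
labels[anomaly_idx] = 1                           # 1 marks a tagged anomaly point
```

A detector can then be scored by comparing its flagged indices against `labels`.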
A sub-challenge of the Endoscopic Vision Challenge.
A multilingual dataset for the task of multilingual claim span identification.
In our benchmark WHYSHIFT, we explore distribution shifts on 5 real-world tabular datasets from the economic and traffic sectors with natural spatiotemporal distribution shifts. We pick 7 typical settings out of 22 and select one representative target domain for each. In our benchmark, we specify the distribution shift pattern for each setting, and we provide tools to identify risky regions with large $Y|X$ shifts and to diagnose the performance degradation.
We introduce FUNSD-r and CORD-r in Token Path Prediction, the revised VrD-NER datasets to reflect the real-world scenarios of NER on scanned VrDs.
We present the development of a Named Entity Recognition (NER) dataset for Tagalog. This corpus helps fill the resource gap present in Philippine languages today, where NER resources are scarce. The texts were obtained from a pretraining corpus containing news reports, and were labeled by native speakers in an iterative fashion. The resulting dataset contains ~7.8k documents across three entity types: Person, Organization, and Location. The inter-annotator agreement, as measured by Cohen's κ, is 0.81. We also conducted an extensive empirical evaluation of state-of-the-art methods across supervised and transfer learning settings. Finally, we release the data and processing code publicly to inspire future work on Tagalog NLP.
VATEX Adverbs is a subset of VATEX with extracted verb-adverb annotations. VATEX Adverbs contains 34 adverbs appearing across 135 actions, forming 1,550 unique action-adverb pairs in 14,617 video clips.
ActivityNet Adverbs is a subset of the ActivityNet dataset with extracted verb-adverb annotations. ActivityNet Adverbs contains 20 adverbs appearing across 114 actions, forming 643 unique action-adverb pairs in 3,099 video clips.
MSR-VTT Adverbs is a subset of MSR-VTT with extracted verb-adverb annotations. MSR-VTT Adverbs contains 18 adverbs appearing across 106 actions, forming 464 unique action-adverb pairs in 1,824 video clips.
Audio-visual question answering aims to answer questions about both the audio and visual modalities of a given video. For example, given a video showing a traffic intersection where the light turns red and the parking barrier drops, and the question "why did the stick fall in the video?", answering requires combining the visual information "the stick dropping" with the audio information of a train whistle to produce the answer "Here comes the train". Achieving an accurate reasoning process and arriving at the correct answer requires extracting cues and contexts from both modalities and discovering their inner causal correlations.
InfantMarmosetsVox is a dataset for multi-class call-type and caller identification. It contains audio recordings of different individual marmosets and their call-types. The dataset contains a total of 350 files of precisely labelled 10-minute audio recordings across all caller classes. The audio was recorded from five pairs of infant marmoset twins, each recorded individually in two separate sound-proofed recording rooms at a sampling rate of 44.1 kHz. The start and end time, call-type, and marmoset identity of each vocalization are provided, labeled by an experienced researcher. A PyTorch Dataloader is included in this dataset.
6,981 SAT-level geometry problems with complete natural language descriptions, geometric shapes, formal language annotations, and theorem sequence annotations.
Dataset of restaurant reviews from TripAdvisor that includes images and texts uploaded in reviews by users. Reviews in six different cities are included: Gijón (Spain), Barcelona (Spain), Madrid (Spain), New York City (USA), Paris (France) and London (United Kingdom). In the original publication, the following task is proposed: Can we explain, using the existing image or text from a different user, why a given restaurant was recommended to a certain user?
A Large Dataset for Remote Sensing Image Change Captioning. The LEVIR-CC dataset contains 10,077 pairs of bi-temporal remote sensing images and 50,385 sentences describing the differences between images.
OpenCHAIR is a benchmark for evaluating open-vocabulary hallucinations in image captioning models. By leveraging the linguistic knowledge of LLMs, OpenCHAIR can perform fine-grained hallucination measurements and significantly increase the number of objects that can be measured, especially compared to the existing benchmark, CHAIR. To exploit the LLM's full potential, we construct a new dataset by generating 5,000 captions with highly diverse objects and let a powerful text-to-image model generate images for them. We find that this not only increases the benchmark's diversity but also improves evaluation accuracy relative to CHAIR.