Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3d meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • Midi (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • Cad (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

Violin (VIdeO-and-Language INference)

Video-and-Language Inference is the task of joint multimodal understanding of video and text. Given a video clip with aligned subtitles as premise, paired with a natural language hypothesis based on the video content, a model needs to infer whether the hypothesis is entailed or contradicted by the given video clip. The Violin dataset is a dataset for this task which consists of 95,322 video-hypothesis pairs from 15,887 video clips, spanning over 582 hours of video. These video clips contain rich content with diverse temporal dynamics, event shifts, and people interactions, collected from two sources: (i) popular TV shows, and (ii) movie clips from YouTube channels.

18 papers · 0 benchmarks · Images, Texts
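To make the task format concrete, below is a minimal Python sketch of how a Violin-style video-hypothesis pair and the binary entailment metric might be represented. The field names, dummy examples, and labels are illustrative assumptions, not the dataset's official annotation schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ViolinExample:
    """One video-hypothesis pair (field names are illustrative, not the official schema)."""
    clip_id: str
    subtitles: List[str]   # aligned subtitle lines, part of the video premise
    hypothesis: str        # natural-language statement about the clip
    label: int             # 1 = entailed by the clip, 0 = contradicted

def accuracy(examples: List[ViolinExample], predictions: List[int]) -> float:
    """Binary entailment accuracy, the natural metric for this two-class task."""
    correct = sum(int(ex.label == pred) for ex, pred in zip(examples, predictions))
    return correct / len(examples)

# Dummy usage
examples = [
    ViolinExample("clip_0001", ["Hey, are you coming?", "In a minute."],
                  "Someone agrees to come shortly.", 1),
    ViolinExample("clip_0002", ["She slams the door."],
                  "She leaves the room quietly.", 0),
]
print(accuracy(examples, [1, 1]))  # 0.5
```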

CLEVR-Humans

CLEVR-Humans is a dataset of human-posed, free-form natural language questions about CLEVR images. Many of these questions contain out-of-vocabulary words and require reasoning skills not covered by models trained only on the original synthetic CLEVR questions.

18 papers · 1 benchmark · Images, Texts

LIVE (Public-Domain Subjective Image Quality Database)

The LIVE Public-Domain Subjective Image Quality Database is a resource developed by the Laboratory for Image and Video Engineering at the University of Texas at Austin. It contains a set of images and videos whose quality has been ranked by human subjects. This database is used in Quality Assessment (QA) research, which aims to make quality predictions that align with the subjective opinions of human observers.

18 papers · 0 benchmarks · Images

VidSitu

VidSitu is a dataset for the task of semantic role labeling in videos (VidSRL). It is a large-scale video understanding resource with 29K 10-second movie clips richly annotated with a verb and its semantic roles every 2 seconds. Entities are co-referenced across events within a movie clip, and events are connected to each other via event-event relations. Clips in VidSitu are drawn from a large collection of movies (∼3K) and have been chosen to be both complex (∼4.2 unique verbs per video) and diverse (∼200 verbs have more than 100 annotations each).

18 papers · 0 benchmarks · Videos
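As a rough illustration of the annotation structure described above (a verb plus semantic-role arguments every 2 seconds, grouped per clip), here is a hypothetical Python sketch. The class names, role labels, and example values are assumptions for illustration, not VidSitu's official annotation keys.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class VidSituEvent:
    """One 2-second event in a clip: a verb plus its semantic-role arguments.
    Role labels and values here are illustrative placeholders."""
    start_sec: int
    verb: str
    roles: Dict[str, str]   # e.g. {"Arg0 (agent)": "man", "Arg1 (patient)": "car door"}

@dataclass
class VidSituClip:
    clip_id: str
    events: List[VidSituEvent] = field(default_factory=list)

clip = VidSituClip(
    clip_id="movie_scene_000",
    events=[
        VidSituEvent(0, "run",  {"Arg0 (agent)": "man", "ArgM (direction)": "toward the car"}),
        VidSituEvent(2, "open", {"Arg0 (agent)": "man", "Arg1 (patient)": "car door"}),
    ],
)
print(len(clip.events), clip.events[0].verb)
```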

xSID (Cross-lingual Slot and Intent Detection)

xSID is an evaluation benchmark for cross-lingual (X) Slot and Intent Detection covering 13 languages from 6 language families, including a very low-resource dialect: Arabic (ar), Chinese (zh), Danish (da), Dutch (nl), English (en), German (de), Indonesian (id), Italian (it), Japanese (ja), Kazakh (kk), Serbian (sr), Turkish (tr), and the Austro-Bavarian German dialect South Tyrolean (de-st).

18 papers · 0 benchmarks · Texts
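For readers unfamiliar with the task, the sketch below shows one plausible way to represent a joint slot-and-intent annotation (BIO slot tags plus an utterance-level intent) along with a simplified token-level counting helper. The format, intent name, and tags are illustrative assumptions and do not reproduce xSID's official file format.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SIDExample:
    """One utterance with joint slot and intent annotation (illustrative format)."""
    language: str           # e.g. "en", "de-st"
    tokens: List[str]
    slot_tags: List[str]    # BIO-style tags, one per token
    intent: str

example = SIDExample(
    language="en",
    tokens=["wake", "me", "up", "at", "seven", "am"],
    slot_tags=["O", "O", "O", "O", "B-time", "I-time"],
    intent="alarm/set_alarm",   # hypothetical intent label
)

def slot_counts(gold: List[str], pred: List[str]) -> Tuple[int, int, int]:
    """Token-level true-positive/false-positive/false-negative counts for slot tags
    (a simplification of the span-level F1 usually reported)."""
    tp = sum(g == p and p != "O" for g, p in zip(gold, pred))
    fp = sum(p != "O" and g != p for g, p in zip(gold, pred))
    fn = sum(g != "O" and g != p for g, p in zip(gold, pred))
    return tp, fp, fn

print(slot_counts(example.slot_tags, ["O", "O", "O", "O", "B-time", "O"]))  # (1, 0, 1)
```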

Enron Emails

This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation.

18 papers · 3 benchmarks · Texts

CICIDS2017 (Intrusion Detection Evaluation Dataset (CIC-IDS2017))

Intrusion Detection Systems (IDSs) and Intrusion Prevention Systems (IPSs) are among the most important defense tools against sophisticated and ever-growing network attacks. Due to the lack of reliable test and validation datasets, however, anomaly-based intrusion detection approaches have lacked consistent and accurate performance evaluation; CIC-IDS2017 was created to address this gap.

18 papers · 7 benchmarks

X-CSQA

X-CSQA is a multilingual dataset for commonsense reasoning research, built on top of CSQA.

18 papers · 0 benchmarks · Texts

PASTIS (Panoptic Segmentation of satellite image TImes Series)

PASTIS is a benchmark dataset for panoptic and semantic segmentation of agricultural parcels from satellite image time series. It is composed of 2,433 patches of one square kilometer each, located in the French metropolitan territory, for which sequences of satellite observations are assembled into a four-dimensional spatio-temporal tensor. The dataset contains both semantic and instance annotations, assigning each pixel a semantic label and an instance id. An official 5-fold split is provided in the dataset's metadata.

18 papers · 15 benchmarks · Images
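As a rough sketch of the data layout described above: each patch is a four-dimensional (time, channels, height, width) tensor accompanied by per-pixel semantic and instance maps and a fold assignment. All shapes, ids, and values below are placeholders, not numbers read from the dataset.

```python
import numpy as np

# A PASTIS-style patch is a 4-D spatio-temporal tensor: (time, channels, height, width).
# The shapes below are illustrative placeholders, not the dataset's exact dimensions.
T, C, H, W = 40, 10, 128, 128
patch = np.random.rand(T, C, H, W).astype(np.float32)

# Each pixel carries a semantic label (crop type) and an instance id (parcel).
semantic = np.zeros((H, W), dtype=np.int64)
instance = np.zeros((H, W), dtype=np.int64)

# The official metadata assigns every patch to one of 5 folds; a typical protocol
# holds one fold out for testing and trains on the rest (fold ids here are made up).
folds = {"patch_10000": 1, "patch_10001": 4}
test_fold = 5
train_ids = [pid for pid, f in folds.items() if f != test_fold]
print(patch.shape, len(train_ids))
```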

MultiBench

MultiBench is a systematic and unified large-scale benchmark for multimodal learning spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, MultiBench offers an evaluation methodology for studying (1) generalization, (2) time and space complexity, and (3) modality robustness.

18 papers · 0 benchmarks

EXTREME CLASSIFICATION (Extreme Multi-label Classification)

The objective in extreme multi-label classification is to learn feature architectures and classifiers that can automatically tag a data point with the most relevant subset of labels from an extremely large label set. This repository provides resources that can be used for evaluating the performance of extreme multi-label algorithms including datasets, code, and metrics.

18 papers · 0 benchmarks · Texts
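Precision@k is one of the standard metrics for extreme multi-label classification; the snippet below is a small, self-contained NumPy sketch of it. The example scores and labels are made up, and the repository's own evaluation code may differ in detail.

```python
import numpy as np

def precision_at_k(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Precision@k for extreme multi-label classification.

    scores: (n_samples, n_labels) real-valued relevance scores from a model.
    labels: (n_samples, n_labels) binary ground-truth matrix.
    """
    topk = np.argsort(-scores, axis=1)[:, :k]          # indices of the k highest-scored labels
    hits = np.take_along_axis(labels, topk, axis=1)    # 1 where a top-k label is relevant
    return hits.sum(axis=1).mean() / k

# Tiny example with 2 samples and 6 labels (values are made up)
scores = np.array([[0.9, 0.1, 0.8, 0.2, 0.0, 0.3],
                   [0.2, 0.7, 0.1, 0.6, 0.5, 0.0]])
labels = np.array([[1, 0, 1, 0, 0, 0],
                   [0, 1, 0, 0, 1, 0]])
print(precision_at_k(scores, labels, k=3))  # ~0.667
```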

TUM-VIE (TUM Stereo Visual-Inertial Event Dataset)

TUM-VIE is an event camera dataset for developing 3D perception and navigation algorithms. It contains handheld and head-mounted sequences in indoor and outdoor environments, with rapid motion during sports and high dynamic range. TUM-VIE includes challenging sequences where state-of-the-art VIO fails or results in large drift. Hence, it can help push the boundary of event-based visual-inertial algorithms.

18 papers · 0 benchmarks · 3D, Images

WTW (Wired Table in the Wild)

WTW (Wired Table in the Wild) is a large-scale dataset with well-annotated structure parsing for tables of multiple styles, captured in several scenarios such as photos, scanned documents, and web pages.

18 papers · 1 benchmark · Images

LIVECell (Label-free In Vitro image Examples of Cells)

The LIVECell (Label-free In Vitro image Examples of Cells) dataset is a large-scale microscopic image dataset for instance-segmentation of individual cells in 2D cell cultures.

18 papers · 10 benchmarks · Biology, Biomedical, Images

RAFT (Realworld Annotated Few-shot Tasks)

The RAFT benchmark (Realworld Annotated Few-shot Tasks) focuses on naturally occurring tasks and uses an evaluation setup that mirrors deployment.

18 papers · 36 benchmarks

Bridge Data

Bridge Data is a large multi-domain, multi-task dataset collected with a low-cost yet versatile 6-DoF WidowX250 robot arm. It contains 7,200 demonstrations of the robot performing 71 kitchen tasks across 10 environments with varying lighting, robot positions, and backgrounds. It can be used to boost the generalization of robotic skills and to empirically study how such data improves the learning of new tasks in new environments.

18 papers · 0 benchmarks · Environment

SCICAP

SCICAP is a large-scale image captioning dataset that contains real-world scientific figures and captions. SCICAP was constructed using more than two million images from over 290,000 papers collected and released by arXiv.

18 papers · 1 benchmark · Images, Texts

AI-TOD (Tiny Object Detection in Aerial Images)

AI-TOD contains 700,621 object instances for eight categories across 28,036 aerial images. The mean size of objects in AI-TOD is about 12.8 pixels, much smaller than in existing object detection datasets for aerial images.

18 papers · 47 benchmarks · Images

BOBSL (BBC-Oxford British Sign Language)

BOBSL is a large-scale dataset of British Sign Language (BSL). It comprises 1,962 episodes (approximately 1,400 hours) of BSL-interpreted BBC broadcast footage accompanied by written English subtitles. From horror, period and medical dramas, history, nature and science documentaries, sitcoms, children’s shows and programs covering cooking, beauty, business and travel, BOBSL covers a wide range of topics. The dataset features a total of 39 signers. Distinct signers appear in the training, validation and test sets for signer-independent evaluation.

18 papers · 1 benchmark · Videos

KoNViD-1k (KoNViD-1k VQA Database)

Subjective video quality assessment (VQA) strongly depends on semantics, context, and the types of visual distortions. Many existing VQA databases cover small numbers of video sequences with artificial distortions. When newly developed Quality of Experience (QoE) models and metrics are tested, they are commonly evaluated against subjective data from such databases, which are the result of perception experiments. However, since the aim of these QoE models is to accurately predict the quality of natural videos, these artificially distorted video databases are an insufficient basis for learning. Additionally, their small size makes them only marginally usable for state-of-the-art learning systems, such as deep learning. To give a better basis for the development and evaluation of objective VQA methods, the creators of KoNViD-1k assembled a larger dataset of natural, real-world video sequences with corresponding subjective mean opinion scores (MOS) gathered through crowdsourcing, using YFCC100m as the baseline database.

18 papers · 3 benchmarks · Videos
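To show how such crowdsourced MOS data is typically used, the sketch below compares predicted quality scores against MOS with the usual linear (PLCC) and rank-order (SROCC) correlation coefficients. The numbers are placeholders, not actual KoNViD-1k scores.

```python
from scipy.stats import pearsonr, spearmanr

# Objective VQA methods are commonly judged by how well their predicted quality
# scores correlate with the crowdsourced mean opinion scores (MOS).
# The values below are made-up placeholders, not KoNViD-1k data.
mos       = [3.8, 2.1, 4.5, 1.9, 3.2]
predicted = [3.5, 2.4, 4.2, 2.2, 3.0]

plcc, _ = pearsonr(predicted, mos)     # linear correlation (PLCC)
srocc, _ = spearmanr(predicted, mos)   # rank-order correlation (SROCC)
print(f"PLCC={plcc:.3f}  SROCC={srocc:.3f}")
```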
Page 110 of 1000