Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)


HowMany-QA

HowMany-QA is an object counting dataset. It is taken from the counting-specific union of VQA 2.0 (Goyal et al., 2017) and Visual Genome QA (Krishna et al., 2016).

5 papers · 2 benchmarks · Images, Texts

PQuAD (Persian Question Answering Dataset)

Persian Question Answering Dataset (PQuAD) is a crowdsourced reading comprehension dataset on Persian Wikipedia articles. It includes 80,000 questions along with their answers, with 25% of the questions being adversarially unanswerable.

5 papers · 0 benchmarks · Texts

VCSL (Video Copy Segment Localization)

VCSL (Video Copy Segment Localization) is a comprehensive segment-level annotated video copy dataset. Compared with existing copy detection datasets, which are restricted by either video-level annotation or small scale, VCSL not only has two orders of magnitude more segment-level labelled data, with 160k realistic video copy pairs containing more than 280k localized copied segment pairs, but also covers a variety of video categories and a wide range of video durations. All copied segments inside each collected video pair are manually extracted and accompanied by precisely annotated starting and ending timestamps.

5 papers · 0 benchmarks · Videos

DeToxy (DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances)

DeToxy is a publicly available toxicity-annotated dataset for the English language. It is sourced from various openly available speech databases and consists of over 2 million utterances. The dataset serves as a benchmark for the relatively new and unexplored Spoken Language Processing task of detecting toxicity from spoken utterances, aiming to boost further research in this space.

5 papers · 0 benchmarks · Speech

ComPhy (Compositional Physical Reasoning Dataset)

Compositional Physical Reasoning (ComPhy) is a dataset for understanding object-centric and relational physical properties hidden from visual appearances. For a given set of objects, the dataset includes a few videos of them moving and interacting under different initial conditions. A model is evaluated on its capability to unravel the compositional hidden properties, such as mass and charge, and use this knowledge to answer a set of questions posed about one of the videos.

5 papers · 0 benchmarks · Videos

MAG-Scholar-C

MAG-Scholar-C is constructed by Bojchevski et al. based on Microsoft Academic Graph (MAG), in which nodes refer to papers, edges represent citation relations among papers and features are bag-of-words of paper abstracts.

5 papers · 0 benchmarks

PGDP5K (Plane Geometry Diagram Parsing Dataset)

PGDP5K is a dataset consisting of 5,000 diagram samples composed of 16 shapes, covering 5 positional relations, 22 symbol types and 6 text types. It is labeled with fine-grained annotations at the primitive level, including primitive classes, locations and relationships. Of the images, 1,813 non-duplicated ones are selected from the Geometry3K dataset, and the other 3,187 are collected from three popular textbooks across grades 6-12 by taking screenshots from PDF books on mathematics curriculum websites.

5 papers · 2 benchmarks · Images

Off_Near_sequential

SMAC+ offensive near scenario with sequential episodic buffer

5 papers · 2 benchmarks

Off_Distant_sequential

SMAC+ offensive distant scenario with sequential episodic buffer

5 papers · 2 benchmarks

Off_Complicated_sequential

SMAC+ offensive complicated scenario with sequential episodic buffer

5 papers · 2 benchmarks

Off_Hard_sequential

SMAC+ offensive hard scenario with sequential episodic buffer

5 papers · 2 benchmarks

Off_Superhard_sequential

SMAC+ offensive superhard scenario with sequential episodic buffer

5 papers · 2 benchmarks

TCIA 4D-Lung

This data collection consists of images acquired during chemoradiotherapy of 20 locally-advanced, non-small cell lung cancer patients. The images include four-dimensional (4D) fan beam CT (4D-FBCT) and 4D cone beam CT (4D-CBCT). All patients underwent concurrent radiochemotherapy to a total dose of 64.8-70 Gy using daily 1.8 or 2 Gy fractions.

5 papers · 0 benchmarks · Biomedical, Images, Videos

PodcastFillers

The PodcastFillers dataset consists of 199 full-length podcast episodes in English with manually annotated filler words and automatically generated transcripts. The podcast audio recordings, sourced from SoundCloud, are CC-licensed, gender-balanced, and total 145 hours of audio from over 350 speakers. The annotations are provided under a non-commercial license and consist of 85,803 manually annotated audio events, including approximately 35,000 filler words (“uh” and “um”) and 50,000 non-filler events such as breaths, music, laughter, repeated words, and noise. The annotated events are also provided as pre-processed 1-second audio clips. The dataset also includes automatically generated speech transcripts from a speech-to-text system.

5 papers · 1 benchmark · Speech

Flickr-8k

Contains 8k Flickr images with captions.

5 papers · 10 benchmarks · Images

WinoGAViL

This dataset is collected via the WinoGAViL game, which gathers challenging vision-and-language associations. Inspired by the popular card game Codenames, a “spymaster” gives a textual cue related to several visual candidates, and another player has to identify them.

5 papers · 2 benchmarks · Images, Texts

Talking With Hands 16.2M

This is a 16.2-million-frame (50-hour) multimodal dataset of two-person face-to-face spontaneous conversations, featuring synchronized body and finger motion as well as audio data. It represents the largest motion capture and audio dataset of natural conversations to date. Statistical analysis verifies strong intraperson and interperson covariance of arm, hand, and speech features, potentially enabling new directions in data-driven social behavior analysis, prediction, and synthesis.

5 papers · 0 benchmarks · 3D, Speech

HQ-YTVIS

While Video Instance Segmentation (VIS) has seen rapid progress, current approaches struggle to predict high-quality masks with accurate boundary details. To tackle this issue, we identify the coarse boundary annotations of the popular YouTube-VIS dataset as a major limiting factor. To benchmark high-quality mask predictions for VIS, we introduce the HQ-YTVIS dataset together with the Tube-Boundary AP metric (ECCV 2022). HQ-YTVIS consists of a manually re-annotated test set and our automatically refined training data, providing training, validation and testing support to facilitate future development of VIS methods aiming at higher mask quality.

5 papers · 1 benchmark

Vi-Fi Multi-modal Dataset

The Vi-Fi dataset is a large-scale multi-modal dataset designed to facilitate research on vision-wireless systems. It consists of vision, wireless and smartphone motion sensor data of multiple participants and passer-by pedestrians in both indoor and outdoor scenarios. The vision modality includes RGB-D video from a mounted camera; the wireless modality comprises smartphone data from participants, including WiFi FTM and IMU measurements.

5 papers · 3 benchmarks · RGB Video, RGB-D, Time series, Videos

PEN (Problems with Explanations for Numbers)

PEN provides explanations for three existing benchmark datasets for solving algebraic word problems: ALG514, DRAW-1K, and MAWPS.

5 papers · 4 benchmarks
Page 221 of 1000