Datasets

19,997 machine learning datasets


KnowIT VQA

KnowIT VQA is a video dataset with 24,282 human-generated question-answer pairs about The Big Bang Theory. The dataset combines visual, textual, and temporal coherence reasoning with knowledge-based questions, which can only be answered with the experience gained from watching the series.

9 papers | 0 benchmarks | Texts, Videos

LoDoPaB-CT

LoDoPaB-CT is a dataset of computed tomography images and simulated low-dose measurements. It contains over 40,000 scan slices from around 800 patients selected from the LIDC/IDRI Database.

9 papers | 1 benchmark | Images, Medical

LogoDet-3K

A fully annotated logo detection dataset with 3,000 logo categories, about 200,000 manually annotated logo objects, and 158,652 images. LogoDet-3K provides a more challenging benchmark for logo detection than existing datasets, owing to its more comprehensive coverage and wider variety of both logo categories and annotated objects.

9 papers | 0 benchmarks

MEIR (Multimodal Entity Image Repurposing)

MEIR is a substantially more challenging dataset than those previously available to support research into image repurposing detection. It includes location, person, and organization manipulations on real-world data sourced from Flickr.

9 papers | 0 benchmarks | Images

MSASL-1000

MSASL is a real-life, large-scale sign language dataset comprising over 25,000 annotated videos.

9 papers | 2 benchmarks | Videos

NCLS (Neural Cross-Lingual Summarization Corpora)

Presents two high-quality large-scale CLS datasets based on existing monolingual summarization datasets.

9 papers | 0 benchmarks

OmniArt

Presents half a million samples with structured metadata to encourage further research and societal engagement.

9 papers | 1 benchmark

PathTrack

PathTrack is a person tracking dataset containing more than 15,000 person trajectories in 720 sequences.

9 papers | 0 benchmarks | Tracking, Videos

PEC (Persona-Based Empathetic Conversational)

A novel large-scale multi-domain dataset for persona-based empathetic conversations.

9 papers | 0 benchmarks

PTB-TIR

PTB-TIR is a Thermal InfraRed (TIR) pedestrian tracking benchmark that provides 60 TIR sequences with manual annotations. The benchmark is used to evaluate TIR trackers fairly.

9 papers | 0 benchmarks | Videos

QuickDraw-Extended

Consists of 330,000 sketches and 204,000 photos spanning 110 categories.

9 papers | 0 benchmarks | Images

RAVEN-FAIR

RAVEN-FAIR is a modified version of the RAVEN dataset.

9 papers | 0 benchmarks | Texts

ReCO

A human-curated Chinese Reading Comprehension dataset on Opinion. The questions in ReCO are opinion-based queries issued to a commercial search engine. The passages are provided by crowdworkers, who extract supporting snippets from the retrieved documents.

9 papers | 0 benchmarks

ReDWeb-S

ReDWeb-S is a large-scale, challenging dataset for Salient Object Detection. It contains 3,179 images in total, covering various real-world scenes with high-quality depth maps. The dataset is split into a training set of 2,179 RGB-D image pairs and a test set of the remaining 1,000 image pairs.

9 papers | 0 benchmarks | Images


Datasets

KnowIT VQA

LoDoPaB-CT

LogoDet-3K

MEIR (Multimodal Entity Image Repurposing)

MSASL-1000

NCLS (Neural Cross-Lingual Summarization Corpora)

OmniArt

PathTrack

PEC (Persona-Based Empathetic Conversational)

PTB-TIR

QuickDraw-Extended

RAVEN-FAIR

ReCO

ReDWeb-S

SelQA

SOBA (Shadow-OBject Association)

SPEECH-COCO

SQuADShifts

Standardized Project Gutenberg Corpus

TSAC (Tunisian Sentiment Analysis Corpus)
