Datasets

19,997 machine learning datasets

NYTWIT

A collection of over 2,500 novel English words published in the New York Times between November 2017 and March 2019, manually annotated for their class of novelty (such as lexical derivation, dialectal variation, blending, or compounding).

2 papers · 0 benchmarks

Occ-Traj120

Occ-Traj120 is a trajectory dataset that contains occupancy representations of different local maps with associated trajectories. It contains 400 locally structured maps with occupancy representations and roughly 120K trajectories in total.

2 papers · 0 benchmarks

OCR-VQA

The OCR-VQA dataset is a resource for research in Visual Question Answering (VQA).

2 papers · 0 benchmarks

ODMS (Object Depth via Motion and Segmentation)

ODMS is a dataset for learning Object Depth via Motion and Segmentation. ODMS training data are configurable and extensible, with each training example consisting of a series of object segmentation masks, camera movement distances, and ground truth object depth. As a benchmark evaluation, the dataset provides four ODMS validation and test sets with 15,650 examples in multiple domains, including robotics and driving.

2 papers · 0 benchmarks · Images

Oktoberfest Food Dataset

A realistic, diverse, and challenging dataset for object detection on images. The data was recorded at a beer tent in Germany and consists of 15 different categories of food and drink items.

2 papers · 1 benchmark

OpenLORIS-object

The (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) Object Recognition Dataset (OpenLORIS-Object) is designed to accelerate research on and applications of lifelong/continual/incremental learning. It currently focuses on improving the continual learning of common objects in home scenarios.

2 papers · 0 benchmarks · Images

ORGaze

A new video dataset for OR, with 30,000 objects over 5,000 stereo video sequences annotated for their descriptions and gaze.

2 papers · 0 benchmarks · Videos

Parallel Meaning Bank

The Parallel Meaning Bank (PMB), developed at the University of Groningen and building upon the Groningen Meaning Bank, comprises sentences and texts in raw and tokenised format, syntactic analysis, word senses, thematic roles, reference resolution, and formal meaning representations. The main objective of the PMB is to provide fine-grained meaning representations for words, sentences and texts. Sentences are, in isolation, often ambiguous. The aim is to provide the most likely interpretation for a sentence, with a minimal use of underspecification.

2 papers · 0 benchmarks

Parkinson's Pose Estimation Dataset

The data includes all movement trajectories extracted from the videos of Parkinson's assessments using Convolutional Pose Machines (CPM) as well as the confidence values from CPM. The dataset also includes ground truth ratings of parkinsonism and dyskinesia severity using the UDysRS, UPDRS, and CAPSIT.

2 papers · 0 benchmarks · Images, Videos

Photi-LakeIce

A new benchmark dataset of webcam images, Photi-LakeIce, from multiple cameras and two different winters, along with pixel-wise ground truth annotations.

2 papers · 0 benchmarks

pioNER

The pioNER corpus provides gold-standard and automatically generated named-entity datasets for the Armenian language. The automatically generated corpus is built from Wikipedia. The gold-standard set is a collection of over 250 news articles from iLur.am with manual named-entity annotation. It includes sentences from political, sports, local and world news, and is comparable in size to the test sets of other languages.

2 papers · 0 benchmarks · Texts

PoC (Points of correspondence)

A dataset containing the documents, source and fusion sentences, and human annotations of points of correspondence between sentences. The dataset bridges the gap between coreference resolution and summarization.

2 papers · 0 benchmarks

PoKi

PoKi is a corpus of 61,330 poems written by children from grades 1 to 12. PoKi is especially useful in studying child language because it comes with information about the age of the child authors (their grade).

2 papers · 0 benchmarks · Texts

PolSF

PolSF collects five open polarimetric SAR images of the San Francisco area. The images come from different satellites at different times, which makes them valuable for scientific research.

2 papers · 0 benchmarks

PubFig (Public Figures Face Database)

The PubFig database is a large, real-world face dataset consisting of 58,797 images of 200 people collected from the internet. Unlike most other existing face datasets, these images are taken in completely uncontrolled situations with non-cooperative subjects. Thus, there is large variation in pose, lighting, expression, scene, camera, imaging conditions and parameters, etc. The PubFig dataset is similar in spirit to the Labeled Faces in the Wild (LFW) dataset.

2 papers · 0 benchmarks · Images

Pump and dump dataset

The Pump and dump dataset is an annotated set of messages for detecting cryptocurrency market manipulation. It consists of a list of pump and dumps arranged by groups on Telegram. All the pump and dumps in the dataset are on the trading pair SYM/BTC.

2 papers · 0 benchmarks · Texts

RAD (Relevance and Diversity Dataset)

The dataset supports query-adaptive video summarization and is annotated with diversity and query-specific relevance labels.

2 papers · 0 benchmarks

RainNet

RainNet is a real (non-simulated) large-scale spatial precipitation downscaling dataset that contains 62,424 pairs of low-resolution and high-resolution precipitation maps covering 17 years. Unlike simulated data, this dataset covers various types of real meteorological phenomena (e.g., hurricanes and squalls) and exhibits the physical characteristics (temporal misalignment, temporal sparsity, and fluid properties) that challenge downscaling algorithms.

2 papers · 0 benchmarks

Rendered Handpose Dataset

Rendered Handpose Dataset contains 41,258 training and 2,728 testing samples. Each sample provides:

2 papers · 0 benchmarks · 3D, Images, RGB-D

ReviewQA

ReviewQA is a question-answering dataset based on hotel reviews. Its questions are linked to a set of relational understanding competencies that a model is expected to master: each question comes with an associated type that characterizes the required competency.

2 papers · 0 benchmarks · Texts
