Datasets

19,997 machine learning datasets

Talk2Nav

Talk2Nav is a large-scale dataset with verbal navigation instructions.

TaoDescribe

The TaoDescribe dataset contains 2,129,187 product titles and descriptions in Chinese.

TE141K

A new text effects dataset with 141,081 text effect/glyph pairs in total. The dataset consists of 152 professionally designed text effects rendered on glyphs, including English letters, Chinese characters, and Arabic numerals.

2 papers · 0 benchmarks · Images

TicketTalk

A movie ticketing dialog dataset with 23,789 annotated conversations. The conversations range from completely open-ended and unrestricted to more structured, in terms of their knowledge base, discourse features, and number of turns. In qualitative human evaluations, responses generated by a model trained on just 10,000 TicketTalk dialogs were judged to "make sense" 86.5 percent of the time, almost the same rate as human responses in the same contexts.

2 papers · 0 benchmarks · Texts

Tilde MODEL Corpus (Tilde Multilingual Open Data for European Languages)

Tilde MODEL Corpus is a multilingual corpus for European languages, with a particular focus on the smaller languages. The collected resources have been cleaned, aligned, and formatted into the standard TMX format, usable for developing new language technology products and services.

2 papers · 0 benchmarks · Texts

Topology Optimization Dataset

TOP is a synthetic dataset for topology optimization generated using Topy. The generated dataset has 10,000 objects, each consisting of 100 iterations of the optimization process for a problem defined on a regular 40 x 40 grid.

2 papers · 0 benchmarks
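To make the size of TOP concrete, the sketch below works out the storage footprint implied by the figures above. It assumes (not stated in the description) that each object is stored as one 40 x 40 density map per iteration, i.e. a `(100, 40, 40)` array, and that values are `float32`; the constant names are illustrative only.

```python
import numpy as np

# Figures taken from the dataset description.
NUM_OBJECTS = 10_000
NUM_ITERATIONS = 100
GRID_H, GRID_W = 40, 40

# Assumption: one density map per optimization iteration, stored as float32.
sample = np.zeros((NUM_ITERATIONS, GRID_H, GRID_W), dtype=np.float32)

bytes_per_object = sample.nbytes            # 100 * 40 * 40 * 4 bytes
total_bytes = NUM_OBJECTS * bytes_per_object

print(sample.shape)                         # (100, 40, 40)
print(bytes_per_object)                     # 640000
print(f"{total_bytes / 1e9:.1f} GB")        # 6.4 GB
```

Under these assumptions the full dataset holds 1.6 billion values, roughly 6.4 GB uncompressed; a different storage dtype would scale this proportionally.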

TTS-Portuguese Corpus

The dataset contains 10.5 hours of Portuguese speech from a single speaker.

2 papers · 0 benchmarks

Twitch-FIFA

Twitch-FIFA is a video-context, many-speaker dialogue dataset based on live-broadcast soccer game videos and chats from Twitch.tv. It can be used to train visually grounded dialogue models that generate language about relevant temporal and spatial events in the live video while also staying relevant to the chat history.

2 papers · 0 benchmarks · Texts, Videos

Twitter Conversations Dataset

This dataset is used for the task of conversational document prediction. It includes conversations between users and customer care agents of 25 organizations on the Twitter platform. Each conversation ends with a customer care agent providing a URL to a document that resolves the user's issue. The task is to predict the document given the dialog context. The train, dev, and test sets contain 10,000, 525, and 500 conversations, respectively.

2 papers · 0 benchmarks · Texts

UBC3V Dataset

~6 million synthetic depth frames for pose estimation from multiple cameras.

2 papers · 0 benchmarks

UCLA Protest Image

40,764 images (11,659 protest images and hard negatives) with various annotations of visual attributes and sentiments.

2 papers · 0 benchmarks · Images

UFPR-AMR

This dataset contains 2,000 images taken from inside a warehouse of the Energy Company of Paraná (Copel), which directly serves more than 4 million consuming units in the Brazilian state of Paraná.

2 papers · 1 benchmark · Images

Virtual Gallery

The Virtual Gallery dataset is a synthetic dataset that targets multiple challenges, such as varying lighting conditions and different levels of occlusion, for tasks including depth estimation, instance segmentation, and visual localization.

2 papers · 0 benchmarks

Vistas-NP

The Vistas-NP dataset is an out-of-distribution detection dataset based on the Mapillary Vistas dataset. The original Vistas dataset consists of 18,000 training images and 2,000 validation images with 66 classes. In Vistas-NP, the human classes are used as outliers because they are dispersed across scenes and are visually distinct from other objects. The dataset is created by moving every image that contains the person class or any of the three rider classes to the test subset. Consequently, the dataset has 8,003 training images and 830 validation images, and the test set contains 11,167 images.

2 papers · 0 benchmarks · Images
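The Vistas-NP split rule can be sketched as a simple partition: any original train or validation image whose label set contains an outlier class goes to the test subset, and every other image keeps its original split. This is a minimal sketch, not the authors' tooling; the record format `(image_id, original_split, classes_present)`, the function name `split_vistas_np`, and the outlier class names in `OUTLIER_CLASSES` are all hypothetical stand-ins for the actual Mapillary Vistas label names.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical names for the person class and the three rider classes;
# the real Mapillary Vistas label strings may differ.
OUTLIER_CLASSES: Set[str] = {"person", "bicyclist", "motorcyclist", "other-rider"}

Record = Tuple[str, str, Set[str]]  # (image_id, original_split, classes_present)


def split_vistas_np(
    images: Iterable[Record],
    outliers: Set[str] = OUTLIER_CLASSES,
) -> Dict[str, List[str]]:
    """Move every image containing an outlier class to "test"; keep the rest."""
    splits: Dict[str, List[str]] = {"train": [], "val": [], "test": []}
    for image_id, original_split, classes in images:
        if classes & outliers:
            # At least one human class is present: the image becomes test data.
            splits["test"].append(image_id)
        else:
            # No human classes: the image stays in its original split.
            splits[original_split].append(image_id)
    return splits


# The reported counts are consistent with redistributing all 20,000 images.
assert 8_003 + 830 + 11_167 == 18_000 + 2_000
```

For example, an original training image labeled `{"road", "person"}` ends up in `"test"`, while one labeled only `{"road"}` stays in `"train"`.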


Datasets

Talk2Nav

TaoDescribe

TE141K

TicketTalk

Tilde MODEL Corpus (Tilde Multilingual Open Data for European Languages)

Topology Optimization Dataset

TTS-Portuguese Corpus

Twitch-FIFA

Twitter Conversations Dataset

UBC3V Dataset

UCLA Protest Image

UFPR-AMR

Virtual Gallery

Vistas-NP

Visual Relationship Detection Dataset

ViText2SQL

VizWiz-Priv (Visual Privacy dataset)

VizWiz-QualityIssues

Ward2ICU

WikiReading Recycled