Datasets

19,997 machine learning datasets

19,997 dataset results

VQA-CE (VQA Counterexamples)

This dataset provides a new split of VQA v2 (similarly to VQA-CP v2), which is built of questions that are hard to answer for biased models.

7 papers1 benchmarksImages, Texts

BiSECT is a dataset for sentence simplification, which is the ability to take a long, complex sentence and split it into shorter sentences, rephrasing as necessary. BiSECT training data consists of 1 million long English sentences paired with shorter, meaning-equivalent English sentences. These were obtained by extracting 1-2 sentence alignments in bilingual parallel corpora and then using machine translation to convert both sides of the corpus into the same language.

7 papers0 benchmarksTexts

TIAGE

TIAGE is a topic-shift aware dialog benchmark constructed utilizing human annotations on topic shifts. Based on TIAGE, three tasks can be conducted to investigate different scenarios of topic-shift modeling in dialog settings: topic-shift detection, topic-shift triggered response generation and topic-aware dialog generation.

7 papers0 benchmarksTexts

Panoptic nuScenes

Panoptic nuScenes is a benchmark dataset that extends the popular nuScenes dataset with point-wise groundtruth annotations for semantic segmentation, panoptic segmentation, and panoptic tracking tasks.

7 papers0 benchmarksImages

CARL (Context Adaptive RL)

CARL (context adaptive RL) provides highly configurable contextual extensions to several well-known RL environments. It's designed to test your agent's generalization capabilities in all scenarios where intra-task generalization is important.

7 papers0 benchmarksEnvironment

Massachusetts Roads Dataset (Road and Building Detection Datasets - Massachusetts Roads Dataset)

The datasets introduced in Chapter 6 of my PhD thesis are below. See the thesis for more details. If you use any of these datasets for research purposes you should use the following citation in any resulting publications:

7 papers9 benchmarks

CodRep

Five curated datasets of one-liner commits from open-source projects. In total, they are composed of 58069 one-liner commits.

7 papers0 benchmarks

NTU RGB+D 2D

NTU RGB+D 2D is a curated version of NTU RGB+D often used for skeleton-based action prediction and synthesis. It contains less number of actions.

7 papers8 benchmarksRGB-D, Videos

LSVTD

LSVTD is a large scale video text dataset for promoting the video text spotting community, which contains 100 text videos from 22 different real-life scenarios. LSVTD covers a wide range of 13 indoor (eg. bookstore, shopping mall) and 9 outdoor scenarios, which is more than 3 times the diversity of IC15.

7 papers0 benchmarksVideos

ACAV100M (Automatically Curated Audio-Visual)

ACAV100M processes 140 million full-length videos (total duration 1,030 years) which are used to produce a dataset of 100 million 10-second clips (31 years) with high audio-visual correspondence. This is two orders of magnitude larger than the current largest video dataset used in the audio-visual learning literature, i.e., AudioSet (8 months), and twice as large as the largest video dataset in the literature, i.e., HowTo100M (15 years).

7 papers0 benchmarksVideos

GINC (Generative IN-Context learning Dataset)

GINC (Generative In-Context learning Dataset) is a small-scale synthetic dataset for studying in-context learning. The pretraining data is generated by a mixture of HMMs and the in-context learning prompt examples are also generated from HMMs (either from the mixture or not). The prompt examples are out-of-distribution with respect to the pretraining data since every example is independent, concatenated, and separated by delimiters. The GitHub repository provides code to generate GINC-style datasets of varying vocabulary sizes, number of HMMs, and other parameters.

7 papers0 benchmarksTexts

LIVE-ETRI (ETRI-LIVE Space-Time Subsampled Video Quality (STSVQ) Database)

The video deployed parameter space is continuously increasing to provide more realistic and immersive experiences to global streaming and social media viewers. However, increments in video parameters such as spatial resolution or frame rate are inevitably associated with larger data volumes. Transmitting increasingly voluminous videos through limited bandwidth networks in a perceptually optimal way is a present challenge affecting billions of viewers. One recent practice adopted by the video service providers is space-time resolution adaptation in conjunction with video compression. Consequently, it is important to understand how different levels of space-time subsampling and compression affect the perceptual quality of videos. Towards making progress in this direction, we constructed a large new resource, called the ETRI-LIVE Space-Time Subsampled Video Quality (ETRI-LIVE-STSVQ) database, containing 437 videos generated by applying various levels of combined space-time subsampling and

7 papers3 benchmarks

BOVText

BOVText is a new large-scale benchmark dataset named Bilingual, Open World Video Text(BOVText), the first large-scale and multilingual benchmark for video text spotting in a variety of scenarios. All data are collected from KuaiShou and YouTube

7 papers0 benchmarks

IWSLT 2017

The IWSLT 2017 translation dataset.

7 papers1 benchmarks

LaFAN1 (Ubisoft La Forge Animation Dataset)

Ubisoft La Forge Animation Dataset ("LAFAN1") Ubisoft La Forge Animation dataset and accompanying code for the SIGGRAPH 2020 paper Robust Motion In-betweening.

7 papers36 benchmarks3D

PhoMT

PhoMT is a high-quality and large-scale Vietnamese-English parallel dataset of 3.02M sentence pairs for machine translation.

7 papers1 benchmarksTexts

The People’s Speech

The People's Speech is a free-to-download 30,000-hour and growing supervised conversational English speech recognition dataset licensed for academic and commercial usage under CC-BY-SA (with a CC-BY subset). The data is collected via searching the Internet for appropriately licensed audio data with existing transcriptions.

7 papers0 benchmarksSpeech

SLAKE-English

English subset of the SLAKE dataset, comprising 642 images and more than 7,000 question–answer pairs.

7 papers0 benchmarksImages, Medical, Texts

ManyTypes4TypeScript

Type Inference dataset for TypeScript. Click on DOI tag for dataset files.

7 papers8 benchmarks

Data Science Problems

Evaluate a natural language code generation model on real data science pedagogical notebooks! Data Science Problems (DSP) includes well-posed data science problems in Markdown along with unit tests to verify correctness and a Docker environment for reproducible execution. About 1/3 of notebooks in this benchmark also include data dependencies, so this benchmark not only can test a model's ability to chain together complex tasks, but also evaluate the solutions on real data! See our paper Training and Evaluating a Jupyter Notebook Data Science Assistant for more details about state of the art results and other properties of the dataset.

7 papers0 benchmarksTexts

PreviousPage 186 of 1000Next