Datasets

3,148 machine learning datasets

3,148 dataset results

BillSum

BillSum is the first dataset for summarization of US Congressional and California state bills.

ECHR

ECHR is an English legal judgment prediction dataset of cases from the European Court of Human Rights (ECHR). The dataset contains ~11.5k cases, including the raw text.

47 papers0 benchmarksTexts

MultiCoNER is a large multilingual dataset (11 languages) for Named Entity Recognition. It is designed to represent some of the contemporary challenges in NER, including low-context scenarios (short and uncased text), syntactically complex entities such as movie titles, and long-tail entity distributions.

47 papers0 benchmarksTexts

ScreenSpot

ScreenSpot Evaluation Benchmark ScreenSpot is an evaluation benchmark for GUI grounding, comprising over 1,200 instructions from various environments, including iOS, Android, macOS, Windows, and Web. Each data point includes annotated element types (Text or Icon/Widget). For more details and examples, please refer to our paper.

47 papers1 benchmarksImages, Texts

MAMS (Multi Aspect Multi-Sentiment)

MAMS is a challenge dataset for aspect-based sentiment analysis (ABSA), in which each sentences contain at least two aspects with different sentiment polarities. MAMS dataset contains two versions: one for aspect-term sentiment analysis (ATSA) and one for aspect-category sentiment analysis (ACSA).

46 papers4 benchmarksTexts

CoS-E (Commonsense Explanations Dataset)

CoS-E consists of human explanations for commonsense reasoning in the form of natural language sequences and highlighted annotations

46 papers0 benchmarksTexts

UDC (Ubuntu Dialogue Corpus)

Ubuntu Dialogue Corpus (UDC) is a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This provides a unique resource for research into building dialogue managers based on neural language models that can make use of large amounts of unlabeled data. The dataset has both the multi-turn property of conversations in the Dialog State Tracking Challenge datasets, and the unstructured nature of interactions from microblog services such as Twitter.

46 papers0 benchmarksDialog, Texts

GeoQA (Geometric Question Answering)

GeoQA is a dataset for automatic geometric problem solving containing 5,010 geometric problems with corresponding annotated programs, which illustrate the solving process of the given problems

46 papers1 benchmarksTexts

FEVEROUS (Fact Extraction and VERification Over Unstructured and Structured information)

FEVEROUS (Fact Extraction and VERification Over Unstructured and Structured information) is a fact verification dataset which consists of 87,026 verified claims. Each claim is annotated with evidence in the form of sentences and/or cells from tables in Wikipedia, as well as a label indicating whether this evidence supports, refutes, or does not provide enough information to reach a verdict.

46 papers0 benchmarksTexts

LexGLUE

Legal General Language Understanding Evaluation (LexGLUE) benchmark is a collection of datasets for evaluating model performance across a diverse set of legal NLU tasks in a standardized way.

46 papers7 benchmarksTexts

CoSQL (Conversational Text-to-SQL Challenge)

CoSQL is a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing of unanswerable questions.

45 papers2 benchmarksTexts

DART

DART is a large dataset for open-domain structured data record to text generation. DART consists of 82,191 examples across different domains with each input being a semantic RDF triple set derived from data records in tables and the tree ontology of the schema, annotated with sentence descriptions that cover all facts in the triple set.

45 papers17 benchmarksTexts

Inspired

A new dataset of 1,001 human-human dialogs for movie recommendation with measures for successful recommendations.

45 papers0 benchmarksTexts

MLSUM (MultiLingual SUMmarization)

A large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five different languages -- namely, French, German, Spanish, Russian, Turkish. Together with English newspapers from the popular CNN/Daily mail dataset, the collected data form a large scale multilingual dataset which can enable new research directions for the text summarization community.

45 papers0 benchmarksTexts

GEM (Generation, Evaluation, and Metrics)

Generation, Evaluation, and Metrics (GEM) is a benchmark environment for Natural Language Generation with a focus on its Evaluation, both through human annotations and automated Metrics.

45 papers0 benchmarksTexts

Geometry3K

A new large-scale geometry problem-solving dataset - 3,002 multi-choice geometry problems - dense annotations in formal language for the diagrams and text - 27,213 annotated diagram logic forms (literals) - 6,293 annotated text logic forms (literals)

45 papers2 benchmarksImages, Texts

RSTPReid (Real Scenario Text-based Person Re-identification)

RSTPReid contains 20505 images of 4,101 persons from 15 cameras. Each person has 5 corresponding images taken by different cameras with complex both indoor and outdoor scene transformations and backgrounds in various periods of time, which makes RSTPReid much more challenging and more adaptable to real scenarios. Each image is annotated with 2 textual descriptions. For data division, 3701 (index < 18505), 200 (18505 <= index < 19505) and 200 (index >= 19505) identities are utilized for training, validation and testing, respectively (Marked by item 'split' in the JSON file). Each sentence is no shorter than 23 words.

45 papers13 benchmarksImages, Texts

SCUT-CTW1500

The SCUT-CTW1500 dataset contains 1,500 images: 1,000 for training and 500 for testing. In particular, it provides 10,751 cropped text instance images, including 3,530 with curved text. The images are manually harvested from the Internet, image libraries such as Google Open-Image, or phone cameras. The dataset contains a lot of horizontal and multi-oriented text.

44 papers6 benchmarksImages, Texts

TurkCorpus

TurkCorpus, a dataset with 2,359 original sentences from English Wikipedia, each with 8 manual reference simplifications. The dataset is divided into two subsets: 2,000 sentences for validation and 359 for testing of sentence simplification models.

44 papers5 benchmarksTexts

EmotionLines

EmotionLines contains a total of 29245 labeled utterances from 2000 dialogues. Each utterance in dialogues is labeled with one of seven emotions, six Ekman’s basic emotions plus the neutral emotion. Each labeling was accomplished by 5 workers, and for each utterance in a label, the emotion category with the highest votes was set as the label of the utterance. Those utterances voted as more than two different emotions were put into the non-neutral category. Therefore the dataset has a total of 8 types of emotion labels, anger, disgust, fear, happiness, sadness, surprise, neutral, and non-neutral.

44 papers0 benchmarksTexts

PreviousPage 20 of 158Next