3,148 machine learning datasets
BillSum is the first dataset for summarization of US Congressional and California state bills.
ECHR is an English legal judgment prediction dataset of cases from the European Court of Human Rights (ECHR). The dataset contains ~11.5k cases, including the raw text.
MultiCoNER is a large multilingual dataset (11 languages) for Named Entity Recognition. It is designed to represent some of the contemporary challenges in NER, including low-context scenarios (short and uncased text), syntactically complex entities such as movie titles, and long-tail entity distributions.
ScreenSpot is an evaluation benchmark for GUI grounding, comprising over 1,200 instructions from various environments, including iOS, Android, macOS, Windows, and Web. Each data point includes annotated element types (Text or Icon/Widget). For more details and examples, please refer to our paper.
MAMS is a challenge dataset for aspect-based sentiment analysis (ABSA), in which each sentence contains at least two aspects with different sentiment polarities. The MAMS dataset has two versions: one for aspect-term sentiment analysis (ATSA) and one for aspect-category sentiment analysis (ACSA).
CoS-E consists of human explanations for commonsense reasoning in the form of natural language sequences and highlighted annotations.
Ubuntu Dialogue Corpus (UDC) is a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This provides a unique resource for research into building dialogue managers based on neural language models that can make use of large amounts of unlabeled data. The dataset has both the multi-turn property of conversations in the Dialog State Tracking Challenge datasets, and the unstructured nature of interactions from microblog services such as Twitter.
GeoQA is a dataset for automatic geometric problem solving containing 5,010 geometric problems with corresponding annotated programs, which illustrate the solving process of the given problems.
FEVEROUS (Fact Extraction and VERification Over Unstructured and Structured information) is a fact verification dataset which consists of 87,026 verified claims. Each claim is annotated with evidence in the form of sentences and/or cells from tables in Wikipedia, as well as a label indicating whether this evidence supports, refutes, or does not provide enough information to reach a verdict.
The Legal General Language Understanding Evaluation (LexGLUE) benchmark is a collection of datasets for evaluating model performance across a diverse set of legal NLU tasks in a standardized way.
CoSQL is a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing of unanswerable questions.
DART is a large dataset for open-domain structured data record to text generation. DART consists of 82,191 examples across different domains with each input being a semantic RDF triple set derived from data records in tables and the tree ontology of the schema, annotated with sentence descriptions that cover all facts in the triple set.
A new dataset of 1,001 human-human dialogs for movie recommendation with measures for successful recommendations.
A large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five different languages: French, German, Spanish, Russian, and Turkish. Together with English newspapers from the popular CNN/Daily Mail dataset, the collected data form a large-scale multilingual dataset which can enable new research directions for the text summarization community.
Generation, Evaluation, and Metrics (GEM) is a benchmark environment for Natural Language Generation with a focus on its Evaluation, both through human annotations and automated Metrics.
A new large-scale geometry problem-solving dataset with 3,002 multi-choice geometry problems and dense annotations in formal language for the diagrams and text: 27,213 annotated diagram logic forms (literals) and 6,293 annotated text logic forms (literals).
RSTPReid contains 20,505 images of 4,101 persons from 15 cameras. Each person has 5 corresponding images taken by different cameras, with complex indoor and outdoor scene transformations and backgrounds across various periods of time, which makes RSTPReid much more challenging and more applicable to real scenarios. Each image is annotated with 2 textual descriptions. For data division, 3,701 (index < 18505), 200 (18505 <= index < 19505) and 200 (index >= 19505) identities are used for training, validation and testing, respectively (marked by the item 'split' in the JSON file). Each sentence is no shorter than 23 words.
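The index thresholds in the RSTPReid entry can be read as a simple image-index rule. The following is a minimal sketch, assuming zero-based image indices running from 0 to 20,504 and 5 images per identity; the function name and the split names are hypothetical and not taken from the dataset's JSON file, which already carries its own 'split' item.

```python
# Hypothetical helper: map an RSTPReid image index to its split,
# using the thresholds quoted in the description.
# Assumption: indices are zero-based and cover 0..20504.
TOTAL_IMAGES = 20505
TRAIN_END = 18505   # 3,701 identities x 5 images
VAL_END = 19505     # next 200 identities x 5 images


def rstpreid_split(index: int) -> str:
    """Return 'train', 'val' or 'test' for a given image index."""
    if not 0 <= index < TOTAL_IMAGES:
        raise ValueError(f"index {index} is outside 0..{TOTAL_IMAGES - 1}")
    if index < TRAIN_END:
        return "train"
    if index < VAL_END:
        return "val"
    return "test"


def split_sizes() -> dict:
    """Count images per split by applying the rule to every index."""
    sizes = {"train": 0, "val": 0, "test": 0}
    for i in range(TOTAL_IMAGES):
        sizes[rstpreid_split(i)] += 1
    return sizes
```

Under these assumptions the image counts per split divide evenly by 5, which matches the 3,701 / 200 / 200 identity counts stated above.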
The SCUT-CTW1500 dataset contains 1,500 images: 1,000 for training and 500 for testing. In particular, it provides 10,751 cropped text instance images, including 3,530 with curved text. The images are manually harvested from the Internet, image libraries such as Google Open-Image, or phone cameras. The dataset contains a lot of horizontal and multi-oriented text.
TurkCorpus is a dataset with 2,359 original sentences from English Wikipedia, each with 8 manual reference simplifications. The dataset is divided into two subsets: 2,000 sentences for validation and 359 for testing of sentence simplification models.
EmotionLines contains a total of 29,245 labeled utterances from 2,000 dialogues. Each utterance is labeled with one of seven emotions: Ekman’s six basic emotions plus the neutral emotion. Each utterance was labeled by 5 workers, and the emotion category with the most votes was set as the label of the utterance. Utterances that received votes for more than two different emotions were put into the non-neutral category. The dataset therefore has a total of 8 types of emotion labels: anger, disgust, fear, happiness, sadness, surprise, neutral, and non-neutral.
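The EmotionLines labeling rule can be sketched as a small vote-aggregation function. This is an illustrative reading of the description, not the authors' code: the function name is hypothetical, and it takes "more than two different emotions" literally as three or more distinct categories among the 5 votes.

```python
from collections import Counter

# The seven categories workers could choose from, as listed in the description.
EMOTIONS = {"anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"}


def aggregate_label(votes: list[str]) -> str:
    """Combine 5 worker votes into one utterance label (hypothetical helper)."""
    if len(votes) != 5:
        raise ValueError("expected exactly 5 votes per utterance")
    unknown = set(votes) - EMOTIONS
    if unknown:
        raise ValueError(f"unknown emotion labels: {sorted(unknown)}")

    counts = Counter(votes)
    # Assumption: three or more distinct emotions means no usable consensus.
    if len(counts) > 2:
        return "non-neutral"
    # With 5 votes over at most two categories, one category always has
    # at least 3 votes, so there is never a tie to break here.
    label, _ = counts.most_common(1)[0]
    return label
```

For example, three votes for anger and two for neutral yield anger, while votes spread over anger, fear and neutral yield non-neutral.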