3,148 dataset results
The AppealCase dataset is the first large-scale resource specifically designed to support LegalAI research in appellate judgment scenarios. While prior work in LegalAI has focused heavily on one-shot trials, the appellate procedure—critical to ensuring fairness and correcting judicial errors—remains largely underexplored.
The dataset contains dialogs of different LLMs from the discussion phase of a text-based Among Us-like game. The phrases in the dataset were annotated according to 25 selected persuasion techniques: Appeal to Logic, Appeal to Emotion, Appeal to Credibility, Shifting the Burden of Proof, Bandwagon Effect, Distraction, Gaslighting, Appeal to Urgency, Deception, Lying, Feigning Ignorance, Vagueness, Minimization, Self-Deprecation, Projection, Appeal to Relationship, Humor, Sarcasm, Withholding Information, Exaggeration, Denial without Evidence, Strategic Voting Suggestion, Appeal to Rules, Confirmation Bias Exploitation, Information Overload. The annotation was performed automatically by few-shot prompting a Gemini Flash 1.5 model with a temperature of 0. On a random sample of 11 games involving a total of 509 persuasion tags, Krippendorff's alpha inter-rater agreement between human annotations and the persuasion tagger was 0.56. For the definitions of the persuasion techniques, please re
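As a hedged illustration of the agreement figure above, the sketch below computes Krippendorff's alpha between human labels and the automatic tagger's labels using the `krippendorff` Python package; the integer label ids, the two toy rating rows, and the one-label-per-phrase simplification are assumptions, not the released annotation format.

```python
import numpy as np
import krippendorff  # pip install krippendorff

# Rows = raters (human vs. LLM tagger), columns = phrases; np.nan marks a phrase one rater left untagged.
human_labels  = [0, 3, 3, np.nan, 7, 1, 1, 9]   # illustrative ids, e.g. 0 = Appeal to Logic, 3 = Shifting the Burden of Proof
tagger_labels = [0, 3, 2, 5,      7, 1, 0, 9]

alpha = krippendorff.alpha(
    reliability_data=[human_labels, tagger_labels],
    level_of_measurement="nominal",  # persuasion techniques are unordered categories
)
print(f"Krippendorff's alpha: {alpha:.2f}")
```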
KCIF is a benchmark for evaluating the instruction-following capabilities of Large Language Models (LLMs). We adapt existing knowledge benchmarks and augment them with instructions that either (a) are conditional on correctly answering the knowledge task or (b) use the space of candidate options in multiple-choice knowledge-answering tasks. KCIF allows us to study model characteristics, such as the change in performance on the knowledge tasks in the presence of answer-modifying instructions and distractor instructions.
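The following toy sketch illustrates the two augmentation types on a made-up multiple-choice item; the item, the instructions, and the prompt layout are hypothetical and not the KCIF release format.

```python
# Illustrative only: a toy knowledge item plus the two kinds of added instructions.
base_item = {
    "question": "Which planet is known as the Red Planet?",
    "options": {"A": "Venus", "B": "Mars", "C": "Jupiter", "D": "Mercury"},
    "answer": "B",
}

# (a) Instruction conditional on correctly answering the knowledge task.
answer_modifying = (
    "Answer the question, then print your chosen option letter in lowercase, "
    "repeated three times with no spaces."
)

# (b) Instruction that operates over the space of candidate options.
option_space = (
    "Instead of the correct option, list all option letters that are NOT the "
    "correct answer, in alphabetical order."
)

def build_prompt(item: dict, instruction: str) -> str:
    opts = "\n".join(f"{k}. {v}" for k, v in item["options"].items())
    return f"{item['question']}\n{opts}\n\nInstruction: {instruction}"

print(build_prompt(base_item, answer_modifying))
print(build_prompt(base_item, option_space))
```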
This dataset was collected by teleoperating a UR5 arm to pick up a screwdriver in a cluttered tabletop environment.
The MathEquiv dataset accompanies EquivPruner. It is specifically designed for mathematical statement equivalence, serving as a versatile resource applicable to a variety of mathematical tasks and scenarios. It consists of almost 100k math sentence pairs, each with an equivalence label and a reasoning step generated by GPT-4o.
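As a rough illustration, a pair-level record might look like the following; the field names, the examples, and the placeholder predictor are assumptions, not the released schema.

```python
# Hypothetical record layout for a statement-equivalence pair, plus a trivial accuracy check.
pairs = [
    {
        "statement_a": "x^2 - 1 = 0",
        "statement_b": "(x - 1)(x + 1) = 0",
        "equivalent": True,
        "reasoning": "Factoring x^2 - 1 gives (x - 1)(x + 1), so the equations share solutions.",
    },
    {
        "statement_a": "x^2 = 4",
        "statement_b": "x = 2",
        "equivalent": False,
        "reasoning": "x = -2 also satisfies x^2 = 4, so the statements are not equivalent.",
    },
]

def naive_predictor(a: str, b: str) -> bool:
    # Placeholder: string equality ignoring whitespace; a real model would reason about the math.
    return a.replace(" ", "") == b.replace(" ", "")

correct = sum(naive_predictor(p["statement_a"], p["statement_b"]) == p["equivalent"] for p in pairs)
print(f"accuracy: {correct / len(pairs):.2f}")
```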
Recognizing complex emotions linked to ambivalence and hesitancy (A/H) can play a critical role in the personalization and effectiveness of digital behaviour change interventions. These subtle and conflicting emotions are manifested by a discord between multiple modalities, such as facial and vocal expressions and body language. Although experts can be trained to identify A/H, integrating them into digital interventions is costly and less effective. Automatic learning systems provide a cost-effective alternative that can adapt to individual users and operate seamlessly within real-time, resource-limited environments. However, there are currently no datasets available for the design of ML models to recognize A/H.
We create OpenS2V-5M, the first open-source large-scale Subject-to-Video (S2V) generation dataset, which consists of five million high-quality 720P subject-text-video triples. To ensure subject-information diversity in the dataset, we (1) segment subjects and build pairing information via cross-video associations and (2) prompt GPT-Image on raw frames to synthesize multi-view representations. The dataset supports both Subject-to-Video and Text-to-Video generation tasks.
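A minimal sketch of how one such triple could be represented in memory; the class, field names, and file paths are illustrative assumptions rather than the released format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class S2VTriple:
    subject_images: List[str]  # segmented subject crops and synthesized multi-view frames
    caption: str               # text prompt paired with the clip
    video_path: str            # 720P source clip

example = S2VTriple(
    subject_images=["subject_0001/view_front.png", "subject_0001/view_side.png"],
    caption="A corgi wearing a red scarf runs along a snowy beach.",
    video_path="clips/0001.mp4",
)
print(example.caption)
```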
SimpleStories is a dataset of more than 2 million model-generated short stories, created for training small, interpretable language models. The generation process is open-source: to see how the dataset was generated, or to generate some stories yourself, head over to https://github.com/lennart-finke/simple_stories_generate.
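A minimal loading sketch with the Hugging Face `datasets` library; the hub identifier and the `story` field name are assumptions, so check the linked repository for the actual path and schema.

```python
from datasets import load_dataset

# Hypothetical hub identifier and field name, shown only to illustrate the access pattern.
stories = load_dataset("SimpleStories/SimpleStories", split="train")
print(stories[0]["story"][:200])
```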
Out-of-distribution (OOD) split of the Mol-Instructions dataset for protein annotation.
A real-world doctor-patient question-answering dataset, cleaned both manually and automatically.
A real-world doctor-patient question-answering dataset.
A 90-million-token medical corpus crawled from medical websites.
A Persian translation of the K-QA dataset.
WebGen-Bench is created to benchmark LLM-based agents' ability to generate websites from scratch. The dataset is introduced in "WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch". It contains 101 instructions and 647 test cases, and also provides a training set of 6,667 instructions named WebGen-Instruct.
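A hedged sketch of how instructions and test cases might pair up for scoring; the record fields, the stub checker, and the pass-rate metric are assumptions, not the released evaluation harness.

```python
# Illustrative only: one instruction with its behavioural test cases, scored by a stub checker.
benchmark = [
    {
        "instruction": "Build a to-do list site with an input box and an 'Add' button.",
        "test_cases": [
            "Typing a task and clicking 'Add' appends it to the visible list.",
            "The list is empty on first load.",
        ],
    },
]

def run_test_case(website_url: str, test_case: str) -> bool:
    # Stub: a real harness would drive a browser agent against the generated site.
    return False

def pass_rate(website_url: str) -> float:
    results = [run_test_case(website_url, tc) for item in benchmark for tc in item["test_cases"]]
    return sum(results) / len(results)

print(pass_rate("http://localhost:3000"))
```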
The first and only open dataset for Russian fingerspelling, containing 1,593 annotated phrases and over 37 thousand HD+ videos.
PubMedQA-MetaGen: Metadata-Enriched PubMedQA Corpus
Enumerate–Conjecture–Prove: Formally Solving Answer-Construction Problems in Math Competitions. We release the ConstructiveBench dataset as part of our Enumerate–Conjecture–Prove (ECP) paper. It enables benchmarking automated reasoning systems on answer-construction math problems using Lean 4.
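As a hedged, toy illustration of the answer-construction pattern in Lean 4 (not a problem taken from ConstructiveBench): first commit to a concrete answer term, then prove it satisfies the problem statement.

```lean
-- Toy problem: "Find the sum of the first 10 positive integers."
-- The enumerate/conjecture stages would propose the answer term; the prove stage closes the goal.
def answer : Nat := 55

def sumUpTo : Nat → Nat
  | 0     => 0
  | n + 1 => (n + 1) + sumUpTo n

theorem answer_correct : sumUpTo 10 = answer := rfl
```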
A collection of test sets for evaluating base and chat LLMs (incl. VLMs) on Greek generation and understanding capabilities.