Datasets

3,148 machine learning datasets

3,148 dataset results

MVME (Multi-View Medical Evaluation Benchmark)

The benchmark assesses the real-time interactive consultation capabilities of LLMs across three critical dimensions. We collect Chinese medical records across diverse departments online.

1 papers0 benchmarksTexts

CausalGym

SyntaxGym, adapted for interventional interpretability.

1 papers1 benchmarksTexts

HatefulDiscussions

Multi-Modal Hate Speech Detection with Graph Context.

1 papers0 benchmarksGraphs, Images, Texts

open-compass/CriticBench

[Dataset on HF] [Project Page] [Subjective LeaderBoard] [Objective LeaderBoard]

1 papers0 benchmarksTexts

prompt-opin-summ

Click to add a brief description of the dataset (Markdown and LaTeX enabled).

1 papers0 benchmarksTexts

opin-pref

Click to add a brief description of the dataset (Markdown and LaTeX enabled).

1 papers0 benchmarksTexts

Educational Grade School Math (EGSM)

Educational Grade School Math (EGSM) contains 2,093 question/answer pairs generated by MATHWELL, a reference-free educational grade school math word problem generator that outputs a word problem and Program of Thought (PoT) solution based solely on an optional student interest, as introduced in MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations. The question/answer pairs are verified by human experts. EGSM is the first teacher-annotated math word problem training dataset for LLMs.

1 papers0 benchmarksTexts

MATHWELL Human Annotation Dataset

The MATHWELL Human Annotation Dataset contains 5,084 synthetic word problems and answers generated by MATHWELL, a reference-free educational grade school math word problem generator released in MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations, and comparison models (GPT-4, GPT-3.5, Llama-2, MAmmoTH, and LLEMMA) with expert human annotations for solvability, accuracy, appropriateness, and meets all criteria (MaC). Solvability means the problem is mathematically possible to solve, accuracy means the Program of Thought (PoT) solution arrives at the correct answer, appropriateness means that the mathematical topic is familiar to a grade school student and the question's context is appropriate for a young learner, and MaC denotes questions which are labeled as solvable, accurate, and appropriate. Null values for accuracy and appropriateness indicate a question labeled as unsolvable, which means it cannot have an accurate solution and is automatically inappropria

1 papers0 benchmarksTexts

LEARNING STYLE IDENTIFICATION (Learning Style Identification Using Semi-Supervised Self-Taught Labeling)

The dataset was collected from two courses offered on the University of Jordan's E-learning Portal during the second semester of 2020, namely "Computer Skills for Humanities Students" (CSHS) and "Computer Skills for Medical Students" (CSMS). Over the sixteen-week duration of each course, students participated in various activities such as reading materials, video lectures, assignments, and quizzes. To preserve student privacy, the log activity of each student was anonymized. Data was aggregated from multiple sources, including the Moodle learning management system and the student information system, and consolidated into a single database. The dataset contains information on the number of learners and events for each course, as well as their launch and end dates. CSHS had 1749 learners and 1,139,810 events from January 21, 2020 to May 20, 2020, while CSMS had 564 learners and 484,410 events during the same period. The dataset is based on the Filder and Silverman learning style model (F

1 papers0 benchmarksActions, Texts, Tracking

BaitBuster-Bangla: A Comprehensive Dataset for Clickbait Detection in Bangla with Multi-Feature and Multi-Modal Analysis

The dataset contains a total of 253,070 records, with 18 features. The features are categorized into four different types: Metadata, Primary Data, Engagement Stats, and Label. Under the Metadata category contains basic information about the channel and video, such as their unique identifiers, date and time of publication, and thumbnail URLs. The Primary Data category contains information about the title and description of the video. The "Processed" columns refer to the cleaned data after denoising, deduplication and debiased for further analysis. The Engagement Stats category contains data on user engagement metrics for each video. The Label category contains predefined auto labels, human annotated labels, and AI generated pseudo labels. Auto labels are labels that are automatically derived based on a review of their titles, descriptions, and thumbnails over time. Channels with consistently misleading, exaggerated, or sensationalized content were labeled as clickbait. Those focusing on

1 papers0 benchmarksTabular, Texts

EpiK-Eval (Epistemic Knowledge Evaluation)

Benchmark to evaluate the capability of LMs to consolidate and recall information from multiple training documents.

1 papers0 benchmarksTexts

MMOS

Mix of Minimal Optimal Sets (MMOS) of dataset has two advantages for two aspects, higher performance and lower construction costs on math reasoning.

1 papers0 benchmarksTexts

Intent-based user instruction for electric automation

For the purpose of training and evaluating our intent classification model for electric automation, we curated a dataset consisting of intent-based user instructions. The dataset comprises a total of 14 intents, each associated with approximately 10 user instructions, resulting in a total of 140 instructions for electric automation. The intents were carefully selected to cover a diverse range of control commands and actions commonly encountered in electric automation scenarios. These intents include commands for turning on/off electrical appliances. Each user instruction in the dataset is labeled with its corresponding intent, allowing the model to learn the mapping between input instructions and their intended actions.

1 papers0 benchmarksTexts

Google Local review (Google Local Data)

Description This Dataset contains review information on Google map (ratings, text, images, etc.), business metadata (address, geographical info, descriptions, category information, price, open hours, and MISC info), and links (relative businesses) up to Sep 2021 in the United States.

1 papers0 benchmarksImages, Texts

ViMATH (Vietnamese MATH)

Click to add a brief description of the dataset (Markdown and LaTeX enabled).

1 papers0 benchmarksTexts

ViSR (Vietnamese Synthetic Reasoning)

Click to add a brief description of the dataset (Markdown and LaTeX enabled).

1 papers0 benchmarksTexts

Synthetic Reasoning - Natural Language (Vietnamese Synthetic Reasoning - Natural Language)

Click to add a brief description of the dataset (Markdown and LaTeX enabled).

1 papers0 benchmarksTexts

AI-generated Twitter Timelines

This project contains instructions and codes to reconstruct a dataset for the development and evaluation of forensic tools for detecting machine-generated text in social media.

1 papers0 benchmarksTexts

AlgoPuzzleVQA

We introduce the novel task of multimodal puzzle solving, framed within the context of visual question-answering. We present a new dataset, AlgoPuzzleVQA designed to challenge and evaluate the capabilities of multimodal language models in solving algorithmic puzzles that necessitate both visual understanding, language understanding, and complex algorithmic reasoning. We create the puzzles to encompass a diverse array of mathematical and algorithmic topics such as boolean logic, combinatorics, graph theory, optimization, search, etc., aiming to evaluate the gap between visual data interpretation and algorithmic problem-solving skills. The dataset is generated automatically from code authored by humans. All our puzzles have exact solutions that can be found from the algorithm without tedious human calculations. It ensures that our dataset can be scaled up arbitrarily in terms of reasoning complexity and dataset size. Our investigation reveals that large language models (LLMs) such as GPT

1 papers1 benchmarksImages, Texts

HistGen WSI-Report Dataset

This dataset is composed of 7,753 pairs of whole slide images and their corresponding diagnostic reports, extracted from the TCGA platform and refined with large language models. This dataset aims to boost the field of automated histopathology report generation by providing a new publicly available evaluation benchmark. See HistGen paper (see https://arxiv.org/pdf/2403.05396.pdf for reference) for a more detailed description of this dataset.

1 papers1 benchmarksImages, Texts

PreviousPage 133 of 158Next