3,148 machine learning datasets
Overview
MILU (Multi-task Indic Language Understanding Benchmark) is a comprehensive evaluation dataset designed to assess the performance of Large Language Models (LLMs) across 11 Indic languages. It spans 8 domains and 42 subjects, reflecting both general and culturally specific knowledge from India.
This dataset was curated for Search Engine Optimization (SEO) analysis tasks, including categorization and spam detection. It covers 12 diverse topics: basketball, books, cats, gardening, history, movies, music, recipes, sports, technology, travel, and weather. Some topics have hierarchical relationships, such as sports and basketball, while others are closely related (e.g., movies and music) or unrelated (e.g., basketball and gardening), with varying degrees of overlap among them. For each topic, approximately 300 search queries were generated using large language models (LLMs) like GPT, Llama, and Claude. The top 10 URLs from the Google Search Console’s search engine results page (SERP) were retrieved for each query.
A comprehensive Turkish dataset for question-answering tasks in the medical domain
Code and Data for Replication of "Microsimulation Estimates of Decision Uncertainty and Value of Information Are Biased but Consistent"
We introduce a dataset consisting of 1,314 samples, including users' tweets and bios. Each user's job title is found via Wikipedia crawling. The challenge of multiple job titles per user is handled with a semantic word embedding and clustering method. A job prediction method is then introduced based on a deep neural network and TF-IDF features. We also use hashtags and emojis in the tweets for job prediction. Results show that users' job titles on Twitter can be predicted with 54% accuracy across nine categories.
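To make the TF-IDF feature step concrete, here is a minimal pure-Python sketch of the standard TF-IDF weighting; the paper's exact tokenization and weighting variant are not specified, so this is illustrative only.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Uses the plain tf * log(N / df) scheme; real pipelines often
    apply smoothing and normalization on top of this.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

# Toy bios: "python" occurs in 2 of 3 documents, so it is weighted
# lower than terms unique to a single document, such as "chef".
docs = [["data", "scientist", "python"],
        ["python", "developer"],
        ["chef", "recipes"]]
w = tfidf(docs)
```

Rarer terms receive higher weights, which is why TF-IDF vectors are a reasonable input for a job-category classifier.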
Sentiment analysis is pivotal in Natural Language Processing for understanding opinions and emotions in text. While advances in sentiment analysis for English are notable, Arabic Sentiment Analysis (ASA) lags behind, despite the growing Arabic online user base. Existing ASA benchmarks are often outdated and lack comprehensive evaluation capabilities for state-of-the-art models. To bridge this gap, we introduce ArSen, a meticulously annotated COVID-19-themed Arabic dataset, and IFDHN, a novel model incorporating fuzzy logic for enhanced sentiment classification. ArSen provides a contemporary, robust benchmark, and IFDHN achieves state-of-the-art performance on ASA tasks. Comprehensive evaluations on the ArSen dataset demonstrate the efficacy of IFDHN and highlight future research directions in ASA.
SoliDiffy Differencing Contract Pairs and Edit Scripts Dataset
The project creates and maintains two main datasets to assist with research and evaluation of Solidity smart contract differencing:
GPTKB is a large general-domain knowledge base (KB) constructed entirely from a large language model (LLM). It demonstrates the feasibility of large-scale KB construction from LLMs, while highlighting specific challenges arising around entity recognition, entity and property canonicalization, and taxonomy construction.
A large collection of human-written natural language questions and their corresponding SPARQL queries over federated bioinformatics knowledge graphs (KGs), collected over several years across different research groups at the SIB Swiss Institute of Bioinformatics. The collection comprises more than 1,000 example questions and queries, including 65 federated queries. We propose a methodology to uniformly represent the examples with minimal metadata, based on existing standards. Furthermore, we introduce an extensive set of open-source applications, including query graph visualizations and smart query editors, easily reusable by KG maintainers who adopt the proposed methodology.
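A federated query is one that joins data across endpoints via SPARQL's SERVICE clause. The sketch below shows a question–query pair in that spirit; the metadata fields, endpoints, and predicates are illustrative assumptions, not the collection's exact schema or contents.

```python
# Hypothetical question–query record (field names are illustrative).
example = {
    "question": "Which proteins have an enzyme annotation linked to a "
                "labeled Rhea reaction?",
    "endpoint": "https://sparql.uniprot.org/sparql",
    "federates_with": ["https://sparql.rhea-db.org/sparql"],
    "query": """
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?protein ?label WHERE {
  ?protein a up:Protein ;
           up:enzyme ?enzyme .
  # The SERVICE clause is what makes the query federated: the inner
  # pattern is evaluated at the remote Rhea endpoint.
  SERVICE <https://sparql.rhea-db.org/sparql> {
    ?reaction rdfs:label ?label .
  }
}
""",
}

# The presence of a SERVICE clause distinguishes the 65 federated
# queries from the rest of the collection.
is_federated = "SERVICE" in example["query"]
```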
COMFORT is an evaluation protocol for systematically assessing the spatial reasoning capabilities of vision-language models (VLMs).
This corpus contains data files generated as part of the NOVIC paper. This includes the complete Object Noun Dictionary, the exact templates used for the multiset prompt templating strategy, and a large dataset of 1.8M LLM-generated and templated captions organized by target noun. The captions were generated based on all of the target nouns in the Object Noun Dictionary.
DAVIS-Edit is a curated testing benchmark for video editing. This dataset contains two evaluation settings, i.e., text- and image-based editing. In addition, it offers two types of annotations for both prompt modalities, covering editing scenarios with similar (DAVIS-Edit-S) and changing (DAVIS-Edit-C) shapes, so as to address the shape inconsistency problem in video-to-video editing.
Overview: This collection contains three synthetic datasets produced by gpt-4o-mini for sentiment analysis and PDT (Product Desirability Toolkit) testing. Each dataset contains 1,000 hypothetical software product reviews, with the aim of producing diverse sentiment and text. The datasets were created as part of the research described in:
Large language models (LLMs) excel in high-resource languages but face notable challenges in low-resource languages like Mongolian. The release of MM-Eval, comprising 1,840 tasks in total (569 syntax, 677 semantics, 344 knowledge, and 250 reasoning), offers valuable insights for advancing NLP and LLMs in low-resource languages like Mongolian.
AdvSuffixes - Information
AdvSuffixes is a curated dataset of adversarial prompts and suffixes designed to evaluate and enhance the robustness of large language models (LLMs) against adversarial attacks. By appending these suffixes to standard prompts, researchers and developers can explore and analyze how LLMs respond to potentially harmful input scenarios. This dataset is heavily inspired by AdvBench.
This dataset provides a curated collection of approved drug Simplified Molecular Input Line Entry System (SMILES) strings and their associated protein sequences. Each small molecule has been approved by at least one regulatory body, ensuring the safety and relevance of the data for computational applications. The dataset includes 1,660 approved small molecules and their 2,093 related protein targets.
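When SMILES strings like these are fed to sequence models, a common first step is regex-based tokenization. The sketch below uses a widely used pattern (bracket atoms, two-letter halogens, ring-closure digits, bond symbols); it is a generic preprocessing example, not part of this dataset's own tooling.

```python
import re

# Common SMILES tokenization pattern: bracket atoms first, then
# two-letter halogens (Br, Cl), organic-subset atoms, bonds, ring
# closures, and branch parentheses.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p"
    r"|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_PATTERN.findall(smiles)
    # Sanity check: tokenization must be lossless.
    assert "".join(tokens) == smiles
    return tokens

# Aspirin: each character here happens to be its own token.
print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
# Sodium chloride: bracket atoms are kept as single tokens.
print(tokenize_smiles("[Na+].[Cl-]"))
```

The ordering of alternatives matters: bracket atoms and two-letter halogens must be matched before single letters, or `Cl` would be split into `C` and a dangling `l`.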
The dataset contains 140 paragraphs from climate change reports with associated aspect-based (i.e., query-focused) summaries that were produced by experts specifically for policy-makers.