3,148 machine learning datasets
To effectively evaluate OmniCount across open-vocabulary, supervised, and few-shot counting tasks, a dataset covering a broad spectrum of visual categories, with multiple instances and classes per image, is essential. Existing counting datasets focus primarily on a single object category, such as humans or vehicles, and fall short for multi-label object counting. Multi-class datasets like MS COCO exist, but their sparse object appearances limit their utility for counting. To address this gap, we created a new dataset of 30,230 images spanning 191 diverse categories, including kitchen utensils, office supplies, vehicles, and animals. With per-image instance counts ranging from 1 to 160 and an average count of 10, the dataset fills this void and establishes a benchmark for assessing counting models in varied scenarios.
The MPII Human Pose Descriptions dataset extends the widely used MPII Human Pose Dataset with rich textual annotations. These annotations are generated by several state-of-the-art large language models (LLMs) and include detailed descriptions of the activities being performed, the number of people present, and their specific poses.
WikiFactDiff is a dataset designed as a resource for performing atomic factual knowledge updates on language models, with the goal of aligning them with current knowledge. It describes the evolution of factual knowledge between two dates, named T_old and T_new, in the form of semantic triples. To enable the evaluation of knowledge editing algorithms (such as ROME, MEND, and MEMIT), these triples are verbalized, and neighboring facts are identified to check for potential bleedover.
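The triple-plus-verbalization structure described above can be illustrated with a minimal sketch. This is not the actual WikiFactDiff schema; the class, field names, and the template in `verbalize` are hypothetical, chosen only to show how a semantic triple whose object changes between T_old and T_new could be turned into a natural-language probe:

```python
# Illustrative sketch, NOT the actual WikiFactDiff schema: a factual update
# is a semantic triple whose object differs between T_old and T_new, plus a
# template-based verbalization used to probe or edit a language model.
from dataclasses import dataclass

@dataclass
class FactUpdate:
    subject: str      # e.g. an entity label
    relation: str     # e.g. "head of government"
    old_object: str   # object value at T_old
    new_object: str   # object value at T_new

def verbalize(fact: FactUpdate, obj: str) -> str:
    """Turn a triple into a natural-language statement (hypothetical template)."""
    return f"The {fact.relation} of {fact.subject} is {obj}."

update = FactUpdate("United Kingdom", "head of government",
                    "Boris Johnson", "Rishi Sunak")
statement = verbalize(update, update.new_object)
# Neighboring facts (triples sharing the relation but with other subjects)
# would be verbalized the same way to check that an edit does not bleed over.
```
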
FABSA is an aspect-based sentiment analysis dataset of customer feedback (Trustpilot, Google Play, and Apple App Store reviews).
A multilingual explainable fact-checking dataset on the 2022 Russia-Ukraine conflict.
Raw negotiation transcripts generated for the paper "Evaluating Language Model Agency through Negotiations". The data include transcripts from self-play (a model negotiates against an independent copy of itself; Section 4.1 of the paper) and cross-play (a model negotiates against a different model; Section 4.2). The dataset comprises 2,926 transcripts (942 self-play, 1,984 cross-play).
The dataset contains two few-shot fine-grained chemical entity extraction datasets, based on the human-annotated ChemNER+ and CHEMET corpora. For each dataset, we randomly sample a subset based on the frequency of each type class: we first set the maximum number of entity mentions $k$ for the most frequent entity type in the dataset, then randomly sample the other types so that the distribution of each type matches the original dataset. We use $k \in \{6, 9, 12, 15, 18\}$ as the potential maximum entity mention counts. The ChemNER+ and CHEMET few-shot datasets contain 52 and 28 types, respectively.
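The proportional sampling procedure described above can be sketched as follows. This is a hypothetical reimplementation, not the authors' code: the function name, the tie-breaking (`max(1, ...)`) choice, and the toy mention list are assumptions made for illustration:

```python
# Hypothetical sketch of the proportional few-shot sampling described above:
# fix k mentions for the most frequent entity type, then scale every other
# type's sample size so the type distribution matches the full dataset.
import random
from collections import Counter

def sample_few_shot(mentions, k, seed=0):
    """mentions: list of (entity_text, type_label) pairs from the full dataset."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in mentions)
    max_count = max(counts.values())
    subset = []
    for label, count in counts.items():
        # Preserve the original proportions: the most frequent type gets k mentions.
        n = max(1, round(k * count / max_count))
        pool = [m for m in mentions if m[1] == label]
        subset.extend(rng.sample(pool, min(n, len(pool))))
    return subset

# Toy corpus: 30 CHEM, 15 PROT, 6 ELEM mentions; with k=6 the subset keeps
# roughly the same 30:15:6 ratio (6 CHEM, 3 PROT, 1 ELEM).
mentions = ([("benzene", "CHEM")] * 30 + [("kinase", "PROT")] * 15
            + [("Fe", "ELEM")] * 6)
subset = sample_few_shot(mentions, k=6)
```
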
The Food Recall Incidents dataset consists of 7,546 short texts (5 to 360 characters each): the titles of food recall announcements (hence referred to as titles), crawled from 24 public food safety authority websites by Agroknow. The texts are written in six languages, with English (6,644) and German (888) the most common, followed by French (8), Greek (4), Italian (1), and Danish (1). Most texts were authored after 2010 and describe recalls of specific food products due to specific hazards. Experts manually classified each text into four groups of classes describing hazards and products at two levels of granularity:
Dataset overview: vanilla.csv contains interactions without specific role-play instructions; boss.csv contains interactions where ChatGPT plays the role of the user's boss; classmate.csv contains interactions with ChatGPT acting as the user's classmate. Each turn was coded for the motive behind the user's response and the perceived naturalness of ChatGPT's responses.
The dataset was collected from DOTA 2 using the OpenDota API via Python. It consists of DOTA 2 in-game chat messages manually categorized into three classes: non-toxic, mild (toxicity), and toxic.
ConQA is a dataset built from the intersection of Visual Genome and MS-COCO. Its goal is to provide a new benchmark for text-to-image retrieval using queries that are shorter and less descriptive than the commonly used MS-COCO or Flickr captions. ConQA consists of 80 queries, divided into 50 conceptual and 30 descriptive queries. A descriptive query mentions some of the objects in the image (for instance, "people chopping vegetables"), while a conceptual query does not mention objects, or refers to them only in a general context (e.g., "working class life").
A large-scale, consumer-to-consumer (C2C) marketplace e-commerce dataset.
This dataset is based on the movie review polarity dataset (v2.0) collected and maintained by Bo Pang and Lillian Lee. Their dataset (we'll call it PL2.0) consists of 1000 positive and 1000 negative movie reviews obtained from the Internet Movie Database (IMDb) review archive.
"My ridiculous dog is amazing." [sentiment: positive]
VlogQA consists of 10,076 question-answer pairs based on 1,230 transcript documents sourced from YouTube, an extensive source of user-uploaded content, covering food and travel topics in the Vietnamese language. The dataset supports research on Vietnamese spoken-based machine reading comprehension.
The $\text{BEAR}$ dataset and its larger version, $\text{BEAR}_{\text{big}}$, are benchmarks for evaluating common factual knowledge contained in language models.
The dataset contains the training and test data for the SOftware Mention Detection challenge. The data is derived from the SoMeSci Knowledge Graph of software mentions.
[Real or Fake]: Fake Job Description Prediction. This dataset contains 18K job descriptions, of which about 800 are fake. The data consist of both textual information and meta-information about the jobs. The dataset can be used to build classification models that learn to identify fraudulent job descriptions.
Overview: The LaMini Dataset is an instruction dataset generated using h2ogpt-gm-oasst1-en-2048-falcon-40b-v2. It is designed for instruction-tuning pre-trained models to specialize them for a variety of downstream tasks.
Dataset Generation