Datasets

3,148 machine learning datasets

SurgeGlobal/Evol-Instruct

Dataset Generation

SSv2-Spatio-Temporal (Something-Something v2-Spatio-Temporal)

We use the Something-Something v2 dataset to obtain generation prompts and ground-truth masks from real action videos. We select a filtered set of 295 prompts; the details of this filtering are in the paper "Peekaboo: Interactive Video Generation via Masked-Diffusion". We then use an off-the-shelf OWL-ViT-large open-vocabulary object detector to obtain bounding box (bbox) annotations of the object in each video. The resulting set consists of bbox and prompt pairs from real-world videos and serves as a test bed for both the quality and the controllability of methods that generate realistic videos with spatio-temporal control.

1 paper | 0 benchmarks | Interactive, Texts, Tracking, Videos

DocRED-IE

The DocRED Information Extraction (DocRED-IE) dataset extends the DocRED dataset for the Document-level Closed Information Extraction (DocIE) task. DocRED-IE is a multi-task dataset and allows for 5 subtasks: (i) Document-level Relation Extraction, (ii) Mention Detection, (iii) Entity Typing, (iv) Entity Disambiguation, (v) Coreference Resolution, as well as combinations thereof such as Named Entity Recognition (NER) or Entity Linking. The DocRED-IE dataset also allows for the end-to-end tasks of: (i) DocIE and (ii) Joint Entity and Relation Extraction. DocRED-IE comprises sentence-level and document-level facts, thereby describing short as well as long-range interactions within an entire document.

1 paper | 6 benchmarks | Texts

Replication Package: Migrating Software Systems towards Post-Quantum-Cryptography - A Systematic Literature Review

This is the replication package for our systematic literature review and can be used for the reproducibility of the individual steps of our search and selection methodology.

1 paper | 0 benchmarks | Texts

Social-IQ 2.0

1 paper | 0 benchmarks | Audio, Texts, Videos

PECC (PECC: Problem Extraction and Coding Challenges)

Recent advancements in large language models (LLMs) have showcased their exceptional abilities across various tasks, such as code generation, problem-solving and reasoning. Existing benchmarks evaluate tasks in isolation, yet the extent to which LLMs can understand prose-style tasks, identify the underlying problems, and then generate appropriate code solutions is still unexplored. Addressing this gap, we introduce PECC, a novel benchmark derived from Advent Of Code (AoC) challenges and Project Euler, including 2396 problems. Unlike conventional benchmarks, PECC requires LLMs to interpret narrative-embedded problems, extract requirements, and generate executable code. A key feature of our dataset is the complexity added by natural language prompting in chat-based evaluations, mirroring real-world instruction ambiguities. Results show varying model performance between narrative and neutral problems, with specific challenges in the Euler math-based subset with GPT-3.5-Turbo passing 50% o

1 paper | 1 benchmark | Texts

ViTHSD (Vietnamese Targeted-Hate-Speech-Detection)

A Vietnamese dataset for target-specific hate speech detection. The dataset contains 10,000 comments; each comment is labeled for 5 targets, each at one of three levels of hatefulness.

1 paper | 0 benchmarks | Texts

Labels

1 paper | 0 benchmarks | Texts

IDD-X

Intelligent vehicle systems require a deep understanding of the interplay between road conditions, surrounding entities, and the ego vehicle's driving behavior for explainable driving decision-making and safe and efficient navigation. This is particularly critical in developing countries where traffic situations are often dense and unstructured with heterogeneous road occupants. Existing datasets, predominantly geared towards structured and sparse traffic scenarios, fall short of capturing the complexity of driving in such environments. To fill this gap, we present IDD-X, a large-scale dual-view driving video dataset. With 697K bounding boxes, 9K important object tracks, and 1-12 objects per video, IDD-X offers comprehensive ego-relative annotations for multiple important road objects covering 10 categories and 19 explanation label categories. The dataset also incorporates rearview information to provide a more complete representation of the driving environment. We also introduce custo

1 paper | 0 benchmarks | Texts, Videos

ProCIS

A large-scale dataset for proactive document retrieval that consists of over 2.8 million conversations from Reddit.

1 paper | 0 benchmarks | Texts

SoccerNet-Echoes (SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset)

SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset.

1 paper | 0 benchmarks | Audio, Texts, Videos

CinePile: A Long Video Question Answering Dataset and Benchmark

CinePile is a question-answering-based, long-form video understanding dataset. It was created using advanced large language models (LLMs) in a human-in-the-loop pipeline that leverages existing human-generated raw data. It consists of approximately 300,000 training data points and 5,000 test data points.

1 paper | 2 benchmarks | Texts, Videos

iDRAMA-Scored-2024: A Dataset of the Scored Social Media Platform from 2020 to 2023

Online web communities often face bans for violating platform policies, encouraging their migration to alternative platforms. This migration, however, can result in increased toxicity and unforeseen consequences on the new platform. In recent years, researchers have collected data from many alternative platforms, indicating coordinated efforts leading to offline events, conspiracy movements, hate speech propagation, and harassment. Thus, it becomes crucial to characterize and understand these alternative platforms. To advance research in this direction, we collect and release a large-scale dataset from Scored -- an alternative Reddit platform that sheltered banned fringe communities, for example, c/TheDonald (a prominent right-wing community) and c/GreatAwakening (a conspiratorial community). Over four years, we collected approximately 57M posts from Scored, with at least 58 communities identified as migrating from Reddit and over 950 communities created since the platform's inception.

1 paper | 0 benchmarks | Texts

IoTvulCode

The dataset includes source code vulnerabilities in some of the most commonly used IoT frameworks. We introduce IoTvulCode, a novel framework consisting of a dataset-generating tool and ML-enabled methods for detecting source code vulnerabilities and weaknesses, together with the initial release of an IoT vulnerability dataset. Our framework contributes to improving existing coding practices, leading to a more secure IoT infrastructure.

1 paper | 0 benchmarks | Texts

AlpacaEval-TH

AlpacaEval in Thai.

1 paper | 0 benchmarks | Texts

MT-Bench-TH

MT-Bench in Thai.

1 paper | 0 benchmarks | Texts

MSNER (Multilingual Spoken Named Entity Recognition)

This dataset contains named-entity annotations for European Parliament recordings in Dutch, French, German, and Spanish. The entity annotation scheme follows OntoNotes v5. The original unannotated dataset is VoxPopuli.

1 paper | 0 benchmarks | Speech, Texts

PaRoutes

We introduce PaRoutes, a framework for benchmarking multi-step retrosynthesis methods, i.e. route predictions. The framework consists of two sets of 10 000 synthetic routes extracted from the patent literature, a list of stock compounds, and a curated set of reactions on which one-step retrosynthesis models can be trained.

1 paper | 0 benchmarks | Texts

Reglamento_Aeronautico_Colombiano_2024

Dataset details: total labeled: 100%.

1 paper | 0 benchmarks | Texts

RTE3-FR

The RTE3-FR dataset is the French translation of the English textual entailment dataset used in the RTE-3 Challenge (https://nlp.stanford.edu/RTE3-pilot).

1 paper | 0 benchmarks | Texts
