Datasets

19,997 machine learning datasets


dichasus-cf0x (CSI Dataset dichasus-cf0x: Distributed Antenna Setup in Industrial Environment, Day 1)

Dataset containing channel state information (CSI) alongside ground truth data (position tags, timestamps) of a massive MIMO-OFDM system measured with the DICHASUS channel sounder. Measurement parameters and machine-readable file format descriptions are provided in a JSON file (spec.json).

2 papers | 0 benchmarks

RIR dataset (Planar Room Impulse Response Dataset - ACT, DTU Electro (b. 355 r. 008))

Dataset of Room Impulse Responses measured at the Acoustic Technology group facilities, DTU Electro. The measurements were carried out in building 355, room 008, otherwise known as the "sound field control" room.

2 papers | 0 benchmarks | Audio

Open Radar Datasets (Open Radar Datasets: Outdoor Moving Object Dataset)

A classification dataset of radar spectrograms in a "ground surveillance" setting, recorded with the Open Radar Initiative. The dataset was collected with a stationary radar and targets moving in front of it, using both collaborative and non-collaborative targets.

2 papers | 0 benchmarks

LogiEval

The LogiEval dataset is a benchmark suite designed for evaluating the logical reasoning abilities of prompt-based language models, particularly instruct-prompt large language models.

2 papers | 0 benchmarks

SKSF-A

SKSF-A consists of seven distinct styles drawn by professional artists. SKSF-A contains 134 identities and corresponding sketches, making a total of 938 face-sketch pairs. SKSF-A is introduced in StyleSketch, Eurographics 2024. https://kwanyun.github.io/stylesketch_project/

2 papers | 6 benchmarks | Images

WetLinks (WetLinks: a Large-Scale Longitudinal Starlink Dataset with Contiguous Weather Data)

This dataset includes stationary measurements of Starlink setups recorded over several months at two sites in Central Europe: Osnabrück (GER) and Enschede (NL). The throughput measurements were UDP-based. The dataset also contains high-quality weather data collected directly at the measurement sites. See the paper for details.

2 papers | 0 benchmarks | Time series

AG-ReID.v2 (Aerial-Ground Person Re-identification)

Aerial-ground person re-identification (Re-ID) presents unique challenges in computer vision, stemming from the distinct differences in viewpoints, poses, and resolutions between high-altitude aerial and ground-based cameras. Existing research predominantly focuses on ground-to-ground matching, with aerial matching less explored due to a dearth of comprehensive datasets. To address this, we introduce AG-ReID.v2, a dataset specifically designed for person Re-ID in mixed aerial and ground scenarios. This dataset comprises 100,502 images of 1,615 unique individuals, each annotated with matching IDs and 15 soft attribute labels. Data were collected from diverse perspectives using a UAV, a stationary CCTV camera, and a camera integrated into smart glasses, providing a rich variety of intra-identity variations.

2 papers | 1 benchmark

DISL (Fueling Research with A Large Dataset of Solidity Smart Contracts)

The full DISL dataset report is available at: https://arxiv.org/abs/2403.16861

2 papers | 0 benchmarks | Texts

M2KR (Multi-task Multi-modal Knowledge Retrieval)

M2KR is a collection of datasets designed for training and evaluating general-purpose vision-language retrievers. These datasets are released in Hugging Face Dataset format and cover various retrieval tasks.

2 papers | 0 benchmarks

Stance Detection in COVID-19 Tweets

The "Stance Detection in COVID-19 Tweets" dataset represents an evolution of stance detection research, tailored to address the unique and urgent challenges presented by the COVID-19 pandemic. This dataset is designed to capture public opinions, beliefs, and sentiments towards various aspects of the COVID-19 crisis, such as government policies, vaccination campaigns, public health recommendations, and the impact of the virus on daily life. It facilitates the analysis of how people's stances on these issues are expressed in social media discourse, specifically through tweets.

2 papers | 0 benchmarks

GOTCHA

We release the dataset for non-commercial research. Submit requests here: https://forms.gle/6WPEGNWbYoEe6bte8

2 papers | 0 benchmarks | Images, Speech, Videos

VQDv1 (Visual Query Detection v1)

In Visual Query Detection (VQD), a system is given a natural-language query (prompt) and an image, and must produce 0 to N boxes that satisfy that query. VQD is related to several other tasks in computer vision, but it captures abilities these other tasks ignore. Unlike object detection, VQD can deal with attributes and relations among objects in the scene. In VQA, algorithms often produce the right answers due to dataset bias without "looking" at relevant image regions. Referring Expression Recognition (RER) datasets have short and often ambiguous prompts, and by having only a single box as an output, they make it easier to exploit dataset biases. VQD requires goal-directed object detection and outputting a variable number of boxes that answer a query.

2 papers | 0 benchmarks | Images, Texts

OWT2 (OpenWebtext2)

OpenWebText2 is an enhanced version of the original OpenWebTextCorpus. It encompasses all Reddit submissions from 2005 up until April 2020, with additional months becoming available after the corresponding PushShift dump files are released.

2 papers | 0 benchmarks

NQiI (Natural Questions In Icelandic)

Natural Questions in Icelandic (NQiI) is a dataset designed for extractive question answering (QA) in the Icelandic language.

2 papers | 0 benchmarks

PortraitMode-400

PortraitMode-400 is a video recognition dataset focused specifically on portrait-mode videos.

2 papers | 0 benchmarks

MaSaC_ERC

The E-MASAC Dataset is a collection of code-mixed conversations sourced from an Indian TV series, focusing on Hindi-English interactions. It was derived from the MASAC dataset and specifically annotated for Emotion Recognition in Conversations (ERC) tasks. The dataset comprises 8,607 dialogues with 11,440 utterances, containing instances of sarcasm and humor. Emotions such as anger, fear, joy, sadness, surprise, contempt, and neutral are annotated for each utterance by three fluent English and Hindi-speaking linguists, ensuring a high inter-annotator agreement of 0.85.

2 papers | 1 benchmark | Texts

AVMIT (Audiovisual Moments in Time)

Audiovisual Moments in Time (AVMIT) is a large-scale dataset of audiovisual action events. The dataset includes the annotation of 57,177 audiovisual videos from the Moments in Time dataset, each independently evaluated by 3 of 11 trained participants. Each annotation pertains to whether the labelled audiovisual action event is present and whether it is the most prominent feature of the video. The dataset also provides a curated test set of 960 videos across 16 classes, suitable for comparative experiments involving computational models and human participants, specifically when addressing research questions where audiovisual correspondence is of critical importance.

2 papers | 0 benchmarks | Videos

OAD dataset (The Online Action Detection Dataset)

The Online Action Detection Dataset (OAD) was captured using the Kinect V2 sensor, which collects color images, depth images and human skeleton joints synchronously. This dataset includes 59 long sequences and 10 actions.

2 papers | 3 benchmarks | 3D, Images

UADFV

Deepfake Dataset

2 papers | 0 benchmarks

HERA RFI Detection (Hydrogen Epoch of Reionization Array (HERA))

This dataset contains simulated and expert-labelled spectrograms from two radio telescopes: the Hydrogen Epoch of Reionization Array (HERA) in South Africa and the Low-Frequency Array (LOFAR) in the Netherlands. These datasets are intended to test radio-frequency interference (RFI) detection schemes. This entry pertains to the HERA dataset specifically.

2 papers | 6 benchmarks | Images
