Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3d meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • Midi (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • Cad (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

ullava


4 papers · 0 benchmarks

CommonMT

CommonMT is a dataset for evaluating commonsense reasoning in neural machine translation. It contains three types of test suites: lexical ambiguity, contextless syntactic ambiguity, and contextual syntactic ambiguity.

4 papers · 0 benchmarks

Wino-X

The Wino-X dataset is a multilingual collection of Winograd Schemas. It was introduced as a tool for evaluating coreference resolution (CoR) and commonsense reasoning (CSR) capabilities of computational models. The dataset contains schemas in German, French, and Russian, aligned with their English counterparts.

4 papers · 0 benchmarks

ScaLA

The ScaLA dataset is a linguistic acceptability dataset for the Scandinavian languages: Danish, Norwegian Bokmål, Norwegian Nynorsk, Swedish, Icelandic, and Faroese. Developed as part of the ScandEval benchmarking platform, it consists of sentences that are either grammatically correct or incorrect, and is designed to evaluate the ability of language models to distinguish between the two. It is one of the contributions of the ScandEval project, which aims to advance natural language processing for the Scandinavian languages.

4 papers · 0 benchmarks

arXiv

For nearly 30 years, arXiv has served the public and research communities by providing open access to scholarly articles, from the vast branches of physics to the many subdisciplines of computer science to everything in between, including math, statistics, electrical engineering, quantitative biology, and economics. This rich corpus of information offers significant, but sometimes overwhelming, depth.

4 papers · 3 benchmarks

8TAGS

The 8TAGS dataset is a corpus created for evaluating sentence representations in Polish. It consists of approximately 50,000 sentences annotated with eight topic labels: film, history, food, medicine, motorization, work, sport, and technology. The dataset was automatically generated by extracting sentences from headlines and short descriptions of articles posted on the Polish social networking site wykop.pl. The corpus contains cleaned, tokenized, unambiguous sentences, each longer than 30 characters and tagged with exactly one of the selected categories. Classification accuracy on this dataset is reported as part of the evaluation of Polish sentence representations.

4 papers · 0 benchmarks

KnowEdit

This is a dataset for knowledge editing. It contains six tasks: ZsRE, $Wiki_{recent}$, $Wiki_{counterfact}$, WikiBio, ConvSent, and Sanitation. This repository provides data for the first four tasks; data for ConvSent and Sanitation is available from their original papers.

4 papers · 0 benchmarks · Texts

Kitsune Network Attack Dataset

A collection of nine network attack datasets captured from either an IP-based commercial surveillance system or a network of IoT devices. Each dataset contains millions of network packets and a different cyber attack within it.

4 papers · 0 benchmarks · Images

Edge-IIoTset (A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications: Centralized and Federated Learning)

In this project, we propose a new comprehensive, realistic cyber security dataset of IoT and IIoT applications, called Edge-IIoTset, which can be used by machine-learning-based intrusion detection systems in two different modes: centralized and federated learning. Specifically, the proposed testbed is organized into seven layers: a Cloud Computing Layer, Network Functions Virtualization Layer, Blockchain Network Layer, Fog Computing Layer, Software-Defined Networking Layer, Edge Computing Layer, and IoT and IIoT Perception Layer. In each layer, we propose new emerging technologies that satisfy the key requirements of IoT and IIoT applications, such as the ThingsBoard IoT platform, the OPNFV platform, Hyperledger Sawtooth, digital twins, the ONOS SDN controller, Mosquitto MQTT brokers, Modbus TCP/IP, etc. The IoT data are generated from various IoT devices (more than 10 types), such as low-cost digital sensors for sensing temperature and humidity, ultrasonic sensors, Wate

4 papers · 0 benchmarks

SZ-Taxi (Shenzhen Taxi Speed)

Taxi speed data at 15-minute intervals from 156 sensors on major roads of Luohu District in Shenzhen, China, from Jan. 1 to Jan. 31, 2015.

4 papers · 4 benchmarks

Clothing Attributes Dataset

We introduce the Clothing Attribute Dataset for promoting research in learning visual attributes for objects. The dataset contains 1856 images, with 26 ground truth clothing attributes such as "long-sleeves", "has collar", and "striped pattern". The labels were collected using Amazon Mechanical Turk.

4 papers · 1 benchmark

Historical Color Image Dataset

The Historical Color Image dataset was collected for the task of automatically estimating the age of historical color photos. Each image is annotated with its associated decade; five decades, from the 1930s to the 1970s, are considered, with 265 images per category.

4 papers · 0 benchmarks

MetaHate

MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection. MetaHate is a meta-collection of 36 hate speech datasets drawn from social media comments.

4 papers · 0 benchmarks · Texts

CHOCOLATE (Captions Have Often ChOsen Lies About The Evidence)

CHOCOLATE is a benchmark for detecting and correcting factual inconsistency in generated chart captions. It consists of captions produced by six advanced models, categorized into three subsets.

4 papers · 1 benchmark · Images, Texts

CIDAR

CIDAR contains 10,000 instructions and their outputs. The dataset was created by selecting around 9,109 samples from the Alpagasus dataset and translating them to Arabic using ChatGPT, then appending around 891 Arabic grammar instructions from the website Ask the Teacher. All 10,000 samples were reviewed by around 12 reviewers.

4 papers · 0 benchmarks

P2S (Points2Surf)

We introduced this dataset in Points2Surf, a method that turns point clouds into meshes.

4 papers · 0 benchmarks · 3d meshes, Point cloud

3D MM-Vet

We established a 3D evaluation benchmark, 3D MM-Vet, to assess four levels of capability in embodied interaction scenarios, ranging from basic perception to control-statement generation.

4 papers · 1 benchmark · 3D, Point cloud

CAPTURE-24

A large-scale human activity recognition dataset collected in a free-living environment from 151 participants.

4 papers · 0 benchmarks

Multi-Label Classification Dataset Repository

For each dataset we provide a short description as well as some characterization metrics: the number of instances (m), number of attributes (d), number of labels (q), cardinality (Card), density (Dens), diversity (Div), average imbalance ratio per label (avgIR), ratio of unconditionally dependent label pairs by chi-square test (rDep), and complexity, defined as m × q × d as in [Read 2010]. Cardinality measures the average number of labels associated with each instance, and density is cardinality divided by the number of labels. Diversity is the percentage of labelsets present in the dataset divided by the number of possible labelsets. The avgIR measures the average degree of imbalance across all labels; the greater the avgIR, the greater the imbalance of the dataset. Finally, rDep measures the proportion of label pairs that are dependent at 99% confidence. A broader description of all the characterization metrics and the partition methods used is described in
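The cardinality, density, and diversity definitions above can be sketched in a few lines. This is a minimal illustration, not the repository's actual code; the function name, variable names, and toy data are all hypothetical, and it assumes labels are given as a binary matrix with m rows (instances) and q columns (labels).

```python
# Hedged sketch: three of the characterization metrics from a binary
# label matrix Y (m instances x q labels). Names are illustrative.
def label_metrics(Y):
    m = len(Y)       # number of instances
    q = len(Y[0])    # number of labels
    # Card: average number of labels per instance
    card = sum(sum(row) for row in Y) / m
    # Dens: cardinality divided by the number of labels
    dens = card / q
    # Div: distinct labelsets observed / number of possible labelsets (2^q)
    div = len({tuple(row) for row in Y}) / (2 ** q)
    return card, dens, div

# Toy example: 3 instances, 3 labels
Y = [[1, 0, 1],
     [0, 1, 0],
     [1, 1, 1]]
card, dens, div = label_metrics(Y)  # card = 2.0, dens = 2/3, div = 3/8
```

avgIR and rDep are omitted here because they depend on per-label imbalance ratios and chi-square tests, which the paragraph only summarizes.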

4 papers · 0 benchmarks · Audio, Biology, Images, Medical, Music, Texts, Videos

CivRealm


4 papers · 0 benchmarks · Environment
Page 252 of 1000