TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images3,275
  • Texts3,148
  • Videos1,019
  • Audio486
  • Medical395
  • 3D383
  • Time series298
  • Graphs285
  • Tabular271
  • Speech199
  • RGB-D192
  • Environment148
  • Point cloud135
  • Biomedical123
  • LiDAR95
  • RGB Video87
  • Tracking78
  • Biology71
  • Actions68
  • 3d meshes65
  • Tables52
  • Music48
  • EEG45
  • Hyperspectral images45
  • Stereo44
  • MRI39
  • Physics32
  • Interactive29
  • Dialog25
  • Midi22
  • 6D17
  • Replay data11
  • Financial10
  • Ranking10
  • Cad9
  • fMRI7
  • Parallel6
  • Lyrics2
  • PSG2

19,997 dataset results

Italian disinformation

This is a large-scale dataset of tweets associated to thousands of news articles published on Italian disinformation websites in the context of 2019 European elections.

4 papers0 benchmarksTexts

FieldSAFE

The FieldSAFE dataset is a multi-modal dataset for obstacle detection in agriculture. It comprises 2 hours of raw sensor data from a tractor-mounted sensor system in a grass mowing scenario in Denmark, October 2016.

4 papers0 benchmarksImages, LiDAR

NELA-GT-2020

NELA-GT-2020 is an updated version of the NELA-GT-2019 dataset. NELA-GT-2020 contains nearly 1.8M news articles from 519 sources collected between January 1st, 2020 and December 31st, 2020. Just as with NELA-GT-2018 and NELA-GT-2019, these sources come from a wide range of mainstream news sources and alternative news sources. Included in the dataset are source-level ground truth labels from Media Bias/Fact Check (MBFC) covering multiple dimensions of veracity. Additionally, new in the 2020 dataset are the Tweets embedded in the collected news articles, adding an extra layer of information to the data.

4 papers0 benchmarksTexts

EVA (EVA-7K dataset)

The dataset contains 7000 videos: native, altered and exchanged through social platforms. The altered contents include manipulations with FFmpeg, AVIdemux, Kdenlive and Adobe Premiere. The social platforms used to exchange the native and altered videos are Facebook, Tiktok, Youtube and Weibo.

4 papers0 benchmarksVideos

Light-Field Material

This is a 4D light-field dataset of materials. The dataset contains 12 material categories, each with 100 images taken with a Lytro Illum, from which we extract about 30,000 patches in total.

4 papers0 benchmarksImages

TAU Spatial Sound Events 2019

TAU Spatial Sound Events 2019 consists of 2 datasets: Ambisonic (FOA) and Microphone Array (MIC), of identical sound scenes with the only difference in the format of the audio. The FOA dataset provides four-channel First-Order Ambisonic recordings while the MIC dataset provides four-channel directional microphone recordings from a tetrahedral array configuration. Both formats are extracted from the same microphone array.

4 papers0 benchmarksAudio

FINO-Net

FINO-Net is a multimodal (RGB, depth and audio) dataset, containing 229 real-world manipulation data of 5 different manipulation types recorded with a Baxter robot.

4 papers0 benchmarksAudio, RGB-D

O3 (Odd-One-Out Dataset)

A set of realistic odd-one-out stimuli gathered "in the wild". Each image in the Odd-One-Out (O3) dataset depicts a scene with multiple objects similar to each other in appearance (distractors) and a singleton (target) distinct in one or more feature dimensions (e.g. color, shape, size). All images are resized so that the larger dimension is 1024px. Targets represent approx. 400 common object types such as flowers, sweets, chicken eggs, leaves, tiles and birds. Pixelwise masks are provided for targets and distractors. Annotations are generated using CVAT.

4 papers0 benchmarksImages

Logic Bombs

This is a set of small programs with logic bombs. The logic bomb can be triggered when certain conditions are met. Any dynamic testing tools (especially symbolic execution) can employ the dataset to benchmark their capabilities.

4 papers0 benchmarks

360 EM

Data set of 360-degree equirectangular videos, gaze recordings, eye movement (EM) ground-truth and an automatic EM classification algorithm.

4 papers0 benchmarks

MQTT-IoT-IDS2020

Message Queuing Telemetry Transport (MQTT) protocol is one of the most used standards used in Internet of Things (IoT) machine to machine communication. The increase in the number of available IoT devices and used protocols reinforce the need for new and robust Intrusion Detection Systems (IDS). However, building IoT IDS requires the availability of datasets to process, train and evaluate these models.

4 papers0 benchmarks

Ukiyo-e Faces

The ukiyo-e faces dataset comprises of 5209 images of faces from ukiyo-e prints. The images are 1024x1024 pixels in jpeg format and have been aligned using the procedure used for the FFHQ dataset

4 papers0 benchmarksImages

DiagSet

DiagSet is a histopathological dataset for prostate cancer detection. The proposed dataset consists of over 2.6 million tissue patches extracted from 430 fully annotated scans, 4675 scans with assigned binary diagnosis, and 46 scans with diagnosis given independently by a group of histopathologists.

4 papers0 benchmarksImages, Medical

ChineseLP

The ChineseLP dataset contains 411 vehicle images (mostly of passenger cars) with Chinese license plates (LPs). It consists of 252 images captured by the authors and 159 images downloaded from the internet. The images present great variations in resolution (from 143 × 107 to 2048 × 1536 pixels), illumination and background.

4 papers1 benchmarksImages

WHU-RS19

WHU-RS19 is a set of satellite images exported from Google Earth, which provides high-resolution satellite images up to 0.5 m. Some samples of the database are displayed in the following picture. It contains 19 classes of meaningful scenes in high-resolution satellite imagery, including airport, beach, bridge, commercial, desert, farmland, footballfield, forest, industrial, meadow, mountain, park, parking, pond, port, railwaystation, residential, river, and viaduct. For each class, there are about 50 samples. It’s worth noticing that the image samples of the same class are collected from different regions in satellite images of different resolutions and then might have different scales, orientations and illuminations.

4 papers0 benchmarksImages

SciDuet

SciDuet is a dataset for training and benchmarking models for automating document-to-slides generation. It consists of pairs of papers and their corresponding slides decks from recent years' NLP and ML conferences (e.g., ACL). This dataset contains 1,088 papers and 10,034 slides.

4 papers0 benchmarksTexts

Fruits 360 (A dataset of images containing fruits, vegetables, nuts and seeds)

Fruits-360 dataset: A dataset of images containing fruits, vegetables, nuts and seeds Version: 2025.03.24.0 Content The following fruits, vegetables and nuts and are included: Apples (different varieties: Crimson Snow, Golden, Golden-Red, Granny Smith, Pink Lady, Red, Red Delicious), Apricot, Avocado, Avocado ripe, Banana (Yellow, Red, Lady Finger), Beans, Beetroot Red, Blackberry, Blueberry, Cabbage, Caju seed, Cactus fruit, Cantaloupe (2 varieties), Carambula, Carrot, Cauliflower, Cherimoya, Cherry (different varieties, Rainier), Cherry Wax (Yellow, Red, Black), Chestnut, Clementine, Cocos, Corn (with husk), Cucumber (ripened, regular), Dates, Eggplant, Fig, Ginger Root, Goosberry, Granadilla, Grape (Blue, Pink, White (different varieties)), Grapefruit (Pink, White), Guava, Hazelnut, Huckleberry, Kiwi, Kaki, Kohlrabi, Kumsquats, Lemon (normal, Meyer), Lime, Lychee, Mandarine, Mango (Green, Red), Mangostan, Maracuja, Melon Piel de Sapo, Mulberry, Nectarine (Regular, Flat), Nut (Fores

4 papers0 benchmarksImages

SoftAttributes (SoftAttributes: Relative movie attribute dataset for soft attributes)

The dataset consists of sets of movie titles, with each set annotated with a single English soft attribute (subjective descriptive property, such as 'confusing' or 'romantic') and a reference movie. For each set, a crowd worker has placed the movies into three sets: more, equally, and less than the reference movie. There are 5,991 such sets, from which one can infer approximately 250,000 pairwise preferences over movies for the 60 distinct soft attributes studied.

4 papers0 benchmarksTexts

RITEyes

Deep neural networks for video based eye tracking have demonstrated resilience to noisy environments, stray reflections and low resolution. However, to train these networks, a large number of manually annotated images are required. To alleviate the cumbersome process of manual labeling, computer graphics rendering is employed to automatically generate a large corpus of annotated eye images under various conditions. In this work, we introduce RIT-Eyes, a novel synthetic eye image generation platform which improves upon previous work by adding features such as retinal retro-reflection, realistic blinks, an active deformable iris and an aspherical cornea. We add various external influences which potentially degrade eye tracking such as corrective eye-wear with varying refractive indices. To demonstrate the utility of RIT-Eyes, we generate and publicly share a large dataset of images with a variety of eye poses and viewing conditions.

4 papers0 benchmarks

PBC (Mayo Clinic Primary Biliary Cholangitis data)

Primary sclerosing cholangitis is an autoimmune disease leading to destruction of the small bile ducts in the liver. Progression is slow but inexhortable, eventually leading to cirrhosis and liver decompensation. The condition has been recognised since at least 1851 and was named "primary biliary cirrhosis" in 1949. Because cirrhosis is a feature only of advanced disease, a change of its name to "primary biliary cholangitis" was proposed by patient advocacy groups in 2014.

4 papers0 benchmarks
PreviousPage 238 of 1000Next