19,997 machine learning datasets
This is a large-scale dataset of tweets associated with thousands of news articles published on Italian disinformation websites in the context of the 2019 European elections.
The FieldSAFE dataset is a multi-modal dataset for obstacle detection in agriculture. It comprises 2 hours of raw sensor data from a tractor-mounted sensor system in a grass mowing scenario in Denmark, October 2016.
NELA-GT-2020 is an updated version of the NELA-GT-2019 dataset. NELA-GT-2020 contains nearly 1.8M news articles from 519 sources collected between January 1st, 2020 and December 31st, 2020. Just as with NELA-GT-2018 and NELA-GT-2019, these sources come from a wide range of mainstream news sources and alternative news sources. Included in the dataset are source-level ground truth labels from Media Bias/Fact Check (MBFC) covering multiple dimensions of veracity. Additionally, new in the 2020 dataset are the Tweets embedded in the collected news articles, adding an extra layer of information to the data.
The dataset contains 7000 videos: native, altered and exchanged through social platforms. The altered content includes manipulations with FFmpeg, Avidemux, Kdenlive and Adobe Premiere. The social platforms used to exchange the native and altered videos are Facebook, TikTok, YouTube and Weibo.
This is a 4D light-field dataset of materials. The dataset contains 12 material categories, each with 100 images taken with a Lytro Illum, from which we extract about 30,000 patches in total.
TAU Spatial Sound Events 2019 consists of 2 datasets: Ambisonic (FOA) and Microphone Array (MIC), of identical sound scenes with the only difference in the format of the audio. The FOA dataset provides four-channel First-Order Ambisonic recordings while the MIC dataset provides four-channel directional microphone recordings from a tetrahedral array configuration. Both formats are extracted from the same microphone array.
FINO-Net is a multimodal (RGB, depth and audio) dataset containing 229 real-world manipulation samples covering 5 different manipulation types, recorded with a Baxter robot.
A set of realistic odd-one-out stimuli gathered "in the wild". Each image in the Odd-One-Out (O3) dataset depicts a scene with multiple objects similar to each other in appearance (distractors) and a singleton (target) distinct in one or more feature dimensions (e.g. color, shape, size). All images are resized so that the larger dimension is 1024px. Targets represent approx. 400 common object types such as flowers, sweets, chicken eggs, leaves, tiles and birds. Pixelwise masks are provided for targets and distractors. Annotations are generated using CVAT.
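The resizing rule mentioned for O3 (larger dimension set to 1024px) can be sketched as follows. This is a minimal illustration, not the dataset authors' code: the function name `resize_to_max_side` is hypothetical, and preserving the aspect ratio with rounding to whole pixels is an assumption.

```python
def resize_to_max_side(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Return a new (width, height) whose larger dimension equals max_side.

    Assumption: the aspect ratio is preserved and the smaller side is
    rounded to the nearest whole pixel.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width >= height:
        # Landscape or square: pin the width, scale the height.
        return max_side, max(1, round(height * max_side / width))
    # Portrait: pin the height, scale the width.
    return max(1, round(width * max_side / height)), max_side


print(resize_to_max_side(2048, 1536))
print(resize_to_max_side(600, 800))
```

Pinning the larger side explicitly, rather than multiplying both sides by a floating-point scale factor, guarantees that it lands exactly on 1024.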
This is a set of small programs containing logic bombs. A logic bomb is triggered only when certain conditions are met. Dynamic testing tools (especially those based on symbolic execution) can use the dataset to benchmark their capabilities.
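To make the idea of a logic bomb concrete, here is a toy sketch in Python. It is not taken from the dataset: `guarded_program`, `find_trigger` and the arithmetic condition are all hypothetical, and a brute-force search stands in for a real dynamic testing tool.

```python
from typing import Callable, Iterable, Optional


def guarded_program(x: int) -> str:
    """Toy program with a logic bomb: one branch is reachable only for a specific input."""
    if x * 7 + 3 == 416:
        return "bomb triggered"
    return "normal exit"


def find_trigger(
    program: Callable[[int], str],
    candidates: Iterable[int],
    marker: str = "bomb triggered",
) -> Optional[int]:
    """Naive stand-in for a testing tool: try inputs until the guarded branch runs."""
    for value in candidates:
        if program(value) == marker:
            return value
    return None


print(find_trigger(guarded_program, range(1000)))
```

A symbolic execution tool would instead solve the branch condition for `x` directly. A benchmark of this kind measures how often a tool manages to reach the guarded branch.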
Data set of 360-degree equirectangular videos, gaze recordings, eye movement (EM) ground-truth and an automatic EM classification algorithm.
The Message Queuing Telemetry Transport (MQTT) protocol is one of the most widely used standards in Internet of Things (IoT) machine-to-machine communication. The growing number of available IoT devices and protocols reinforces the need for new and robust Intrusion Detection Systems (IDS). However, building an IoT IDS requires datasets with which to process, train and evaluate these models.
The ukiyo-e faces dataset comprises 5209 images of faces from ukiyo-e prints. The images are 1024x1024 pixels in JPEG format and have been aligned using the procedure used for the FFHQ dataset.
DiagSet is a histopathological dataset for prostate cancer detection. The proposed dataset consists of over 2.6 million tissue patches extracted from 430 fully annotated scans, 4675 scans with assigned binary diagnosis, and 46 scans with diagnosis given independently by a group of histopathologists.
The ChineseLP dataset contains 411 vehicle images (mostly of passenger cars) with Chinese license plates (LPs). It consists of 252 images captured by the authors and 159 images downloaded from the internet. The images present great variations in resolution (from 143 × 107 to 2048 × 1536 pixels), illumination and background.
WHU-RS19 is a set of satellite images exported from Google Earth, which provides high-resolution satellite images up to 0.5 m. Some samples of the database are displayed in the following picture. It contains 19 classes of meaningful scenes in high-resolution satellite imagery, including airport, beach, bridge, commercial, desert, farmland, footballfield, forest, industrial, meadow, mountain, park, parking, pond, port, railwaystation, residential, river, and viaduct. For each class, there are about 50 samples. Note that the image samples of the same class are collected from different regions in satellite images of different resolutions, and may therefore have different scales, orientations and illuminations.
SciDuet is a dataset for training and benchmarking models for automating document-to-slides generation. It consists of pairs of papers and their corresponding slide decks from recent years' NLP and ML conferences (e.g., ACL). This dataset contains 1,088 papers and 10,034 slides.
Fruits-360 dataset: A dataset of images containing fruits, vegetables, nuts and seeds. Version: 2025.03.24.0. Content: The following fruits, vegetables and nuts are included: Apples (different varieties: Crimson Snow, Golden, Golden-Red, Granny Smith, Pink Lady, Red, Red Delicious), Apricot, Avocado, Avocado ripe, Banana (Yellow, Red, Lady Finger), Beans, Beetroot Red, Blackberry, Blueberry, Cabbage, Caju seed, Cactus fruit, Cantaloupe (2 varieties), Carambula, Carrot, Cauliflower, Cherimoya, Cherry (different varieties, Rainier), Cherry Wax (Yellow, Red, Black), Chestnut, Clementine, Cocos, Corn (with husk), Cucumber (ripened, regular), Dates, Eggplant, Fig, Ginger Root, Goosberry, Granadilla, Grape (Blue, Pink, White (different varieties)), Grapefruit (Pink, White), Guava, Hazelnut, Huckleberry, Kiwi, Kaki, Kohlrabi, Kumsquats, Lemon (normal, Meyer), Lime, Lychee, Mandarine, Mango (Green, Red), Mangostan, Maracuja, Melon Piel de Sapo, Mulberry, Nectarine (Regular, Flat), Nut (Fores
The dataset consists of sets of movie titles, with each set annotated with a single English soft attribute (subjective descriptive property, such as 'confusing' or 'romantic') and a reference movie. For each set, a crowd worker has placed the movies into three sets: more, equally, and less than the reference movie. There are 5,991 such sets, from which one can infer approximately 250,000 pairwise preferences over movies for the 60 distinct soft attributes studied.
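One way to derive pairwise preferences from a single annotated set is sketched below. This is an assumption about how such pairs could be inferred, not the dataset authors' documented procedure. The function name `pairwise_preferences`, the bucket arguments and the movie titles are all hypothetical.

```python
from itertools import product


def pairwise_preferences(less, equal, more, reference):
    """Return a set of (a, b) pairs meaning 'a has more of the attribute than b'.

    Assumption: the buckets are ordered more > equal > less, and the
    reference movie sits at the same level as the 'equal' bucket.
    """
    prefs = set()
    # Movies rated "more" outrank the reference; the reference outranks "less".
    prefs.update((m, reference) for m in more)
    prefs.update((reference, l) for l in less)
    # Cross-bucket pairs follow from the bucket ordering.
    prefs.update(product(more, equal))
    prefs.update(product(equal, less))
    prefs.update(product(more, less))
    return prefs


prefs = pairwise_preferences(
    less=["Movie C", "Movie D"],
    equal=["Movie B"],
    more=["Movie A"],
    reference="Movie R",
)
print(len(prefs))
```

Movies within the same bucket yield no strict preference here, which is why the number of inferred pairs depends on how the movies are distributed across the three buckets.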
Deep neural networks for video-based eye tracking have demonstrated resilience to noisy environments, stray reflections and low resolution. However, to train these networks, a large number of manually annotated images are required. To alleviate the cumbersome process of manual labeling, computer graphics rendering is employed to automatically generate a large corpus of annotated eye images under various conditions. In this work, we introduce RIT-Eyes, a novel synthetic eye image generation platform which improves upon previous work by adding features such as retinal retro-reflection, realistic blinks, an active deformable iris and an aspherical cornea. We add various external influences which potentially degrade eye tracking, such as corrective eyewear with varying refractive indices. To demonstrate the utility of RIT-Eyes, we generate and publicly share a large dataset of images with a variety of eye poses and viewing conditions.
Primary biliary cholangitis is an autoimmune disease leading to destruction of the small bile ducts in the liver. Progression is slow but inexorable, eventually leading to cirrhosis and liver decompensation. The condition has been recognised since at least 1851 and was named "primary biliary cirrhosis" in 1949. Because cirrhosis is a feature only of advanced disease, a change of its name to "primary biliary cholangitis" was proposed by patient advocacy groups in 2014.