19,997 machine learning datasets
Extended Agriculture-Vision dataset comprises two parts:
We built a large lung CT scan dataset for COVID-19 by curating data from 7 public datasets listed in the acknowledgements. These datasets have been publicly used in COVID-19 diagnosis literature and proven their efficiency in deep learning applications. Therefore, the merged dataset is expected to improve the generalization ability of deep learning methods by learning from all these resources together.
The PKU dataset has almost 4,000 images categorized into five groups (G1-G5) that show different situations. For example, G1 has images of highways during the day with only one car in them. On the other hand, G5 has images of crosswalks during the day or at night with multiple cars and license plates (LPs).
RADIOML 2018.01A is a validated dataset that includes synthetic simulated channel effects for 24 digital and analog modulation types.
SuHiFiMask (Surveillance High-Fidelity Mask) extends FAS to real surveillance scenes rather than mimicking low-resolution images and surveillance environments. It contains 10,195 videos from 101 subjects of different age groups, which are collected by 7 mainstream surveillance cameras.
Contains Wikipedia pages about popular mathematics topics; edges describe the links from one page to another. Features describe the number of daily visits between 2019 and March 2021.
It contains a dataset of class comments extracted from various projects in three programming languages: Java, Pharo, and Python.
This dataset is recreated using offline augmentation from the original dataset. The original dataset can be found on this GitHub repo. This dataset consists of about 87K RGB images of healthy and diseased crop leaves, categorized into 38 different classes. The full dataset is split 80/20 into training and validation sets, preserving the directory structure. A new directory containing 33 test images was created later for prediction purposes.
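An 80/20 split that preserves the per-class directory structure could look roughly like the sketch below. This is not the dataset authors' script: the function name `split_preserving_dirs`, the `train/` and `valid/` prefixes, and the per-class shuffling with a fixed seed are all assumptions made for illustration.

```python
import random
from pathlib import PurePosixPath


def split_preserving_dirs(paths, train_fraction=0.8, seed=0):
    """Split relative paths such as 'class_name/img.jpg' class by class.

    Hypothetical helper: each class directory is shuffled and cut
    separately so that both subsets keep the same class layout.
    """
    by_class = {}
    for p in paths:
        class_dir = PurePosixPath(p).parent.as_posix()
        by_class.setdefault(class_dir, []).append(p)

    rng = random.Random(seed)
    train, valid = [], []
    for class_dir in sorted(by_class):
        files = sorted(by_class[class_dir])
        rng.shuffle(files)
        cut = round(len(files) * train_fraction)
        # Prefix with the subset name; the class directory is kept as-is.
        train += [f"train/{f}" for f in files[:cut]]
        valid += [f"valid/{f}" for f in files[cut:]]
    return train, valid
```

Splitting inside each class, instead of over the pooled file list, keeps the class proportions close to 80/20 in every directory.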
A total of 18 sequences of various lengths were collected. Since the Velodyne LiDAR, RealSense camera, and Vicon motion tracker system run at different frequencies, we synchronized these systems so that the image and LiDAR scan at each timestamp have the same 6-DoF pose. For the static scenario, there are no moving objects in the scene. For the other scenarios, there are people randomly walking in the scene. Sequences 01-10 come from the static environment, sequences 11-15 are the one-person moving scenario, and sequences 16-18 are the two-person moving scenario.
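The description does not say how the three sensor streams were synchronized. One common approach, shown here only as a sketch under that assumption, is nearest-timestamp matching: for every camera frame, pick the closest LiDAR scan and the closest pose sample, and drop the frame if either is too far away. The names `nearest_index`, `synchronize`, and the `max_gap` threshold are hypothetical.

```python
from bisect import bisect_left


def nearest_index(sorted_times, t):
    """Return the index of the timestamp in sorted_times closest to t."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    before, after = sorted_times[i - 1], sorted_times[i]
    return i if (after - t) < (t - before) else i - 1


def synchronize(image_times, lidar_times, pose_times, max_gap=0.05):
    """Match each image to its nearest LiDAR scan and pose sample.

    All three lists must be sorted and share one clock (seconds).
    Returns (image_idx, lidar_idx, pose_idx) triples; images without
    a match within max_gap seconds in both streams are skipped.
    """
    triples = []
    for i, t in enumerate(image_times):
        j = nearest_index(lidar_times, t)
        k = nearest_index(pose_times, t)
        if abs(lidar_times[j] - t) <= max_gap and abs(pose_times[k] - t) <= max_gap:
            triples.append((i, j, k))
    return triples
```

Interpolating the pose between the two neighboring tracker samples would be more precise than taking the nearest one, at the cost of extra code.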
We propose a new database for information extraction from historical handwritten documents. The corpus includes 5,393 finding aids from six different series, dating from the 18th-20th centuries. Finding aids are handwritten documents that contain metadata describing older archives. They are stored in the National Archives of France and are used by archivists to identify and find archival documents.
The Industrial Biscuits (Cookie) dataset is our internal dataset designed for the anomaly detection task, which captures Tarallini biscuits. It contains 1225 samples in four classes with the following structure:
Noiseless reverberant dataset built from the public WSJ0 corpus and room impulse responses simulated with the PyRoomAcoustics library. Used in:
- Speech Enhancement and Dereverberation with Diffusion-based Generative Models, Richter et al., arXiv 2022
- StoRM: A Stochastic Regeneration Model for Speech Enhancement and Dereverberation, Lemercier et al., arXiv 2022
- Analysing Discriminative versus Diffusion-based Generative Models for Speech Restoration, Lemercier et al., ICASSP 2023
ContactArt is a dataset for learning hand-object interaction priors for hand and articulated object pose estimation. The dataset is created using visual teleoperation, where the human operator can directly play within a physical simulator to manipulate the articulated objects. All the object models are from the PartNet dataset for the convenience of scaling up. ContactArt provides accurate annotation, rich hand-object interaction, and contact information.
Indigo Mobile is a public dataset of copy detection patterns (CDP) based on DataMatrix modulation.
The QDAT dataset contains 1,500 WAV files along with a sound file stored in CSV format. The sound file contains links to the WAV files together with other features: age, gender, and the correctness of each of the three recitation rules. The final target indicates the correctness of the whole reading.
MVSep is a synthetic dataset for the vocal separation task created by combining random vocal and instrumental samples, publicly available on the internet. The sourced samples were separated into two sets (vocal-only and instrumental-only) and then randomly mixed together. The mixtures may not always sound like a real melody, but they allow for testing audio separation methods. Synth MVSep dataset consists of 100 tracks, each with a duration of exactly one minute and a sample rate of 44.1 kHz.
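The random mixing step can be sketched as follows. The description gives only the outline (pick a vocal and an instrumental sample at random, mix them, one-minute tracks at 44.1 kHz), so everything else here is an assumption: the helper names `fit_length` and `make_mixtures`, looping or truncating stems to the target length, plain summation, and peak normalization when the sum would clip.

```python
import random


def fit_length(samples, n):
    """Loop or truncate a mono signal (list of floats) to exactly n samples."""
    if not samples:
        raise ValueError("empty signal")
    reps = -(-n // len(samples))  # ceiling division
    return (samples * reps)[:n]


def make_mixtures(vocals, instrumentals, n_tracks,
                  sample_rate=44100, duration_s=60, seed=0):
    """Build synthetic mixtures from random vocal/instrumental pairs.

    Returns a list of dicts holding the mixture and its two stems,
    so that mixture[i] == vocals[i] + instrumental[i] up to rounding.
    """
    rng = random.Random(seed)
    n = sample_rate * duration_s
    tracks = []
    for _ in range(n_tracks):
        v = fit_length(rng.choice(vocals), n)
        m = fit_length(rng.choice(instrumentals), n)
        mix = [a + b for a, b in zip(v, m)]
        peak = max(abs(x) for x in mix)
        if peak > 1.0:
            # Scale the stems and the mixture together to avoid clipping.
            scale = 1.0 / peak
            v = [x * scale for x in v]
            m = [x * scale for x in m]
            mix = [x * scale for x in mix]
        tracks.append({"mixture": mix, "vocals": v, "instrumental": m})
    return tracks
```

Keeping the scaled stems next to each mixture gives a separation method ground-truth targets to be scored against.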
This dataset for the semantic segmentation of potholes and cracks on the road surface was assembled from 5 other datasets already publicly available, plus a very small addition of segmented images on our part. To speed up the labeling operations, we started working with depth cameras to try to automate, to some extent, this extremely time-consuming phase.
We introduce a novel dataset consisting of images depicting pink eggs that have been identified as Pomacea canaliculata eggs, accompanied by corresponding bounding box annotations. The purpose of this dataset is to aid researchers in the analysis of the spread of Pomacea canaliculata species by utilizing deep learning techniques, as well as supporting other investigative pursuits that require visual data pertaining to the eggs of Pomacea canaliculata. It is worth noting, however, that the identity of the eggs in question is not definitively established, as other species within the same taxonomic family have been observed to lay similar-looking eggs in regions of the Americas. Therefore, a crucial prerequisite to any decision regarding the elimination of these eggs would be to establish with certainty whether they are exclusively attributable to invasive Pomacea canaliculata or if other species are also involved. The dataset is available at https://www.kaggle.com/datasets/deeshenzhen/pi
CWD30 comprises over 219,770 high-resolution images of 20 weed species and 10 crop species, encompassing various growth stages, multiple viewing angles, and environmental conditions. The images were collected from diverse agricultural fields across different geographic locations and seasons, ensuring a representative dataset.
The collected dataset consists of multivariate time series (MTS) data from several bank ATMs, along with the annotations that operators made when they performed a maintenance task on any of the machines.