19,997 machine learning datasets
This dataset includes hourly air pollutant data from 12 nationally controlled air-quality monitoring sites. The air-quality data are from the Beijing Municipal Environmental Monitoring Center. The meteorological data at each air-quality site are matched with the nearest weather station from the China Meteorological Administration. The time period is from March 1st, 2013 to February 28th, 2017. Missing data are denoted as NA.
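Because missing values are written as the literal string NA, a loader has to map that token to a null value before doing any arithmetic. The following is a minimal sketch using only the Python standard library; the column names and sample rows are hypothetical and are not taken from the dataset files.

```python
import csv
import io

# Hypothetical sample rows; the column names are assumptions for illustration.
SAMPLE = """year,month,day,hour,pollutant,station
2013,3,1,0,4.0,SiteA
2013,3,1,1,NA,SiteA
2013,3,1,2,7.0,SiteA
"""


def parse_value(text):
    """Return None for the NA marker, otherwise the value as a float."""
    return None if text == "NA" else float(text)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
readings = [parse_value(row["pollutant"]) for row in rows]

# Average only over the hours that actually have a measurement.
observed = [r for r in readings if r is not None]
mean_reading = sum(observed) / len(observed)
missing_hours = len(readings) - len(observed)
```

Keeping missing hours as `None` rather than zero avoids biasing averages downward; whether to drop or impute those hours depends on the downstream task.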
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.
WiderPerson contains a total of 13,382 images with 399,786 annotations, i.e., 29.87 annotations per image, which means this dataset contains dense pedestrians with various kinds of occlusions. Large variations in scene and occlusion make the pedestrians in this dataset extremely challenging to detect, which makes it suitable for evaluating pedestrian detectors in the wild.
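The density figure follows directly from the two totals quoted above. A short check, using only those numbers:

```python
# Totals as stated in the dataset description.
images = 13_382
annotations = 399_786

# Mean number of annotations per image.
per_image = annotations / images
print(f"{per_image:.2f}")  # prints 29.87
```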
Amazon Photo
VoxForge is an open speech dataset that was set up to collect transcribed speech for use with free and open-source speech recognition engines (on Linux, Windows, and Mac).
Mario AI was a benchmark environment for reinforcement learning. The gameplay in Mario AI, as in the original Nintendo version, consists of moving the controlled character, Mario, through two-dimensional levels viewed from the side. Mario can walk and run to the right and left, jump, and (depending on which state he is in) shoot fireballs. Gravity acts on Mario, making it necessary to jump over cliffs to get past them. Mario can be in one of three states: Small, Big (can kill enemies by jumping onto them), and Fire (can shoot fireballs).
Darpa is a dataset consisting of communications between source IPs and destination IPs. This dataset contains different attacks between IPs.
TUM monoVO is a dataset for evaluating the tracking accuracy of monocular Visual Odometry (VO) and SLAM methods. It contains 50 real-world sequences comprising over 100 minutes of video, recorded across different environments, ranging from narrow indoor corridors to wide outdoor scenes. All sequences contain mostly exploring camera motion, starting and ending at the same position: this allows tracking accuracy to be evaluated via the accumulated drift from start to end, without requiring ground truth for the full sequence. In contrast to existing datasets, all sequences are photometrically calibrated: the dataset creators provide the exposure times for each frame as reported by the sensor, the camera response function, and the lens attenuation factors (vignetting).
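The idea of measuring accumulated drift on a loop that starts and ends at the same place can be sketched as follows. This is a simplified illustration, not the benchmark's own evaluation procedure, which may align trajectories and account for scale and rotation in ways not described here. The function names and the sample trajectory are hypothetical.

```python
import math


def path_length(positions):
    """Total length of the estimated trajectory (sum of segment lengths)."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def loop_closure_drift(positions):
    """Distance between the first and last estimated positions.

    If the camera truly returned to its starting point, any gap here
    is error accumulated by the tracker.
    """
    return math.dist(positions[0], positions[-1])


def relative_drift(positions):
    """Drift as a fraction of the distance travelled."""
    length = path_length(positions)
    if length == 0:
        raise ValueError("trajectory has zero length")
    return loop_closure_drift(positions) / length


# Hypothetical estimated trajectory: a unit square that fails to close by 0.1.
estimated = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0.1, 0)]
drift = loop_closure_drift(estimated)
ratio = relative_drift(estimated)
```

Dividing by the path length makes the number comparable across sequences of different lengths, although a monocular estimate is only defined up to scale, so a real evaluation needs to handle scale explicitly.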
DCASE 2013 is a dataset for sound event detection. It consists of audio-only recordings where individual sound events are prominent in an acoustic scene.
The ISIC 2018 dataset was published by the International Skin Imaging Collaboration (ISIC) as a large-scale dataset of dermoscopy images. Task 3 of the challenge addresses lesion classification. It includes 2594 images. The task is to classify the dermoscopic images into one of the following categories: melanoma, melanocytic nevus, basal cell carcinoma, actinic keratosis / Bowen’s disease, benign keratosis, dermatofibroma, and vascular lesion.
BCN_20000 is a dataset composed of 19,424 dermoscopic images of skin lesions captured from 2010 to 2016 in the facilities of the Hospital Clínic in Barcelona. The dataset can be used for lesion recognition tasks such as lesion segmentation, lesion detection and lesion classification.
Open PI is the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. The dataset comprises 29,928 state changes over 4,050 sentences from 810 procedural real-world paragraphs from WikiHow.com. The state tracking task assumes a new formulation in which only the text is provided, and a set of state changes (entity, attribute, before, after) is generated for each step, where the entity, attribute, and values must all be predicted from an open vocabulary.
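The (entity, attribute, before, after) tuple can be pictured as a small record type. The sketch below is illustrative only: the class name, the example step, and the predicted changes are hypothetical and do not reflect the dataset's actual file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateChange:
    """One open-vocabulary state change produced for a procedural step."""
    entity: str
    attribute: str
    before: str
    after: str


def render(change):
    """Turn a state change into a readable sentence."""
    return (f"{change.attribute} of {change.entity} was {change.before} "
            f"before and {change.after} afterwards")


# Hypothetical step and hypothetical model output for it.
step = "Pour the batter into the pan."
predicted = [
    StateChange("batter", "location", "in bowl", "in pan"),
    StateChange("pan", "contents", "empty", "full of batter"),
]

# Averages implied by the totals quoted in the description.
sentences_per_paragraph = 4_050 / 810
changes_per_sentence = 29_928 / 4_050
```

Because every field is free text rather than a label from a fixed set, comparing predictions with references generally needs some form of soft text matching rather than exact equality.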
VLEP contains 28,726 future event prediction examples (along with their rationales) from 10,234 diverse TV Show and YouTube Lifestyle Vlog video clips. Each example (see Figure 1) consists of a Premise Event (a short video clip with dialogue), a Premise Summary (a text summary of the premise event), and two potential natural language Future Events (along with Rationales) written by people. These clips are on average 6.1 seconds long and are harvested from diverse event-rich sources, i.e., TV show and YouTube Lifestyle Vlog videos.
The Video2GIF dataset contains over 100,000 pairs of GIFs and their source videos. The GIFs were collected from two popular GIF websites (makeagif.com, gifsoup.com) and the corresponding source videos were collected from YouTube in summer 2015. IDs and URLs of the GIFs and the videos are provided, along with temporal alignment of GIF segments to their source videos. The dataset is intended for evaluating GIF creation and video highlight techniques.
TaxiNLI is a dataset collected based on the principles and categorizations of a taxonomy of the reasoning types involved in natural language inference (NLI). A subset of examples is curated from MultiNLI (Williams et al., 2018) by sampling uniformly based on the entailment label and the domain. The dataset is annotated with fine-grained category labels.
The Greek Sign Language (GSL) dataset is a large-scale RGB+D dataset, suitable for Sign Language Recognition (SLR) and Sign Language Translation (SLT). The videos are captured using an Intel RealSense D435 RGB+D camera at a rate of 30 fps. Both the RGB and the depth streams are acquired at the same spatial resolution of 848×480 pixels. To increase variability in the videos, the camera position and orientation are slightly altered between subsequent recordings. Seven different signers perform 5 individual, commonly encountered scenarios in different public services. The average length of each scenario is twenty sentences.
The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks.
5,519 query-based summaries, each associated with an average of 6 input documents selected from an index of 355M documents from Common Crawl.
CATS is a dataset consisting of stereo thermal, stereo color, and cross-modality image pairs with high-accuracy ground truth (< 2 mm) generated from a LiDAR. The authors scanned 100 cluttered indoor and 80 outdoor scenes featuring challenging environments and conditions. CATS contains approximately 1,400 images of pedestrians, vehicles, electronics, and other thermally interesting objects in different environmental conditions, including nighttime, daytime, and foggy scenes.
ClariQ is an extension of the Qulac dataset with additional new topics, questions, and answers in the training set. The test set is completely unseen and newly collected. Like Qulac, ClariQ consists of single-turn conversations (initial_request, followed by clarifying question and answer). In addition, it comes with synthetic multi-turn conversations (up to three turns). ClariQ features approximately 18K single-turn conversations, as well as 1.8 million multi-turn conversations.