19,997 machine learning datasets
The Ethics (per ethics) dataset was created to test knowledge of the basic concepts of morality. The task is to predict human ethical judgments about diverse text situations in a multi-label classification setting. The main objective is to evaluate the positive or negative implementation of five normative concepts with ‘yes’ and ‘no’ ratings. The included concepts are as follows: virtue, law, moral, justice, and utilitarianism.
AdvNet is a dataset of traffic sign images. Specifically, it includes adversarial traffic sign images (i.e., pictures of traffic signs with stickers on their surface) that can fool state-of-the-art neural network-based perception systems, as well as clean traffic sign images without any stickers on them.
This work presents CLOTH3D, the first large-scale synthetic dataset of 3D clothed human sequences. CLOTH3D contains a large variability in garment type, topology, shape, size, tightness and fabric. Clothes are simulated on top of thousands of different pose sequences and body shapes, generating realistic cloth dynamics. We provide the dataset with a generative model for cloth generation. We propose a Conditional Variational Auto-Encoder (CVAE) based on graph convolutions (GCVAE) to learn garment latent spaces. This allows for realistic generation of 3D garments on top of the SMPL model for any pose and shape.
We provide multiple human annotations for each test image in Fashion-MNIST. This can be used as soft labels or probabilistic labels instead of the usual hard (single) labels.
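As a rough illustration of the soft-label idea described above, the sketch below turns several annotator votes for one test image into a probability distribution over the 10 Fashion-MNIST classes. The function names and the plain list-of-integers input are assumptions made for this example; the dataset's actual file layout is not described here.

```python
from collections import Counter

NUM_CLASSES = 10  # Fashion-MNIST has 10 clothing classes (labels 0-9)


def soft_label(annotations, num_classes=NUM_CLASSES):
    """Convert a list of annotator votes (class ids) into a probability vector.

    Hypothetical helper: assumes each vote is an integer in [0, num_classes).
    """
    if not annotations:
        raise ValueError("need at least one annotation")
    counts = Counter(annotations)
    total = len(annotations)
    # Fraction of annotators who chose each class
    return [counts[c] / total for c in range(num_classes)]


def hard_label(annotations):
    """Majority-vote label, i.e. the usual single hard label."""
    return Counter(annotations).most_common(1)[0][0]


# Five hypothetical annotators disagree between class 0 and class 6
votes = [0, 6, 6, 0, 6]
print(soft_label(votes))
print(hard_label(votes))
```

The soft vector keeps the disagreement between annotators, whereas the majority vote discards it; which one to train or evaluate against depends on the experiment.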
GIRT-Data is the first and largest dataset of issue report templates (IRTs) in both YAML and Markdown formats. This dataset and its corresponding open-source crawler tool are intended to support research in this area and to encourage more developers to use IRTs in their repositories. The stable version of the dataset contains 1,084,300 repositories, and 50,032 of them support IRTs.
"CIMAT-Cyclist" is a benchmark for cyclist orientation detection, with bounding-box labels assigned to eight classes according to orientation. It contains 11,103 images, of which 6,605 were collected from approximately 450 videos and from images taken at sports events and on the streets of the state of Zacatecas, Mexico, while 4,498 additional images were obtained from web pages such as pixabay, pexels, and freephotos, among others. "CIMAT-Cyclist" provides 20,229 instances over the 11,103 cyclist images, where 80% of the images are assigned to the training set and 20% to the test set.
IBL-NeRF Dataset. Contains multi-view images with their intrinsic components.
Differential fluorescent staining is an effective tool widely adopted for the visualization, segmentation and quantification of cells and cellular substructures as part of standard microscopic imaging protocols. The incompatibility of staining agents with viable cells is a major and often unavoidable limitation on its applicability in live experiments: samples must be extracted at different stages of the experiment, which increases laboratory costs. Accordingly, the development of computerized image analysis methods capable of segmenting and quantifying cells and cellular substructures from plain monochromatic images obtained by light microscopy, without the help of any physical markup techniques, is of considerable interest. The enclosed set contains microscopic images of human colon adenocarcinoma Caco-2 cells obtained under various imaging conditions with different fractions of viable vs non-viable cells. Each field of view is provided in a three-fold representation, including phase-con
Instrument playing technique (IPT) is a key element of musical presentation.
A dataset consisting of 46 recipient users and 26,180 tweets. The dataset includes the news feed of the users and 13 features that may influence the relevance of the tweets.
We applied our framework, dubbed "PreNeRF 360", to enable the use of the Nutrition5k dataset in NeRF and introduce an updated version of this dataset, known as the N5k360 dataset.
The GATITOS (Google's Additional Translations Into Tail-languages: Often Short) dataset is a high-quality, multi-way parallel dataset of tokens and short phrases, intended for training and improving machine translation models. This dataset consists of 4,000 English segments (4,500 tokens) that have been translated into each of 26 low-resource languages, as well as three higher-resource pivot languages (es, fr, hi). All translations were made directly from English, with the exception of Aymara, which was translated from Spanish.
Human Action Evaluation (HAE) has rarely been applied to real-world disease monitoring. The EHE dataset aims to gather sample data to validate effective HAE methods that could then be extended to larger-scale validation. EHE consists of several actions from morning exercises that patients complete daily in the elderly home. The EHE dataset contains 869 action repetitions performed by 25 older people. Six exercises were collected for the EHE dataset via Kinect v2.
The HAMMER dataset contains 13 scenes. Each scene has two setups, with and without objects (with: the scene includes several objects with various surface materials; without: the scene contains only the bare background), and each scene has two camera trajectories. Each trajectory is composed of roughly 300 frames, which adds up to about 16k frames in total (13 x 2 x 2 x 300). Each trajectory contains corresponding images from each camera: d435 – stereo, l515 – LiDAR (D-ToF), polarization – RGBP (RGB with polarization), tof – (I-ToF). Each camera folder contains its intrinsics file and its own recorded images, together with rendered depth GT, instance GT and camera poses. All the cameras are fully synchronized via the robotic arm's data acquisition setup.
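The frame total quoted above can be checked with a short calculation. This is only a sanity check of the stated product; the variable names are chosen for this example, and since the description gives the frames per trajectory as "roughly 300", the result is an estimate rather than an exact count.

```python
# Figures taken from the HAMMER description above
scenes = 13
setups_per_scene = 2          # with objects / without objects
trajectories_per_setup = 2    # two camera trajectories
frames_per_trajectory = 300   # approximate, per the description

total_frames = (
    scenes * setups_per_scene * trajectories_per_setup * frames_per_trajectory
)
print(total_frames)  # 15600, i.e. about 16k
```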
WikiTableSet is a large, publicly available image-based table recognition dataset in three languages, built from Wikipedia. WikiTableSet contains nearly 4 million English table images, 590K Japanese table images, and 640K French table images, with corresponding HTML representations and cell bounding boxes. First, we build a Wikipedia table extractor, WTabHTML, and use it to extract tables (in HTML code format) from the 2022-03-01 dump of Wikipedia. In this study, we select Wikipedia tables from three representative languages, i.e., English, Japanese, and French; however, the dataset could be extended to around 300 languages with 17M tables using our table extractor. Second, we normalize the HTML tables following the PubTabNet format (separating table headers and table data, removing CSS and style tags). Finally, we use Chrome and Selenium to render table images from the table HTML code. This dataset provides a standard benchmark for studying table recognition algorithms in different languages or even
The archive contains original images from NIH3T3 cells stained with Hoechst 33342 as PNG files. It also contains images (as Photoshop and GIMP files) showing hand-segmentation of the Hoechst images into regions containing single nuclei.
DeepPCB
The paper introduces three benchmarking tasks inspired by animal learning.
The EgoTV dataset consists of (task description, video) pairs with positive or negative task verification labels. By combining the six sub-tasks heat, clean, slice, cool, put, and pick with different ordering constraints, there are 82 tasks for EgoTV. Tasks are instantiated with 130 target objects (excluding visual variations in shape, texture, and color) and 24 receptacle objects, totaling 1,038 task-object combinations. These are performed in 30 different kitchen scenes.